Token Calculation Guide

Token calculation determines the computational cost of generation requests by converting inputs (image dimensions, text prompts, steps, etc.) into a standardized token count, which is then used to calculate the final cost.
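
The token-to-cost step can be sketched as follows, assuming a price quoted in dollars per million tokens (the convention used by the `calculate()` calls later in this guide); `cost_for` is an illustrative helper, not part of the library:

```rust
// A minimal sketch of the cost step described above, assuming the price
// is quoted per million tokens.
fn cost_for(total_tokens: u64, price_per_million: f64) -> f64 {
    total_tokens as f64 / 1_000_000.0 * price_per_million
}

fn main() {
    // 1,290,630 tokens at $100 per million tokens:
    println!("${:.4}", cost_for(1_290_630, 100.0)); // $129.0630
}
```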

Supported Types

| Type | Description |
|------|-------------|
| T2I | Text-to-Image |
| T2V | Text-to-Video |
| TI2V | Image-to-Video (with image conditioning) |
| FineTuning | Model fine-tuning |

T2I (Text-to-Image)

Inputs

| Field | Type | Description |
|-------|------|-------------|
| words_positive | u32 | Word count of positive prompt |
| words_negative | u32 | Word count of negative prompt |
| width | u32 | Image width in pixels |
| height | u32 | Image height in pixels |
| steps | u32 | Number of inference steps |
| lora_count | u32 | Number of active LoRAs |
| double_pass | bool | True if CFG > 1 or negative prompt present |

Hyperparameters (defaults)

| Field | Default | Description |
|-------|---------|-------------|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |

Formula

```
T_img = ⌈Width/16⌉ × ⌈Height/16⌉
T_pos = words_positive × μ_txt
T_neg = words_negative × μ_txt

Pass1 = T_img + γ × T_pos
Pass2 = T_img + γ × T_neg   (only if double_pass)

Total Tokens = n × steps × (Pass1 + δ × Pass2)
```

Here δ is 1 when double_pass is set and 0 otherwise. (n is not defined among the hyperparameters above; it is presumably a model-level scaling factor.)
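
For the 512×512, 20-step request from the usage snippet below, the formula works out as follows. This is a sketch: n = 1 and δ = 1 (double_pass enabled) are assumed, since the formula itself does not fix them.

```rust
// Integer ceiling division, i.e. ⌈a/b⌉.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

// T2I total with the default μ_txt = 1.3, γ = 1.0, assuming n = 1, δ = 1.
fn t2i_total_tokens(words_pos: u32, words_neg: u32, width: u32, height: u32, steps: u32) -> f64 {
    let (mu_txt, gamma) = (1.3f64, 1.0f64);
    let t_img = (ceil_div(width, 16) * ceil_div(height, 16)) as f64; // 32 × 32 = 1024
    let pass1 = t_img + gamma * (words_pos as f64 * mu_txt);         // 1024 + 13.0
    let pass2 = t_img + gamma * (words_neg as f64 * mu_txt);         // 1024 + 6.5
    steps as f64 * (pass1 + pass2)                                   // 20 × 2067.5
}

fn main() {
    println!("{}", t2i_total_tokens(10, 5, 512, 512, 20)); // 41350
}
```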

Usage

```rust
use heyotokencounter::{T2I, TokenCalculation};

let t2i = T2I::new(
    10,   // words_positive
    5,    // words_negative
    512,  // width
    512,  // height
    20,   // steps
    1,    // lora_count
    true, // double_pass
);

let result = t2i.calculate(100.0)?; // cost per million tokens
println!("Tokens: {}, Cost: ${:.4}", result.total_tokens, result.total_cost);
```

T2V (Text-to-Video)

Inputs

| Field | Type | Description |
|-------|------|-------------|
| words_positive | u32 | Word count of positive prompt |
| words_negative | u32 | Word count of negative prompt |
| width | u32 | Video width in pixels |
| height | u32 | Video height in pixels |
| frames | u32 | Number of frames to generate |
| steps | u32 | Number of inference steps |
| lora_count | u32 | Number of active LoRAs |
| double_pass | bool | True for CFG-based models |

Hyperparameters (defaults)

| Field | Default | Description |
|-------|---------|-------------|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |
| mu_time | 4 | Temporal compression factor |
| mu_space | 16 | Spatial compression factor |

Formula

```
T_vid = ⌈Frames/μ_time⌉ × ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
T_pos = words_positive × μ_txt
T_neg = words_negative × μ_txt

Pass1 = T_vid + γ × T_pos
Pass2 = T_vid + γ × T_neg   (only if double_pass)

Total Tokens = n × steps × (Pass1 + δ × Pass2)
```
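
Plugging in the values from the usage snippet below (512×512, 81 frames, 30 steps) gives the following worked sketch; as in the T2I case, n = 1 and δ = 1 are assumed:

```rust
// Integer ceiling division, i.e. ⌈a/b⌉.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

// T2V total with defaults μ_txt = 1.3, γ = 1.0, μ_time = 4, μ_space = 16,
// assuming n = 1 and δ = 1.
fn t2v_total_tokens(words_pos: u32, words_neg: u32, width: u32, height: u32, frames: u32, steps: u32) -> f64 {
    let (mu_txt, gamma) = (1.3f64, 1.0f64);
    let (mu_time, mu_space) = (4u32, 16u32);
    // ⌈81/4⌉ × ⌈512/16⌉ × ⌈512/16⌉ = 21 × 32 × 32 = 21504
    let t_vid = (ceil_div(frames, mu_time) * ceil_div(width, mu_space) * ceil_div(height, mu_space)) as f64;
    let pass1 = t_vid + gamma * (words_pos as f64 * mu_txt); // 21504 + 13.0
    let pass2 = t_vid + gamma * (words_neg as f64 * mu_txt); // 21504 + 0.0
    steps as f64 * (pass1 + pass2)                           // 30 × 43021
}

fn main() {
    println!("{}", t2v_total_tokens(10, 0, 512, 512, 81, 30)); // 1290630
}
```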

Usage

```rust
use heyotokencounter::{T2V, TokenCalculation};

let t2v = T2V::new(
    10,   // words_positive
    0,    // words_negative
    512,  // width
    512,  // height
    81,   // frames
    30,   // steps
    1,    // lora_count
    true, // double_pass
);

let result = t2v.calculate(100.0)?;
```

TI2V (Image-to-Video)

Same as T2V but includes image conditioning tokens.

Additional Hyperparameters

| Field | Default | Description |
|-------|---------|-------------|
| beta | 1.0 | Image conditioning multiplier |

Formula

```
T_vid = ⌈Frames/μ_time⌉ × ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
T_img_cond = ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉

Pass1 = T_vid + β × T_img_cond + γ × T_pos
Pass2 = T_vid + β × T_img_cond + γ × T_neg   (only if double_pass)

Total Tokens = n × steps × (Pass1 + δ × Pass2)
```
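
With the same inputs as the T2V example, the image-conditioning term adds β × T_img_cond to each pass. A worked sketch, assuming n = 1, δ = 1, and the default β = 1.0:

```rust
// Integer ceiling division, i.e. ⌈a/b⌉.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

// TI2V total: T2V plus β × T_img_cond per pass. n = 1, δ = 1, β = 1.0 assumed.
fn ti2v_total_tokens(words_pos: u32, words_neg: u32, width: u32, height: u32, frames: u32, steps: u32) -> f64 {
    let (mu_txt, gamma, beta) = (1.3f64, 1.0f64, 1.0f64);
    let (mu_time, mu_space) = (4u32, 16u32);
    let t_vid = (ceil_div(frames, mu_time) * ceil_div(width, mu_space) * ceil_div(height, mu_space)) as f64; // 21504
    let t_img_cond = (ceil_div(width, mu_space) * ceil_div(height, mu_space)) as f64; // 32 × 32 = 1024
    let pass1 = t_vid + beta * t_img_cond + gamma * (words_pos as f64 * mu_txt); // 22541.0
    let pass2 = t_vid + beta * t_img_cond + gamma * (words_neg as f64 * mu_txt); // 22528.0
    steps as f64 * (pass1 + pass2)                                               // 30 × 45069
}

fn main() {
    println!("{}", ti2v_total_tokens(10, 0, 512, 512, 81, 30)); // 1352070
}
```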

Usage

```rust
use heyotokencounter::{TI2V, TokenCalculation};

let ti2v = TI2V::new(
    10,   // words_positive
    0,    // words_negative
    512,  // width
    512,  // height
    81,   // frames
    30,   // steps
    1,    // lora_count
    true, // double_pass
);

let result = ti2v.calculate(100.0)?;
```

FineTuning

Inputs

| Field | Type | Description |
|-------|------|-------------|
| training_data | Vec<TrainingImage> | Training images with annotations |
| samples | Vec<SamplePrompt> | Validation prompts |
| total_steps | u32 | Total training steps |
| sample_every | u32 | Sample every K steps |
| inference_steps | u32 | Inference steps for sampling |

Hyperparameters (defaults)

| Field | Default | Description |
|-------|---------|-------------|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |

TrainingImage

```rust
pub struct TrainingImage {
    pub width: u32,
    pub height: u32,
    pub annotation_words: u32,
}
```

SamplePrompt

```rust
pub struct SamplePrompt {
    pub width: u32,
    pub height: u32,
    pub words: u32,
}
```

Formula

```
// Per training image
T_img = ⌈Width/16⌉ × ⌈Height/16⌉
T_txt = annotation_words × μ_txt
Load = T_img + γ × T_txt

// Average load over the N training images
Load_avg = (1/N) × Σ Load_i

// Sampling load (summed over all sample prompts)
Load_samples = Σ (T_smpl_img + γ × T_smpl_txt)

// Total
Training Tokens = total_steps × Load_avg
Sampling Tokens = ⌊total_steps/sample_every⌋ × inference_steps × Load_samples
Total Tokens = Training Tokens + Sampling Tokens
```
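
Using the inputs from the usage example below (two training images, one sample prompt, 1000 steps, sampling every 100 steps), the formula can be worked through like this; defaults μ_txt = 1.3 and γ = 1.0 are assumed:

```rust
// Integer ceiling division, i.e. ⌈a/b⌉.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

// Load = T_img + γ × T_txt for one training image or sample prompt,
// with the default μ_txt = 1.3, γ = 1.0.
fn load(width: u32, height: u32, words: u32) -> f64 {
    let (mu_txt, gamma) = (1.3f64, 1.0f64);
    (ceil_div(width, 16) * ceil_div(height, 16)) as f64 + gamma * (words as f64 * mu_txt)
}

fn main() {
    // Training images from the usage example below.
    let loads = [load(512, 512, 20), load(768, 768, 15)]; // 1050.0 and 2323.5
    let load_avg = loads.iter().sum::<f64>() / loads.len() as f64; // 1686.75
    let load_samples = load(512, 512, 10); // 1037.0

    let (total_steps, sample_every, inference_steps) = (1000u32, 100u32, 20u32);
    let training = total_steps as f64 * load_avg; // 1_686_750
    let sampling = (total_steps / sample_every) as f64 * inference_steps as f64 * load_samples; // 10 × 20 × 1037 = 207_400
    println!("total = {}", training + sampling); // total = 1894150
}
```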

Usage

```rust
use heyotokencounter::{FineTuning, TrainingImage, SamplePrompt, TokenCalculation};

let ft = FineTuning::new(
    vec![
        TrainingImage { width: 512, height: 512, annotation_words: 20 },
        TrainingImage { width: 768, height: 768, annotation_words: 15 },
    ],
    vec![
        SamplePrompt { width: 512, height: 512, words: 10 },
    ],
    1000, // total_steps
    100,  // sample_every
    20,   // inference_steps
);

let result = ft.calculate(100.0)?;
```

Error Handling

The calculate() method returns Result<TokenResult, TokenError>.

```rust
pub enum TokenError {
    InvalidDimensions { width: u32, height: u32 },
    InvalidSteps(u32),
    InvalidFrames(u32),
    EmptyTrainingData,
}
```

Example

```rust
match t2i.calculate(100.0) {
    Ok(result) => println!("Tokens: {}", result.total_tokens),
    Err(TokenError::InvalidDimensions { width, height }) => {
        eprintln!("Invalid dimensions: {}x{}", width, height);
    }
    Err(TokenError::InvalidSteps(s)) => {
        eprintln!("Invalid steps: {}", s);
    }
    Err(TokenError::InvalidFrames(f)) => {
        eprintln!("Invalid frames: {}", f);
    }
    Err(TokenError::EmptyTrainingData) => {
        eprintln!("Empty training data");
    }
}
```

Customizing Hyperparameters

```rust
let mut t2i = T2I::new(10, 5, 512, 512, 20, 1, true);
t2i.mu_txt = 2.0; // custom word-to-token multiplier
t2i.gamma = 0.5;  // custom text weight

let result = t2i.calculate(100.0)?;
```

Hyperparameter Summary

| Parameter | Default | Used In | Description |
|-----------|---------|---------|-------------|
| mu_txt | 1.3 | All | Word-to-token multiplier |
| gamma | 1.0 | All | Text cost weight |
| mu_time | 4 | T2V, TI2V | Temporal compression |
| mu_space | 16 | T2V, TI2V | Spatial compression |
| beta | 1.0 | TI2V | Image conditioning weight |