# Token Calculation Guide
Token calculation determines the computational cost of generation requests by converting inputs (image dimensions, text prompts, steps, etc.) into a standardized token count, which is then used to calculate the final cost.
## Supported Types

| Type | Description |
|---|---|
| T2I | Text-to-Image |
| T2V | Text-to-Video |
| TI2V | Image-to-Video (with image conditioning) |
| FineTuning | Model fine-tuning |
## T2I (Text-to-Image)

### Inputs

| Field | Type | Description |
|---|---|---|
| words_positive | u32 | Word count of positive prompt |
| words_negative | u32 | Word count of negative prompt |
| width | u32 | Image width in pixels |
| height | u32 | Image height in pixels |
| steps | u32 | Number of inference steps |
| lora_count | u32 | Number of active LoRAs |
| double_pass | bool | True if CFG > 1 or a negative prompt is present |
### Hyperparameters (defaults)

| Field | Default | Description |
|---|---|---|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |
### Formula

```
T_img = ⌈Width/16⌉ × ⌈Height/16⌉
T_pos = words_positive × μ_txt
T_neg = words_negative × μ_txt
Pass1 = T_img + γ × T_pos
Pass2 = T_img + γ × T_neg   (only if double_pass)
Total Tokens = n × steps × (Pass1 + δ × Pass2)
```

where δ is 1 when double_pass is true and 0 otherwise, and n is an overall per-model scaling constant.
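As a numeric sanity check, the T2I formula can be sketched in plain Rust. This is an illustration of the arithmetic only, not the crate's internals; it assumes n = 1 and δ = 1 when double_pass is set.

```rust
/// Ceiling division, used for the ⌈Width/16⌉ patch grid.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// T2I token count per the formula above (n assumed to be 1).
fn t2i_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64,
) -> f64 {
    let t_img = (ceil_div(width, 16) * ceil_div(height, 16)) as f64;
    let pass1 = t_img + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_img + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0 // δ = 0: no second pass
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    // 512×512 → T_img = 32 × 32 = 1024
    // Pass1 = 1024 + 13 = 1037, Pass2 = 1024 + 6.5 = 1030.5
    let total = t2i_tokens(10, 5, 512, 512, 20, true, 1.3, 1.0);
    println!("{total}"); // 20 × 2067.5 = 41350
}
```

Note that T_img dominates the per-pass cost at typical prompt lengths, so resolution matters far more than prompt wording.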
### Usage

```rust
use heyotokencounter::{T2I, TokenCalculation};

let t2i = T2I::new(
    10,   // words_positive
    5,    // words_negative
    512,  // width
    512,  // height
    20,   // steps
    1,    // lora_count
    true, // double_pass
);
let result = t2i.calculate(100.0)?; // cost per million tokens
println!("Tokens: {}, Cost: ${:.4}", result.total_tokens, result.total_cost);
```
## T2V (Text-to-Video)

### Inputs

| Field | Type | Description |
|---|---|---|
| words_positive | u32 | Word count of positive prompt |
| words_negative | u32 | Word count of negative prompt |
| width | u32 | Video width in pixels |
| height | u32 | Video height in pixels |
| frames | u32 | Number of frames to generate |
| steps | u32 | Number of inference steps |
| lora_count | u32 | Number of active LoRAs |
| double_pass | bool | True for CFG-based models |
### Hyperparameters (defaults)

| Field | Default | Description |
|---|---|---|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |
| mu_time | 4 | Temporal compression factor |
| mu_space | 16 | Spatial compression factor |
### Formula

```
T_vid = ⌈Frames/μ_time⌉ × ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
T_pos = words_positive × μ_txt
T_neg = words_negative × μ_txt
Pass1 = T_vid + γ × T_pos
Pass2 = T_vid + γ × T_neg   (only if double_pass)
Total Tokens = n × steps × (Pass1 + δ × Pass2)
```

with n and δ as in the T2I formula.
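The same kind of arithmetic sketch works for T2V; the only change is that the spatial patch grid is multiplied by a temporally compressed frame count. Again, this assumes n = 1 and δ = 1 when double_pass is set, and is not the crate's actual implementation.

```rust
/// Ceiling division for the patch/frame grids.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// T2V token count per the formula above (n assumed to be 1).
fn t2v_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, frames: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64,
    mu_time: u32, mu_space: u32,
) -> f64 {
    let t_vid = (ceil_div(frames, mu_time)
        * ceil_div(width, mu_space)
        * ceil_div(height, mu_space)) as f64;
    let pass1 = t_vid + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_vid + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    // 81 frames, 512×512 → T_vid = ⌈81/4⌉ × 32 × 32 = 21 × 1024 = 21504
    let total = t2v_tokens(10, 0, 512, 512, 81, 30, true, 1.3, 1.0, 4, 16);
    println!("{total}"); // 30 × (21517 + 21504) = 1290630
}
```

At 81 frames the same 512×512 resolution costs roughly 21× the per-pass tokens of a single image, which is why video generation dominates cost.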
### Usage

```rust
use heyotokencounter::{T2V, TokenCalculation};

let t2v = T2V::new(
    10,   // words_positive
    0,    // words_negative
    512,  // width
    512,  // height
    81,   // frames
    30,   // steps
    1,    // lora_count
    true, // double_pass
);
let result = t2v.calculate(100.0)?;
```
## TI2V (Image-to-Video)
Same as T2V but includes image conditioning tokens.
### Additional Hyperparameters

| Field | Default | Description |
|---|---|---|
| beta | 1.0 | Image conditioning multiplier |
### Formula

```
T_vid = ⌈Frames/μ_time⌉ × ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
T_img_cond = ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
Pass1 = T_vid + β × T_img_cond + γ × T_pos
Pass2 = T_vid + β × T_img_cond + γ × T_neg   (only if double_pass)
Total Tokens = n × steps × (Pass1 + δ × Pass2)
```

with n and δ as in the T2I formula.
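Sketched out, the only difference from the T2V arithmetic is the β-weighted conditioning-image term added to each pass. As before, this is an illustration assuming n = 1 and δ = 1 when double_pass is set.

```rust
/// Ceiling division for the patch/frame grids.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// TI2V token count per the formula above (n assumed to be 1).
/// beta weights the conditioning-image tokens.
fn ti2v_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, frames: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64, beta: f64,
    mu_time: u32, mu_space: u32,
) -> f64 {
    let t_vid = (ceil_div(frames, mu_time)
        * ceil_div(width, mu_space)
        * ceil_div(height, mu_space)) as f64;
    // Conditioning image is tokenized on the same spatial grid as the video
    let t_img_cond = (ceil_div(width, mu_space) * ceil_div(height, mu_space)) as f64;
    let pass1 = t_vid + beta * t_img_cond + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_vid + beta * t_img_cond + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    // T_vid = 21 × 32 × 32 = 21504, T_img_cond = 32 × 32 = 1024
    let total = ti2v_tokens(10, 0, 512, 512, 81, 30, true, 1.3, 1.0, 1.0, 4, 16);
    println!("{total}"); // 30 × (22541 + 22528) = 1352070
}
```

With β = 1, image conditioning adds one frame's worth of tokens per pass on top of the T2V cost.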
### Usage

```rust
use heyotokencounter::{TI2V, TokenCalculation};

let ti2v = TI2V::new(
    10,   // words_positive
    0,    // words_negative
    512,  // width
    512,  // height
    81,   // frames
    30,   // steps
    1,    // lora_count
    true, // double_pass
);
let result = ti2v.calculate(100.0)?;
```
## FineTuning

### Inputs

| Field | Type | Description |
|---|---|---|
| training_data | Vec&lt;TrainingImage&gt; | Training images with annotations |
| samples | Vec&lt;SamplePrompt&gt; | Validation prompts |
| total_steps | u32 | Total training steps |
| sample_every | u32 | Sample every K steps |
| inference_steps | u32 | Inference steps for sampling |
### Hyperparameters (defaults)

| Field | Default | Description |
|---|---|---|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |
### TrainingImage

```rust
pub struct TrainingImage {
    pub width: u32,
    pub height: u32,
    pub annotation_words: u32,
}
```
### SamplePrompt

```rust
pub struct SamplePrompt {
    pub width: u32,
    pub height: u32,
    pub words: u32,
}
```
### Formula

```
// Per training image i
T_img = ⌈Width/16⌉ × ⌈Height/16⌉
T_txt = annotation_words × μ_txt
Load_i = T_img + γ × T_txt

// Average training load over N training images
Load_avg = (1/N) × Σ Load_i

// Combined load of all sample prompts
Load_samples = Σ (T_smpl_img + γ × T_smpl_txt)

// Totals
Training Tokens = total_steps × Load_avg
Sampling Tokens = ⌊total_steps/sample_every⌋ × inference_steps × Load_samples
Total Tokens = Training Tokens + Sampling Tokens
```
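The fine-tuning formula can be sketched end to end in plain Rust. This is an arithmetic illustration using the structs defined above, not the crate's implementation.

```rust
/// Ceiling division for the ⌈Width/16⌉ patch grid.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

struct TrainingImage { width: u32, height: u32, annotation_words: u32 }
struct SamplePrompt { width: u32, height: u32, words: u32 }

/// FineTuning token count per the formula above.
fn finetuning_tokens(
    training: &[TrainingImage], samples: &[SamplePrompt],
    total_steps: u32, sample_every: u32, inference_steps: u32,
    mu_txt: f64, gamma: f64,
) -> f64 {
    // Load_avg: mean per-image training load
    let load_sum: f64 = training.iter().map(|img| {
        let t_img = (ceil_div(img.width, 16) * ceil_div(img.height, 16)) as f64;
        t_img + gamma * (img.annotation_words as f64 * mu_txt)
    }).sum();
    let load_avg = load_sum / training.len() as f64;

    // Load_samples: combined load of all validation prompts
    let load_samples: f64 = samples.iter().map(|s| {
        let t_img = (ceil_div(s.width, 16) * ceil_div(s.height, 16)) as f64;
        t_img + gamma * (s.words as f64 * mu_txt)
    }).sum();

    let training_tokens = total_steps as f64 * load_avg;
    let sampling_tokens =
        (total_steps / sample_every) as f64 * inference_steps as f64 * load_samples;
    training_tokens + sampling_tokens
}

fn main() {
    let training = [
        TrainingImage { width: 512, height: 512, annotation_words: 20 },
        TrainingImage { width: 768, height: 768, annotation_words: 15 },
    ];
    let samples = [SamplePrompt { width: 512, height: 512, words: 10 }];
    // Loads: 1024 + 26 = 1050 and 2304 + 19.5 = 2323.5 → Load_avg = 1686.75
    // Training: 1000 × 1686.75 = 1686750; Sampling: 10 × 20 × 1037 = 207400
    let total = finetuning_tokens(&training, &samples, 1000, 100, 20, 1.3, 1.0);
    println!("{total}"); // 1894150
}
```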
### Usage

```rust
use heyotokencounter::{FineTuning, TrainingImage, SamplePrompt, TokenCalculation};

let ft = FineTuning::new(
    vec![
        TrainingImage { width: 512, height: 512, annotation_words: 20 },
        TrainingImage { width: 768, height: 768, annotation_words: 15 },
    ],
    vec![
        SamplePrompt { width: 512, height: 512, words: 10 },
    ],
    1000, // total_steps
    100,  // sample_every
    20,   // inference_steps
);
let result = ft.calculate(100.0)?;
```
## Error Handling

The `calculate()` method returns `Result<TokenResult, TokenError>`.

```rust
pub enum TokenError {
    InvalidDimensions { width: u32, height: u32 },
    InvalidSteps(u32),
    InvalidFrames(u32),
    EmptyTrainingData,
}
```
### Example

```rust
match t2i.calculate(100.0) {
    Ok(result) => println!("Tokens: {}", result.total_tokens),
    Err(TokenError::InvalidDimensions { width, height }) => {
        eprintln!("Invalid dimensions: {}x{}", width, height);
    }
    Err(TokenError::InvalidSteps(s)) => eprintln!("Invalid steps: {}", s),
    Err(TokenError::InvalidFrames(f)) => eprintln!("Invalid frames: {}", f),
    Err(TokenError::EmptyTrainingData) => eprintln!("Empty training data"),
}
```
## Customizing Hyperparameters

```rust
let mut t2i = T2I::new(10, 5, 512, 512, 20, 1, true);
t2i.mu_txt = 2.0; // custom word-to-token multiplier
t2i.gamma = 0.5;  // custom text weight
let result = t2i.calculate(100.0)?;
```
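To see what such overrides do numerically, the T2I formula can be re-evaluated with custom values (an illustrative sketch assuming n = 1 and δ = 1 when double_pass is set, not the crate's code):

```rust
/// Ceiling division for the ⌈Width/16⌉ patch grid.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// T2I token count per the earlier formula (n assumed to be 1).
fn t2i_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64,
) -> f64 {
    let t_img = (ceil_div(width, 16) * ceil_div(height, 16)) as f64;
    let pass1 = t_img + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_img + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    let default = t2i_tokens(10, 5, 512, 512, 20, true, 1.3, 1.0);
    let custom = t2i_tokens(10, 5, 512, 512, 20, true, 2.0, 0.5);
    // The change is small because T_img dominates at short prompts:
    println!("default: {default}, custom: {custom}"); // 41350 vs 41260
}
```

Because text tokens are tiny next to the image grid, μ_txt and γ only matter for very long prompts or small resolutions.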
## Hyperparameter Summary

| Parameter | Default | Used In | Description |
|---|---|---|---|
| mu_txt | 1.3 | All | Word-to-token multiplier |
| gamma | 1.0 | All | Text cost weight |
| mu_time | 4 | T2V, TI2V | Temporal compression factor |
| mu_space | 16 | T2V, TI2V | Spatial compression factor |
| beta | 1.0 | TI2V | Image conditioning weight |