# Token Calculation Guide
Token calculation determines the computational cost of generation requests by converting inputs (image dimensions, text prompts, steps, etc.) into a standardized token count, which is then used to calculate the final cost.
## Supported Types

| Type | Description |
|---|---|
| T2I | Text-to-Image |
| T2V | Text-to-Video |
| TI2V | Image-to-Video (with image conditioning) |
| FineTuning | Model fine-tuning |
## T2I (Text-to-Image)

### Inputs

| Field | Type | Description |
|---|---|---|
| words_positive | u32 | Word count of positive prompt |
| words_negative | u32 | Word count of negative prompt |
| width | u32 | Image width in pixels |
| height | u32 | Image height in pixels |
| steps | u32 | Number of inference steps |
| lora_count | u32 | Number of active LoRAs |
| double_pass | bool | True if CFG > 1 or a negative prompt is present |
### Hyperparameters (defaults)

| Field | Default | Description |
|---|---|---|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |
### Formula

```
T_img = ⌈Width/16⌉ × ⌈Height/16⌉
T_pos = words_positive × μ_txt
T_neg = words_negative × μ_txt
Pass1 = T_img + γ × T_pos
Pass2 = T_img + γ × T_neg   (only if double_pass)
Total Tokens = n × steps × (Pass1 + δ × Pass2)
```

where δ is 1 when double_pass is true and 0 otherwise, and n is an overall per-model scaling constant.
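As a numeric sanity check, the T2I formula can be sketched in plain Rust. This is an illustration of the arithmetic only, not the crate's internals; it assumes n = 1 and δ = 1 when double_pass is set.

```rust
/// Ceiling division, used for the ⌈Width/16⌉ patch grid.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// T2I token count per the formula above (n assumed to be 1).
fn t2i_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64,
) -> f64 {
    let t_img = (ceil_div(width, 16) * ceil_div(height, 16)) as f64;
    let pass1 = t_img + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_img + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0 // δ = 0: no second pass
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    // 512×512 → T_img = 32 × 32 = 1024
    // Pass1 = 1024 + 13 = 1037, Pass2 = 1024 + 6.5 = 1030.5
    let total = t2i_tokens(10, 5, 512, 512, 20, true, 1.3, 1.0);
    println!("{total}"); // 20 × 2067.5 = 41350
}
```

Note that T_img dominates the per-pass cost at typical prompt lengths, so resolution matters far more than prompt wording.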
### Usage

```rust
use heyotokencounter::{T2I, TokenCalculation};

let t2i = T2I::new(
    10,   // words_positive
    5,    // words_negative
    512,  // width
    512,  // height
    20,   // steps
    1,    // lora_count
    true, // double_pass
);
let result = t2i.calculate(100.0)?; // cost per million tokens
println!("Tokens: {}, Cost: ${:.4}", result.total_tokens, result.total_cost);
```
## T2V (Text-to-Video)

### Inputs

| Field | Type | Description |
|---|---|---|
| words_positive | u32 | Word count of positive prompt |
| words_negative | u32 | Word count of negative prompt |
| width | u32 | Video width in pixels |
| height | u32 | Video height in pixels |
| frames | u32 | Number of frames to generate |
| steps | u32 | Number of inference steps |
| lora_count | u32 | Number of active LoRAs |
| double_pass | bool | True for CFG-based models |
### Hyperparameters (defaults)

| Field | Default | Description |
|---|---|---|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |
| mu_time | 4 | Temporal compression factor |
| mu_space | 16 | Spatial compression factor |
### Formula

```
T_vid = ⌈Frames/μ_time⌉ × ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
T_pos = words_positive × μ_txt
T_neg = words_negative × μ_txt
Pass1 = T_vid + γ × T_pos
Pass2 = T_vid + γ × T_neg   (only if double_pass)
Total Tokens = n × steps × (Pass1 + δ × Pass2)
```

with n and δ as in the T2I formula.
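The same kind of arithmetic sketch works for T2V; the only change is that the spatial patch grid is multiplied by a temporally compressed frame count. Again, this assumes n = 1 and δ = 1 when double_pass is set, and is not the crate's actual implementation.

```rust
/// Ceiling division for the patch/frame grids.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// T2V token count per the formula above (n assumed to be 1).
fn t2v_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, frames: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64,
    mu_time: u32, mu_space: u32,
) -> f64 {
    let t_vid = (ceil_div(frames, mu_time)
        * ceil_div(width, mu_space)
        * ceil_div(height, mu_space)) as f64;
    let pass1 = t_vid + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_vid + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    // 81 frames, 512×512 → T_vid = ⌈81/4⌉ × 32 × 32 = 21 × 1024 = 21504
    let total = t2v_tokens(10, 0, 512, 512, 81, 30, true, 1.3, 1.0, 4, 16);
    println!("{total}"); // 30 × (21517 + 21504) = 1290630
}
```

At 81 frames the same 512×512 resolution costs roughly 21× the per-pass tokens of a single image, which is why video generation dominates cost.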
### Usage

```rust
use heyotokencounter::{T2V, TokenCalculation};

let t2v = T2V::new(
    10,   // words_positive
    0,    // words_negative
    512,  // width
    512,  // height
    81,   // frames
    30,   // steps
    1,    // lora_count
    true, // double_pass
);
let result = t2v.calculate(100.0)?;
```
## TI2V (Image-to-Video)
Same as T2V but includes image conditioning tokens.
### Additional Hyperparameters

| Field | Default | Description |
|---|---|---|
| beta | 1.0 | Image conditioning multiplier |
### Formula

```
T_vid = ⌈Frames/μ_time⌉ × ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
T_img_cond = ⌈Width/μ_space⌉ × ⌈Height/μ_space⌉
Pass1 = T_vid + β × T_img_cond + γ × T_pos
Pass2 = T_vid + β × T_img_cond + γ × T_neg   (only if double_pass)
Total Tokens = n × steps × (Pass1 + δ × Pass2)
```

with n and δ as in the T2I formula.
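Sketched out, the only difference from the T2V arithmetic is the β-weighted conditioning-image term added to each pass. As before, this is an illustration assuming n = 1 and δ = 1 when double_pass is set.

```rust
/// Ceiling division for the patch/frame grids.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// TI2V token count per the formula above (n assumed to be 1).
/// beta weights the conditioning-image tokens.
fn ti2v_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, frames: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64, beta: f64,
    mu_time: u32, mu_space: u32,
) -> f64 {
    let t_vid = (ceil_div(frames, mu_time)
        * ceil_div(width, mu_space)
        * ceil_div(height, mu_space)) as f64;
    // Conditioning image is tokenized on the same spatial grid as the video
    let t_img_cond = (ceil_div(width, mu_space) * ceil_div(height, mu_space)) as f64;
    let pass1 = t_vid + beta * t_img_cond + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_vid + beta * t_img_cond + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    // T_vid = 21 × 32 × 32 = 21504, T_img_cond = 32 × 32 = 1024
    let total = ti2v_tokens(10, 0, 512, 512, 81, 30, true, 1.3, 1.0, 1.0, 4, 16);
    println!("{total}"); // 30 × (22541 + 22528) = 1352070
}
```

With β = 1, image conditioning adds one frame's worth of tokens per pass on top of the T2V cost.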
### Usage

```rust
use heyotokencounter::{TI2V, TokenCalculation};

let ti2v = TI2V::new(
    10,   // words_positive
    0,    // words_negative
    512,  // width
    512,  // height
    81,   // frames
    30,   // steps
    1,    // lora_count
    true, // double_pass
);
let result = ti2v.calculate(100.0)?;
```
## FineTuning

### Inputs

| Field | Type | Description |
|---|---|---|
| training_data | Vec&lt;TrainingImage&gt; | Training images with annotations |
| samples | Vec&lt;SamplePrompt&gt; | Validation prompts |
| total_steps | u32 | Total training steps |
| sample_every | u32 | Sample every K steps |
| inference_steps | u32 | Inference steps for sampling |
### Hyperparameters (defaults)

| Field | Default | Description |
|---|---|---|
| mu_txt | 1.3 | Word-to-token multiplier |
| gamma | 1.0 | Text cost weight |
### TrainingImage

```rust
pub struct TrainingImage {
    pub width: u32,
    pub height: u32,
    pub annotation_words: u32,
}
```
### SamplePrompt

```rust
pub struct SamplePrompt {
    pub width: u32,
    pub height: u32,
    pub words: u32,
}
```
### Formula

```
// Per training image i
T_img = ⌈Width/16⌉ × ⌈Height/16⌉
T_txt = annotation_words × μ_txt
Load_i = T_img + γ × T_txt

// Average training load over N training images
Load_avg = (1/N) × Σ Load_i

// Combined load of all sample prompts
Load_samples = Σ (T_smpl_img + γ × T_smpl_txt)

// Totals
Training Tokens = total_steps × Load_avg
Sampling Tokens = ⌊total_steps/sample_every⌋ × inference_steps × Load_samples
Total Tokens = Training Tokens + Sampling Tokens
```
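The fine-tuning formula can be sketched end to end in plain Rust. This is an arithmetic illustration using the structs defined above, not the crate's implementation.

```rust
/// Ceiling division for the ⌈Width/16⌉ patch grid.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

struct TrainingImage { width: u32, height: u32, annotation_words: u32 }
struct SamplePrompt { width: u32, height: u32, words: u32 }

/// FineTuning token count per the formula above.
fn finetuning_tokens(
    training: &[TrainingImage], samples: &[SamplePrompt],
    total_steps: u32, sample_every: u32, inference_steps: u32,
    mu_txt: f64, gamma: f64,
) -> f64 {
    // Load_avg: mean per-image training load
    let load_sum: f64 = training.iter().map(|img| {
        let t_img = (ceil_div(img.width, 16) * ceil_div(img.height, 16)) as f64;
        t_img + gamma * (img.annotation_words as f64 * mu_txt)
    }).sum();
    let load_avg = load_sum / training.len() as f64;

    // Load_samples: combined load of all validation prompts
    let load_samples: f64 = samples.iter().map(|s| {
        let t_img = (ceil_div(s.width, 16) * ceil_div(s.height, 16)) as f64;
        t_img + gamma * (s.words as f64 * mu_txt)
    }).sum();

    let training_tokens = total_steps as f64 * load_avg;
    let sampling_tokens =
        (total_steps / sample_every) as f64 * inference_steps as f64 * load_samples;
    training_tokens + sampling_tokens
}

fn main() {
    let training = [
        TrainingImage { width: 512, height: 512, annotation_words: 20 },
        TrainingImage { width: 768, height: 768, annotation_words: 15 },
    ];
    let samples = [SamplePrompt { width: 512, height: 512, words: 10 }];
    // Loads: 1024 + 26 = 1050 and 2304 + 19.5 = 2323.5 → Load_avg = 1686.75
    // Training: 1000 × 1686.75 = 1686750; Sampling: 10 × 20 × 1037 = 207400
    let total = finetuning_tokens(&training, &samples, 1000, 100, 20, 1.3, 1.0);
    println!("{total}"); // 1894150
}
```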
### Usage

```rust
use heyotokencounter::{FineTuning, TrainingImage, SamplePrompt, TokenCalculation};

let ft = FineTuning::new(
    vec![
        TrainingImage { width: 512, height: 512, annotation_words: 20 },
        TrainingImage { width: 768, height: 768, annotation_words: 15 },
    ],
    vec![
        SamplePrompt { width: 512, height: 512, words: 10 },
    ],
    1000, // total_steps
    100,  // sample_every
    20,   // inference_steps
);
let result = ft.calculate(100.0)?;
```
## Error Handling

The `calculate()` method returns `Result<TokenResult, TokenError>`.

```rust
pub enum TokenError {
    InvalidDimensions { width: u32, height: u32 },
    InvalidSteps(u32),
    InvalidFrames(u32),
    EmptyTrainingData,
}
```
### Example

```rust
match t2i.calculate(100.0) {
    Ok(result) => println!("Tokens: {}", result.total_tokens),
    Err(TokenError::InvalidDimensions { width, height }) => {
        eprintln!("Invalid dimensions: {}x{}", width, height);
    }
    Err(TokenError::InvalidSteps(s)) => eprintln!("Invalid steps: {}", s),
    Err(TokenError::InvalidFrames(f)) => eprintln!("Invalid frames: {}", f),
    Err(TokenError::EmptyTrainingData) => eprintln!("Empty training data"),
}
```
## Customizing Hyperparameters

```rust
let mut t2i = T2I::new(10, 5, 512, 512, 20, 1, true);
t2i.mu_txt = 2.0; // custom word-to-token multiplier
t2i.gamma = 0.5;  // custom text weight
let result = t2i.calculate(100.0)?;
```
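To see what such overrides do numerically, the T2I formula can be re-evaluated with custom values (an illustrative sketch assuming n = 1 and δ = 1 when double_pass is set, not the crate's code):

```rust
/// Ceiling division for the ⌈Width/16⌉ patch grid.
fn ceil_div(a: u32, b: u32) -> u32 {
    (a + b - 1) / b
}

/// T2I token count per the earlier formula (n assumed to be 1).
fn t2i_tokens(
    words_positive: u32, words_negative: u32,
    width: u32, height: u32, steps: u32,
    double_pass: bool, mu_txt: f64, gamma: f64,
) -> f64 {
    let t_img = (ceil_div(width, 16) * ceil_div(height, 16)) as f64;
    let pass1 = t_img + gamma * (words_positive as f64 * mu_txt);
    let pass2 = if double_pass {
        t_img + gamma * (words_negative as f64 * mu_txt)
    } else {
        0.0
    };
    steps as f64 * (pass1 + pass2)
}

fn main() {
    let default = t2i_tokens(10, 5, 512, 512, 20, true, 1.3, 1.0);
    let custom = t2i_tokens(10, 5, 512, 512, 20, true, 2.0, 0.5);
    // The change is small because T_img dominates at short prompts:
    println!("default: {default}, custom: {custom}"); // 41350 vs 41260
}
```

Because text tokens are tiny next to the image grid, μ_txt and γ only matter for very long prompts or small resolutions.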
## Hyperparameter Summary

| Parameter | Default | Used In | Description |
|---|---|---|---|
| mu_txt | 1.3 | All | Word-to-token multiplier |
| gamma | 1.0 | All | Text cost weight |
| mu_time | 4 | T2V, TI2V | Temporal compression factor |
| mu_space | 16 | T2V, TI2V | Spatial compression factor |
| beta | 1.0 | TI2V | Image conditioning weight |