Usage-Based Billing
The most common billing method for text models, typically based on input and output Token usage
Usage-based billing is the most common billing method for text models.
Simply put:
- The content you send to the model consumes some Tokens
- The content returned by the model also consumes some Tokens
- The system calculates charges based on the actual usage
Why most text models use usage-based billing
Because text request lengths can vary significantly.
For example:
- If you send a single phrase like "Hello," it consumes very little
- If you send a large amount of context, long prompts, or long documents and ask the model to generate a long response, it will consume more
So usage-based billing is generally fairer and more granular.
What matters most in usage-based billing
For beginners, what really matters is not memorizing the billing formula, but understanding this:
In general, the longer the input, the more context included, and the longer the output, the higher the cost.
This is also why many people initially think, "I only asked one question, so why isn't the cost low?" — because the model may not be seeing only the final sentence. It may also include:
- Conversation history
- System prompts
- Additional context
- Tool call-related content
Common factors that affect cost
1. Input length
The longer the prompt and the more attached material included, the higher the input Token usage.
2. Output length
The longer the model's response, the higher the output Token usage.
3. Historical context
In multi-turn conversations, the client may send earlier chat history along with the current request.
4. The model itself
Different models have different unit prices. Even with similar Token usage, the cost may vary.
5. Group strategy
The same model may have different pricing strategies under different Groups.
How to optimize usage-based billing costs
If you want to save money, the most effective approach is usually not to "use the model less," but to "reduce unnecessary consumption."
Recommended practices
- Simplify prompts and avoid repeating background information
- Control the length of historical messages
- Avoid having the model generate excessively long responses without a clear reason
- Match different use cases with models at different pricing tiers
- Use different Keys and Groups to separate test traffic from production traffic
Common misconceptions
- Assuming only the final question is counted
- Not realizing that the client may be quietly sending a large amount of historical context
- Repeatedly calling expensive models for testing purposes
One-line summary
The core of usage-based billing is not "how much a single request costs," but "how much total input and output this request consumed."
When should I look at per-request billing?
If you mainly use image, video, or fixed-action APIs, you can also see:
How is this guide?
Last updated on