Usage-Based Billing

The most common billing method for text models, typically based on input and output Token usage

Usage-based billing is the most common billing method for text models.

Simply put:

The content you send to the model consumes some Tokens
The content returned by the model also consumes some Tokens
The system calculates charges based on the actual usage

Why most text models use usage-based billing

Because text request lengths can vary significantly.

For example:

If you send a single phrase like "Hello," it consumes very little
If you send a large amount of context, long prompts, or long documents and ask the model to generate a long response, it will consume more

So usage-based billing is generally fairer and more granular.

What matters most in usage-based billing

For beginners, what really matters is not memorizing the billing formula, but understanding this:

In general, the longer the input, the more context included, and the longer the output, the higher the cost.

This is also why many people initially think, "I only asked one question, so why isn't the cost low?" — because the model may not be seeing only the final sentence. It may also include:

Conversation history
System prompts
Additional context
Tool call-related content

Common factors that affect cost

1. Input length

The longer the prompt and the more attached material included, the higher the input Token usage.

2. Output length

The longer the model's response, the higher the output Token usage.

3. Historical context

In multi-turn conversations, the client may send earlier chat history along with the current request.

4. The model itself

Different models have different unit prices. Even with similar Token usage, the cost may vary.

5. Group strategy

The same model may have different pricing strategies under different Groups.

How to optimize usage-based billing costs

If you want to save money, the most effective approach is usually not to "use the model less," but to "reduce unnecessary consumption."

Recommended practices

Simplify prompts and avoid repeating background information
Control the length of historical messages
Avoid having the model generate excessively long responses without a clear reason
Match different use cases with models at different pricing tiers
Use different Keys and Groups to separate test traffic from production traffic

Common misconceptions

Assuming only the final question is counted
Not realizing that the client may be quietly sending a large amount of historical context
Repeatedly calling expensive models for testing purposes

One-line summary

The core of usage-based billing is not "how much a single request costs," but "how much total input and output this request consumed."

When should I look at per-request billing?

If you mainly use image, video, or fixed-action APIs, you can also see:

Per-Request Billing

On this page