AI Agents Just Got Cheaper: Pay Only for What You Use with Conversation AI

We are moving from a flat per-message fee to a token-based billing model for conversation AI agents. That change gives us a fairer, more transparent way to pay for AI interactions. Instead of a one-size-fits-all charge, we pay only for the actual work an agent does. The result is closer alignment between cost and value, clearer visibility into where spend happens, and opportunities to reduce expenses by simplifying interactions.
Why this change matters
Flat fees hide variability. A short confirmation message and a multi-step customer training session cost the same under a flat rate, even though one uses far fewer compute resources. Token-based pricing charges based on what the AI actually processes — the prompt it reads, the knowledge it looks up, and the answer it generates.
For busy small teams and growing businesses, this brings practical benefits:
- Lower costs for short interactions. Many routine responses are just a few words. Those will become cheaper.
- Predictable budgeting for heavy use. Long, complex conversations will show their true cost so we can plan accordingly.
- Actionable visibility. We can see which features or flows drive token consumption and optimize them.
What is changing — the basics
The billing model is shifting from a flat $0.02 per message to a token usage model. Under the new system, charges are calculated from the number of tokens consumed during an interaction. Tokens represent chunks of text the model reads and generates. A short reply might use a handful of tokens, while long-form output or a knowledge base lookup consumes many more.
Key protections during the rollout:
- Cost cap through December 31, 2025. No single outbound message will be charged more than $0.02 during this period, even if token usage would normally exceed that amount.
- Outbound messages sent from the flow-based builder will have their charges waived until December 31, 2025.
- Comparison period for admins. Admins can compare token-based pricing against the current model for a transition period that ends October 31, 2025. After that date token pricing becomes the default.
How token-based pricing works in practice
Every interaction with an AI agent involves input tokens (what the agent reads) and output tokens (what it writes). Charges are calculated from the total tokens used. The token count depends on:
- Message length and complexity
- Number of times the agent rereads instructions or prompts
- Lookups into knowledge bases or attached files
- Custom variables, actions, and any voice or image processing
Example scenarios:
- Short confirmation. A reply like "We are open from 10 am to 5 pm" may cost only a fraction of a cent — often much less than the old $0.02 flat rate.
- Multi-step workflow. An agent trained to run a multi-step process, reference uploaded PDFs and knowledge bases, and produce a detailed report will consume many more tokens and therefore cost more.
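
To ground this in arithmetic, here is a minimal estimator. The per-token rates are assumptions chosen purely for illustration (the platform's actual rates are not published in this article), and the min() reflects the $0.02 promotional cap described above.

```python
# Illustrative rates only; swap in your platform's actual pricing.
RATE_PER_INPUT_TOKEN = 2.00 / 1_000_000   # assumed $2 per million input tokens
RATE_PER_OUTPUT_TOKEN = 8.00 / 1_000_000  # assumed $8 per million output tokens
PROMO_CAP = 0.02  # per-outbound-message cap through December 31, 2025

def estimate_charge(input_tokens: int, output_tokens: int) -> float:
    """Estimate one interaction's charge: input cost plus output cost,
    capped at $0.02 per outbound message during the rollout period."""
    raw = input_tokens * RATE_PER_INPUT_TOKEN + output_tokens * RATE_PER_OUTPUT_TOKEN
    return min(raw, PROMO_CAP)

# A short confirmation: small prompt, roughly a 12-token reply.
print(f"${estimate_charge(200, 12):.5f}")      # about $0.00050
# A multi-step workflow with heavy context and a long report.
print(f"${estimate_charge(9_000, 1_200):.5f}") # raw cost exceeds the cap, so $0.02
```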
Real numbers to ground expectations
Estimates and actuals will differ slightly because token consumption varies by the exact text and model used. Here are a few illustrative data points you might see in the platform:
- An appointment booking bot running on a modern model might estimate roughly $0.004 in LLM charges per message for light back-and-forth exchanges. That is far less than the old flat rate for simple messages.
- One example of usage reporting shows 6,583 input tokens and 193 output tokens resulting in a charge of approximately $0.01471 for those actions combined. That demonstrates how longer inputs increase costs while still often remaining under the cap.
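
For what it's worth, that reported charge is consistent with rates of about $2 per million input tokens and $8 per million output tokens. Those figures are back-solved from this single data point purely for illustration, not published pricing:

```python
# Back-solved illustrative rates: $2/M input, $8/M output (assumptions).
input_cost = 6_583 * 2.00 / 1_000_000   # about $0.013166
output_cost = 193 * 8.00 / 1_000_000    # about $0.001544
print(round(input_cost + output_cost, 5))  # 0.01471, matching the reported charge
```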
These numbers show why short, focused interactions become cheaper and complex actions cost more — and why having visibility into token counts helps optimize workflows.
Features that increase token usage
Not all messages are equal. Certain capabilities add token overhead whenever they are triggered. The features that commonly increase token consumption are:
- Custom variables. Dynamically inserting customer details into prompts means the model processes extra text each time.
- Actions and workflows. When agents run processes or retrigger instructions they reread prompts and context, increasing token use.
- Knowledge base lookups. Pulling in KB entries or attached files such as PDFs forces the model to process additional content.
- Voice and image support. Transcribing audio or interpreting images introduces additional processing steps and therefore more tokens.
Understanding which features you use most helps prioritize optimizations that reduce costs without sacrificing capability.
Common use cases and cost implications
Simple customer confirmations and responses
Use case: appointment confirmations or short status replies. These are generally short and token-light.
Implication: Most of these exchanges will cost substantially less than the old flat fee. If our workflow keeps replies concise, running basic conversational flows becomes significantly cheaper.
Smart assistants that use knowledge bases
Use case: a support bot that references your brand-specific documentation, product manuals, or training PDFs to produce accurate answers.
Implication: Each lookup adds tokens. A single complex response that includes referenced passages can be more expensive than the simple per-message rate, especially once voice or images are involved.
Workflow automation and staff training
Use case: agents that run multi-step processes, prompt for multiple inputs, and produce detailed guidance or reports.
Implication: These require more context and rereading of prompts, which raises token consumption and billable costs. The trade-off is that the agent performs higher-value tasks that would otherwise require human time.
How to view and estimate costs in the platform
We can check token estimates and billing details directly in the platform so decisions are data-driven. Typical steps look like this:
- Open the AI Agents area in the platform.
- Select the conversation AI section and then the agents list.
- Choose the agent you want to inspect and scroll down to the plan or billing section to see whether it uses token-based pricing.
- Under bot goals or a similar configuration panel, inspect the assigned model and the approximate token usage. Hovering over information icons usually reveals a breakdown of estimated charges per message for that bot.
- For detailed billing, open your account settings and go to billing or transactions to see input and output tokens and the resulting charges for historical actions.
Seeing input and output tokens on each transaction makes it easy to track what drives cost and to compare different agent designs before rolling them out widely.
Tips to lower token costs without losing value
We can reduce costs by adjusting how agents are built and how they interact with users. Here are practical optimizations that preserve experience while limiting token use:
- Keep prompts and responses concise. Shorter text equals fewer tokens.
- Cache common answers. Use static replies for FAQs instead of generating every answer from scratch (see the sketch after this list).
- Limit knowledge base depth. Only pull essential KB snippets, and avoid loading entire documents unless necessary.
- Use flow-based builders where appropriate. These can avoid outbound message charges while still guiding users through decisions.
- Batch related operations. Combine small lookups into one request rather than multiple back-to-back calls.
- Measure and iterate. Use the token and transaction logs to find high-cost interactions and redesign them.
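
Here is a minimal sketch of the caching tip: serve exact-match FAQs from a static dictionary and fall through to the model only on a miss. The generate_with_llm stub stands in for whatever model call your platform actually exposes.

```python
# Static replies for FAQs; serving these consumes zero tokens.
FAQ_CACHE = {
    "what are your hours?": "We are open from 10 am to 5 pm.",
    "where are you located?": "123 Main Street, Springfield.",
}

def generate_with_llm(message: str) -> str:
    # Placeholder for the platform's model call; this is the path that
    # consumes tokens and incurs charges.
    raise NotImplementedError("wire up your platform's model call here")

def respond(message: str) -> str:
    # Normalize the question and try the cache before spending tokens.
    cached = FAQ_CACHE.get(message.strip().lower())
    if cached is not None:
        return cached
    return generate_with_llm(message)

print(respond("What are your hours?"))  # served from cache, no tokens used
```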
Protections and rollout timeline
The transition includes safeguards to protect budgets as we adapt:
- Cost cap through December 31, 2025. Even if an interaction consumes many tokens, it will not be charged more than $0.02 per outbound message during this period.
- Flow-based builder outbound fees waived through December 31, 2025 to encourage experimentation and design optimization.
- Admin comparison window. Administrators can enable a feature flag in the Labs or account flags area to compare token-based charges against the flat-rate model through October 31, 2025. After that, token pricing becomes active by default for all subaccounts.
These measures give teams time to test, optimize, and budget for the new model without sudden surprises.
How to prepare your team
Preparing for token-based billing is mostly about visibility and design decisions. We recommend these practical steps:
- Audit existing agents. Identify which interactions are short and can benefit from lower costs, and which are complex and need monitoring.
- Enable cost estimates in the platform and review token estimates for each agent.
- Prioritize optimizations for high-traffic, high-cost agents first.
- Train staff to understand when a human handoff makes more sense than an expensive AI response.
- Set alerts or spend limits to avoid unexpected bills while you get comfortable with token usage patterns (a simple alert sketch follows this list).
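
As a sketch of the alerting idea, the snippet below sums a day's charges from a transaction export and flags overruns. The transaction shape ("date" and "charge" keys) is invented for illustration; adapt it to whatever your platform's billing export actually provides.

```python
from datetime import date

DAILY_LIMIT = 5.00  # dollars; pick a threshold that fits your budget

def check_daily_spend(transactions: list[dict]) -> None:
    """Sum today's charges and warn when they exceed the limit.
    Each transaction dict is assumed to have 'date' and 'charge' keys."""
    today = date.today().isoformat()
    spent = sum(t["charge"] for t in transactions if t["date"] == today)
    if spent > DAILY_LIMIT:
        print(f"ALERT: ${spent:.2f} spent today, above the ${DAILY_LIMIT:.2f} limit")

check_daily_spend([{"date": date.today().isoformat(), "charge": 0.014}])
```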
Practical examples that illustrate savings and trade-offs
Example 1: A local shop uses an agent to confirm store hours and book appointments. That bot exchanges short messages and does not access a knowledge base. Under token pricing, its per-message cost drops well below the previous flat rate. The business saves money while still automating routine tasks.
Example 2: A support team builds an agent that references several product manuals and generates multi-paragraph troubleshooting steps. This increases token consumption. The trade-off is fewer support tickets and faster resolution times, which can justify the higher per-interaction cost.
Example 3: A training bot runs scenario-based role plays with staff, including voice and image analysis. These tasks are token-intensive, and costs increase accordingly. This is appropriate when the agent replaces human hours for specialized training.
What success looks like
Success means balancing cost and impact. For many teams that looks like:
- Lower average cost per interaction for routine tasks
- Clear visibility into where token spend occurs
- Intentional use of higher-cost features when they deliver measurable value
- Predictable budgeting for heavy usage after analyzing token patterns
Testimonials from teams using token-based pricing
Teams report two common outcomes after switching to token visibility. First, simple flows become much cheaper, and the savings add up. Second, when complex agents are necessary, they can justify the expense with improved efficiency and outcomes. The common theme is better control and fewer billing surprises.
Frequently asked questions
What exactly is a token and how does it relate to message length
A token is a unit of text the model processes, typically a short word or a fragment of a longer one; punctuation often counts as separate tokens. Longer messages use more tokens. Tokens are consumed both when the agent reads input and when it generates output, so total tokens equal input plus output for a single interaction.
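
If you want intuition for how text maps to tokens, you can count them locally with OpenAI's tiktoken library. Note this is an approximation: the tokenizer your platform's models use may differ, so treat the counts as ballpark figures.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent models

short = "We are open from 10 am to 5 pm"
longer = short * 20  # repeat the text to simulate a much longer message

print(len(enc.encode(short)))   # roughly a dozen tokens
print(len(enc.encode(longer)))  # roughly twenty times as many
```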
Will I ever pay more than the previous flat rate per message
Not during the transition period. Through December 31, 2025, outbound messages are capped at $0.02 per message. That means even if token usage would normally cost more, the charge per outbound message will not exceed $0.02 during this period.
Which features should I watch because they increase cost
Custom variables, actions and workflows, knowledge base lookups, and voice or image processing commonly raise token consumption. Track interactions that trigger these features and optimize where possible to reduce unnecessary processing.
How can I see estimated costs for a specific agent
Open the agent configuration in the platform, inspect the bot goals or model assignment panel, and review the estimated token usage. Billing or transaction sections of the settings show historical input and output tokens and the resulting charges.
Is there a way to compare the new pricing to the old flat rate
Yes. During the comparison window administrators can enable a feature in account settings to see how token-based charges compare to the flat-rate model before the transition becomes mandatory. Use this period to test and adjust agents as needed.
What if I have high-volume agents that suddenly become expensive
Start by analyzing the token logs to identify what drives the increased usage. Consider simplifying prompts, caching frequent replies, limiting knowledge retrievals, and batching requests. Flow-based builders can also help because outbound message charges for flows are waived until the end of 2025.
Final thoughts
Token-based pricing gives us more control and fairness. It rewards concise design and makes the cost of complex automation transparent. The transition period and cost caps provide breathing room to test and optimize without sudden surprises. By auditing agents, monitoring token usage, and applying simple design principles, we can keep costs low while extending the value of conversational AI across more use cases.
We recommend enabling cost estimates in your account, reviewing high-traffic agents, and prioritizing small design changes that deliver the biggest savings. The move to token-based billing is an opportunity to tighten up conversations, reduce waste, and invest budget where AI delivers the most impact.