AIMultiple Research
We follow ethical norms & our process for objectivity.
This research is not funded by any sponsors.
Updated on Apr 7, 2025

LLM Pricing: Top 15+ Providers Compared in 2025

We analyzed 15+ LLMs and their pricing and performance. LLM API pricing can be complex and depends on how you plan to use the models.

Last Updated at 04-28-2025

| Model | Input Price (USD per 1M tokens) | Output Price (USD per 1M tokens) | Context Length (tokens) | Max Output Tokens | Arena Score |
| --- | --- | --- | --- | --- | --- |
| Google Gemini-2.5-Pro | $2.50 | $15.00 | 1,000k | 65k | 1,439 |
| OpenAI gpt-4o | $2.50 | $10.00 | 128k | 16k | 1,408 |
| xAI Grok-3-Preview | $3.00 | $15.00 | 131k | n/a | 1,402 |
| OpenAI gpt-4.5 | $75.00 | $150.00 | 128k | 16k | 1,398 |
| Google Gemini-2.0-Flash-Thinking | n/a | n/a | 1,000k | 64k | 1,380 |
| DeepSeek DeepSeek-R1 | $0.55 | $2.19 | 64k | 8k | 1,358 |
| Google Gemini-2.0-Flash-001 | $0.10 | $0.40 | 1,000k | 8k | 1,354 |
| OpenAI o1-2024-12-17 | $15.00 | $60.00 | 200k | 100k | 1,350 |
| Google Gemma-3-27B-it | n/a | n/a | 128k | 8k | 1,342 |
| Alibaba Qwen2.5-Max | $1.60 | $6.40 | 32k | 8k | 1,340 |
| OpenAI o1-preview | $15.00 | $60.00 | 128k | 32k | 1,335 |
| OpenAI o3-mini-high | $1.10 | $4.40 | 200k | 100k | 1,325 |
| DeepSeek DeepSeek-V3 | $0.27 | $1.10 | 64k | 8k | 1,318 |
| Alibaba Qwen-Plus-0125 | $0.40 | $1.20 | 131k | 8k | 1,310 |
| OpenAI o3-mini | $1.10 | $4.40 | 200k | 100k | 1,305 |
| Cohere Command A | $2.50 | $10.00 | 256k | 8k | 1,305 |
| OpenAI o1-mini | $1.10 | $4.40 | 128k | 65k | 1,304 |
| Anthropic Claude 3.7 Sonnet | $3.00 | $15.00 | 200k | 128k | 1,292 |


Ranking: The ranking of models is based on the Overall Arena Score in the Chatbot Arena LLM leaderboard.1

When the official Grok 3 API is released and Gemini’s experimental models exit their experimental phase, the missing values will be populated with the official values.

On Google's free tiers, rate limits cap how frequently the API can be called.2

You can check the hallucination rates and reasoning performance of top LLMs in our benchmarks.

Understanding LLM Pricing

Tokens: The Fundamental Unit of Pricing

Figure 1. Example of tokenization using the GPT-4o & GPT-4o mini tokenizer for the sentence “Identify New Technologies, Accelerate Your Enterprise.”3

While providers offer a variety of pricing structures, per-token pricing is the most common. Tokenization methods differ across models; examples include:

  • Byte-Pair Encoding (BPE): Splits words into frequent subword units, balancing vocabulary size and efficiency.4
    • Example: “unbelievable” → [“un”, “believ”, “able”]
  • WordPiece: Similar to BPE but optimizes for language model likelihood, used in BERT.5
    • Example: “tokenization” → [“token”, “##ization”]. “token” is a standalone word; “##ization” is a suffix.
  • SentencePiece: Tokenizes text without relying on spaces, effective for multilingual models like T5.6
    • Example: “natural language” → [“ natural”, “ lan”, “guage”] or [“ natu”, “ral”, “ language”].

Please note that the exact subwords depend on the training data and on the BPE/WordPiece process.
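The WordPiece example above can be reproduced with a greedy longest-match-first lookup, which is how WordPiece is commonly applied once a vocabulary exists. The following is a minimal sketch, not a production tokenizer: the vocabulary `TOY_VOCAB` is hypothetical and hand-picked for illustration, whereas a real vocabulary is learned from training data.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into subwords by greedy longest-match-first lookup."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No known piece covers this position: the whole word is unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, chosen only to illustrate the splitting rule.
TOY_VOCAB = {"token", "##ization", "un", "##believ", "##able"}

print(wordpiece_tokenize("tokenization", TOY_VOCAB))
print(wordpiece_tokenize("unbelievable", TOY_VOCAB))
```

With a different vocabulary the same word splits differently, which is why token counts, and therefore costs, vary between models.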

After grasping tokenization, you can estimate an average price from the token length of a project. Table 2 outlines token ranges by content type, such as UI prompts, email snippets, marketing blogs, detailed reports, and research papers. Token counts vary across models, so once a model is chosen, its tokenizer can be used to estimate the average token count for the content.

Last Updated at 02-17-2025

| Content Type | Word Count Range (words) | Token Count Range (tokens) | Typical Enterprise Use Cases |
| --- | --- | --- | --- |
| Sentence | 10–20 | 15–35 | UI prompts, notifications, chatbot responses |
| Paragraph | 75–150 | 100–225 | Email snippets, product descriptions, help texts |
| Short Article | 400–600 | 520–900 | Marketing blogs, press releases, case studies |
| Long Article | 900–1,100 | 1,200–1,650 | Detailed reports, whitepapers, internal knowledge bases |
| Research Paper | 4,500–5,500 | 5,850–8,250 | Academic publications, R&D documents, technical whitepapers |

Table 2. Typical content types, their size ranges, and enterprise considerations (ranges are estimates and may vary).
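The ranges in Table 2 are consistent with roughly 1.3 to 1.5 tokens per word. That ratio is inferred from the table, not stated by any provider, and it applies to English-like text only. Under that assumption, a rough estimate can be sketched as follows; the function name and default ratios are illustrative.

```python
def estimate_token_range(min_words, max_words, low_ratio=1.3, high_ratio=1.5):
    """Estimate a token range from a word-count range.

    The default ratios (1.3 and 1.5 tokens per word) are an assumption
    inferred from Table 2; real counts depend on the model's tokenizer.
    """
    return round(min_words * low_ratio), round(max_words * high_ratio)


# Short Article and Research Paper rows from Table 2.
print(estimate_token_range(400, 600))
print(estimate_token_range(4500, 5500))
```

The smallest rows in Table 2 appear to be rounded to friendlier numbers, so this estimate will not match them exactly. For a binding figure, run the chosen model's own tokenizer on sample content.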

Context Window Implications

The context window is another factor to consider when estimating price. The total number of tokens across input and output must not exceed the context window (also called context length).

If the total exceeds the context window, the excess output may be truncated, as shown in Figure 2, so the response may not be what you expected. Tokens generated during the reasoning process also count toward this limit.

Figure 2. Illustration of context window limitations leading to output truncation in a multi-turn conversation.7
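A simple pre-flight check can flag requests that risk truncation. This is a minimal sketch: the function name is hypothetical, and reading the table's "64k" as 64,000 tokens is an assumption, since providers publish exact limits that may differ.

```python
def fits_context_window(input_tokens, output_tokens, context_window,
                        reasoning_tokens=0):
    """Return (fits, overflow) for a planned request.

    Input, reasoning, and output tokens all count toward the window.
    overflow is the number of tokens beyond the window (0 if it fits).
    """
    total = input_tokens + reasoning_tokens + output_tokens
    overflow = max(0, total - context_window)
    return overflow == 0, overflow


# Assumption: a model listed with a 64k context window allows 64,000 tokens.
CONTEXT_WINDOW = 64_000

print(fits_context_window(26, 7_050, CONTEXT_WINDOW))      # short prompt, long output
print(fits_context_window(60_000, 8_000, CONTEXT_WINDOW))  # long prompt, long output
```

In a multi-turn conversation, the input side grows with every turn because earlier messages are resent, so the check is worth repeating per request.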

Max Output Tokens

The maximum output token setting matters both for getting the desired output from a Large Language Model (LLM) and for managing cost. Many APIs expose it as the max_tokens parameter, but the name differs between providers, so review the documentation of the specific API you use to identify the correct parameter. Adjust it to your needs:

If set too low: It may result in incomplete outputs, causing the model to cut off responses before delivering the full answer.

If set too high: Depending on the temperature (a parameter that controls response creativity) setting, it can lead to unnecessarily verbose outputs, longer response times, and increased cost.

Therefore, it is a parameter that requires careful consideration to optimize resource usage while balancing output quality, cost, and performance.
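One way to balance the two failure modes is to start from the expected output length, add some headroom, and cap the result at the model's own output limit. The sketch below assumes a 20% headroom, which is an arbitrary illustrative choice, and the function name is hypothetical.

```python
def choose_max_output_tokens(expected_output_tokens, model_output_limit,
                             headroom_percent=20):
    """Pick an output cap: expected length plus headroom, within the model limit.

    headroom_percent is an illustrative assumption, not a provider guideline.
    Integer arithmetic (rounding up) avoids floating-point surprises.
    """
    wanted = (expected_output_tokens * (100 + headroom_percent) + 99) // 100
    return min(wanted, model_output_limit)


# A ~710-token short article on a model assumed to allow 16,000 output tokens.
print(choose_max_output_tokens(710, 16_000))
# A ~7,050-token paper on a model assumed to allow only 8,000 output tokens.
print(choose_max_output_tokens(7_050, 8_000))
```

When the cap equals the model limit, as in the second call, there is less headroom than requested, which is a signal to split the task or pick a model with a larger output limit.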

Last Updated at 02-07-2025

| Content Type | Input Prompt Example | Input Token Count* | Assumed Output Token Count* |
| --- | --- | --- | --- |
| Sentence | “Generate a friendly notification message reminding users to complete their profile within the app.” | 15 | 25 |
| Paragraph | “Write a concise email snippet announcing the launch of our new product feature, highlighting its key benefit.” | 19 | 162 |
| Short Article | “Create a short blog post explaining how our new software solution improves remote team productivity.” | 18 | 710 |
| Long Article | “Draft a comprehensive whitepaper outlining the impact of AI on the future of supply chain management, including real-world case studies.” | 24 | 1,425 |
| Research Paper | “Write a comprehensive full-length research paper on the application of machine learning algorithms in geological data analysis, covering background, literature review, theoretical framework, methodology, results, discussion, and referencing recent studies.” | 26 | 7,050 |

Table 3. Example input prompts and estimated token counts per content type.

*Input and output token counts vary with each model’s tokenization. To keep the comparison simple, the same counts are assumed for every model, including an equal number of output tokens per response.

The LLM API price calculator can be used to determine the total cost per model when generating the content types from Table 2 via the API using the sample prompts provided in Table 3, as well as for calculating costs for custom cases beyond the suggested content types.

LLM API Price Calculator

The calculator takes 3 input values and lets you sort the results by input cost, output cost, total cost, or alphabetically, in increasing or decreasing order.

The default ranking is based on the total cost.
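The arithmetic behind such a calculator can be sketched in a few lines. This is not the calculator's actual implementation. It assumes the listed prices are USD per 1M tokens and that the three inputs are input tokens, output tokens, and number of requests, which is a guess about the original widget. The prices are copied from the comparison table, and all names are illustrative.

```python
from typing import NamedTuple


class ModelPrice(NamedTuple):
    model: str
    input_price: float   # USD per 1M input tokens (assumed unit)
    output_price: float  # USD per 1M output tokens (assumed unit)


class CostRow(NamedTuple):
    model: str
    input_cost: float
    output_cost: float
    total_cost: float


# A few rows copied from the pricing table.
TABLE_PRICES = [
    ModelPrice("OpenAI gpt-4o", 2.50, 10.00),
    ModelPrice("DeepSeek DeepSeek-R1", 0.55, 2.19),
    ModelPrice("Google Gemini-2.0-Flash-001", 0.10, 0.40),
    ModelPrice("Anthropic Claude 3.7 Sonnet", 3.00, 15.00),
]


def cost_row(price, input_tokens, output_tokens, requests=1):
    """Cost of `requests` identical calls for one model."""
    input_cost = input_tokens * requests * price.input_price / 1_000_000
    output_cost = output_tokens * requests * price.output_price / 1_000_000
    return CostRow(price.model, input_cost, output_cost, input_cost + output_cost)


def rank(prices, input_tokens, output_tokens, requests=1,
         sort_by="total_cost", descending=False):
    """Sort by 'input_cost', 'output_cost', 'total_cost', or 'model'."""
    rows = [cost_row(p, input_tokens, output_tokens, requests) for p in prices]
    return sorted(rows, key=lambda row: getattr(row, sort_by), reverse=descending)


# Short Article row from Table 3 (18 input, 710 output tokens), 1,000 requests.
for row in rank(TABLE_PRICES, 18, 710, requests=1000):
    print(f"{row.model:<30} ${row.total_cost:.4f}")
```

Because output prices are several times higher than input prices for every model in the table, output length dominates the total for generation-heavy workloads like this one.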

Comparing LLM Subscription Plans

Non-technical users may prefer to use the UI rather than the API. Here is the UI pricing:

Last Updated at 02-21-2025

| Plan | Price | Key Features | Limitations |
| --- | --- | --- | --- |
| Microsoft Copilot/Free | $0 | Basic Microsoft app integration; Multiple devices and platforms; 15 boosts per day | Limited credits for AI usage (includes designer only); Preferred model access only during non-peak times; Limited Copilot Voice usage |
| Microsoft Copilot/Pro | $20/month | Preferred model access; 100 boosts per day; Full Microsoft 365 integration; Early feature access; All Microsoft app support | Individual use only |
| Google Gemini/Basic | $0 | 2.0 Flash model access; Basic writing & images; Google apps integration; Voice conversations | Basic model only; Limited features |
| Google Gemini/Advanced | $19.99/month | 2.0 Pro experimental model; Deep research capability; 1500-page document analysis; 2TB storage; Custom AI experts (Gems); Code smarter | |
| Mistral AI/Free | $0 | Web browsing & news; Basic file analysis; Image generation; Flash answers | Training data usage; Limited capabilities |
| Mistral AI/Pro | $14.99/month | Unlimited web browsing; Extended analysis; Opt-out data sharing; Dedicated support | Individual use only |
| Mistral AI/Team | $19.99 per user/month billed annually; $24.99 per user/month billed monthly | Central billing; API credits; Data excluded from training; Advanced features | Minimum 2 members |
| Mistral AI/Enterprise | Custom | Secure deployment in your environment; Enhanced support; Granular admin controls; Detailed analytics | |
| OpenAI/Free | $0 | GPT-4o mini access; Standard voice; Limited uploads; Basic images | Usage caps; Basic models only |
| OpenAI/Plus | $20/month | Extended limits; Advanced voice; Beta features; Limited GPT-4 | Individual use; Usage must follow policies |
| OpenAI/Pro | $200/month | Unlimited access to o1, o1-mini, GPT-4o, and voice (audio only); Higher limits for video and screensharing in voice; Access to o1 Pro mode; Extended access to Sora; Access to Operator research preview (U.S. only) | Usage must be reasonable and comply with their policies |
| OpenAI/Team | $25 per user/month billed annually; $30 per user/month billed monthly | Higher message limits than Plus on GPT-4, GPT-4o, and tools like DALL·E; Standard and advanced voice mode; Admin console for workspace management; Team data excluded from training by default | Minimum 2 members |
| OpenAI/Enterprise | Custom | High speed access to GPT-4, GPT-4o, GPT-4o mini, and tools like DALL·E; Expanded context window; Enterprise data excluded from training by default & custom data retention windows; Admin controls, domain verification, and analytics; Enhanced support | |
| Claude.ai/Free | $0 | Web/mobile access; Basic analysis; One latest model; Document upload | Limited usage; Basic features only |
| Claude.ai/Pro | $18/month billed annually; $20/month billed monthly | Claude 3.5 Sonnet & Opus access; Projects organization; Increased usage; Early features | Individual use only |
| Claude.ai/Team | $25 per user/month billed annually; $30 per user/month billed monthly | Central billing; Collaboration tools; Expanded usage; Admin features | 5 member minimum |
| Claude.ai/Enterprise | Custom | Expanded context window; Single sign-on (SSO) and domain capture; Role-based access; System for Cross-domain Identity Management (SCIM); Audit logs; Data source integrations | |

Using Multiple Language Models

A tool like OpenRouter allows the same prompt to be sent simultaneously to multiple models. The responses, token consumption, response time, and pricing can then be compared to determine which model is most suitable for the task.

Figure 3. Interface showcasing a prompt sent to multiple Large Language Models (LLMs), including R1, Mistral Small 3, GPT-4o-mini, and Claude 3.5 Sonnet.8

Benefits and Challenges

  • Increased Adaptability and Efficiency: Orchestration enhances responsiveness, allowing for real-time assessment of model efficiency, leading to the identification of a cost-effective model and potential savings.
  • Prompt Sensitivity and Optimization: Identical prompts can elicit vastly different outputs across models, necessitating prompt engineering tailored to each model to achieve desired results, adding to development and maintenance complexity.
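The comparison workflow can be sketched independently of any particular routing service. In the sketch below, `send` is a hypothetical callable that you would implement against your provider or router of choice, and `fake_send` is a placeholder that returns fixed, made-up token counts so the example runs offline. Prices are taken from the comparison table and assumed to be USD per 1M tokens. Unlike the simultaneous dispatch described above, this sketch calls the models one after another.

```python
import time

# Prices in USD per 1M tokens (assumed unit), copied from the pricing table.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "DeepSeek-R1": (0.55, 2.19),
}


def compare_models(prompt, models, send, prices=PRICES):
    """Send one prompt to each model and return result rows, cheapest first.

    `send(model, prompt)` must return a dict with the keys
    'text', 'input_tokens', and 'output_tokens'.
    """
    rows = []
    for model in models:
        started = time.perf_counter()
        reply = send(model, prompt)
        elapsed = time.perf_counter() - started
        input_price, output_price = prices[model]
        cost = (reply["input_tokens"] * input_price
                + reply["output_tokens"] * output_price) / 1_000_000
        rows.append({
            "model": model,
            "seconds": elapsed,
            "input_tokens": reply["input_tokens"],
            "output_tokens": reply["output_tokens"],
            "cost_usd": cost,
            "text": reply["text"],
        })
    return sorted(rows, key=lambda row: row["cost_usd"])


def fake_send(model, prompt):
    """Placeholder transport with made-up numbers; replace with a real API call."""
    return {"text": f"[{model}] reply", "input_tokens": 18, "output_tokens": 710}


for row in compare_models("Create a short blog post.", ["gpt-4o", "DeepSeek-R1"], fake_send):
    print(f"{row['model']:<12} {row['output_tokens']} tokens  ${row['cost_usd']:.6f}")
```

Cost and latency are only part of the picture. As the second point above notes, the same prompt may need per-model tuning before the outputs are comparable in quality.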

FAQ

What is LLM API Pricing?

Accessing Large Language Models (LLMs) via an Application Programming Interface (API) grants you remote access to AI models. This access is subject to a fee, often called an “API fee,” charged by the service provider. This fee is a critical consideration when integrating LLMs into your applications. It essentially represents the cost associated with each query, request, or task performed through the provider’s API. Because pricing structures can vary widely – based on factors like token usage, API call volume, feature utilization, or subscription models – understanding how providers calculate these costs is essential. With this knowledge, you can make well-informed decisions, selecting the LLM and provider that best balance your performance needs, desired functionality, and budgetary limitations.

Why is LLM API pricing complex?

LLM API pricing can be complex due to factors like token consumption, context length, and model choice. Tokenization procedures vary across models: some use Byte-Pair Encoding (BPE), others WordPiece or SentencePiece. Each method splits text into tokens differently, which changes token counts and therefore cost. Understanding these differences helps optimize API usage and pricing.

What factors determine the cost of using a large language model (LLM)?

LLM costs are primarily determined by token usage (both input and output), API call volume, and the specific pricing model (e.g., per-token, subscription).

How can I compare pricing across different LLM models?

Compare input and output token prices, context window limits, and any additional fees. Tools like OpenRouter allow you to send the same prompt to multiple models and directly compare their results, token usage, speed, and pricing. Consider your typical content length and usage patterns to estimate overall costs.

What is the difference between input tokens and output tokens?

Input tokens are the tokens in the prompt you send to the LLM, while output tokens are the tokens in the generated response. For reasoning models, it’s important to note that tokens generated during the reasoning process itself are also counted as output tokens, impacting the final cost. Both input and output contribute to the overall cost.
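The effect of reasoning tokens on the bill can be made concrete with a small calculation. This sketch assumes prices are quoted in USD per 1M tokens and uses the o1 prices from the pricing table ($15.00 input, $60.00 output). The token counts in the example are hypothetical.

```python
def reasoning_request_cost(input_tokens, reasoning_tokens, visible_output_tokens,
                           input_price, output_price):
    """Cost in USD of one request to a reasoning model.

    Reasoning tokens are billed at the output rate even though they
    are not part of the visible answer. Prices are per 1M tokens.
    """
    billed_output_tokens = reasoning_tokens + visible_output_tokens
    return (input_tokens * input_price
            + billed_output_tokens * output_price) / 1_000_000


# Hypothetical request: 1,000 input, 2,000 reasoning, 500 visible output tokens.
with_reasoning = reasoning_request_cost(1_000, 2_000, 500, 15.00, 60.00)
without_reasoning = reasoning_request_cost(1_000, 0, 500, 15.00, 60.00)
print(f"with reasoning:    ${with_reasoning:.3f}")
print(f"without reasoning: ${without_reasoning:.3f}")
```

In this hypothetical case the hidden reasoning tokens account for most of the cost, so estimates based on visible output length alone can be far too low for reasoning models.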

How does the text volume I request affect the processing response time and overall budget when using an LLM API?

Larger text requests require more processing, increasing response time and costs. Optimize input sizes and use an LLM API pricing calculator to estimate token counts and manage your budget effectively.

What resources are available to the LLM community to support understanding and optimizing LLM pricing information?

The LLM community has developed various tools and benchmarks to help users understand and optimize LLM pricing. These resources often include calculators and comparison charts that offer insights into the power and efficiency of different models. Platforms like Hugging Face and GitHub host tools and code developed by the community to analyze model performance and costs. Many services offer community support through forums or chat features.

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
