Last updated: June 2026 · By Shash Eran
ElevenLabs API Pricing 2026 — All Tiers, Character Quotas, and Cost vs OpenAI TTS
TL;DR
ElevenLabs API pricing runs from Free (10K chars/mo, test only) through Starter ($5/mo, 30K chars), Creator ($22/mo, 100K chars), Pro ($99/mo, 500K chars), Scale ($330/mo, 2M chars), and Business (custom, 10M+ chars). WebSocket streaming is available from Creator tier up. Voice cloning scales from 1 custom voice (Starter) to 160 voices (Scale). ElevenLabs is more expensive than OpenAI TTS or Google Cloud TTS at equivalent volume — but the quality gap is significant, especially for natural speech and multilingual applications.
June 2026 update
API pricing and tier structure verified June 2026. The core tier lineup is unchanged: Free (10K chars), Starter ($5/mo, 30K chars), Creator ($22/mo, 100K chars), Pro ($99/mo, 500K chars), Scale ($330/mo, 2M chars), Business (custom, 10M+ chars). New additions in 2026: the Voice Design endpoint is now generally available on Creator and above — you describe a voice in natural language and ElevenLabs generates a custom voice without uploading samples. The recommended low-latency model is now eleven_flash_v2_5 (replaces Turbo v2); first-chunk latency over WebSocket is approximately 150–180ms under standard conditions. ElevenLabs also introduced an Audio Isolation endpoint (background noise removal) available on Pro and above at no additional character cost. The OpenAI TTS comparison row in this guide has been updated to reflect the current gpt-4o-mini-tts model.
1. All Six ElevenLabs API Tiers at a Glance
ElevenLabs uses a subscription model where you pay a flat monthly fee for a character quota: the number of characters (letters, spaces, punctuation) you can send to the API per billing period. All tiers include the full voice library and REST API access; higher tiers add WebSocket streaming, voice cloning, and increased concurrency.
| Plan | Price/mo | Characters | Custom Voices | Streaming |
|---|---|---|---|---|
| Free | $0 | 10,000 | 0 (no cloning) | REST only |
| Starter | $5 | 30,000 | 1 (IVC only) | REST only |
| Creator | $22 | 100,000 | 10 (IVC + PVC) | WebSocket ✓ |
| Pro ⭐ | $99 | 500,000 | 30 (IVC + PVC) | WebSocket ✓ |
| Scale | $330 | 2,000,000 | 160 (IVC + PVC) | WebSocket ✓ |
| Business | Custom | 10M+ | Custom | WebSocket ✓ |
IVC = Instant Voice Cloning (fast, from short samples). PVC = Professional Voice Cloning (high-fidelity, requires longer samples, processing time 1–24 hours).
2. How Character Quotas Work
ElevenLabs measures API usage in characters — every character you send to the text-to-speech endpoint counts against your monthly quota, including spaces and punctuation. Understanding this helps you accurately estimate your tier requirements.
Character Quota Reference
- Average word: ~5 characters
- 1 minute of speech: ~800–1,000 characters
- 10-minute audio: ~8,000–10,000 characters
- 30-minute podcast episode: ~24,000–30,000 characters
- Full audiobook chapter (~5K words): ~25,000 characters
- Free (10K): ~10–12 minutes of audio/mo
- Starter (30K): ~30–37 minutes of audio/mo
- Creator (100K): ~1.7–2 hours of audio/mo
- Pro (500K): ~8–10 hours of audio/mo
- Scale (2M): ~33–42 hours of audio/mo
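The per-minute figure in this list can be turned into a quick estimator. The sketch below is a minimal example that assumes the 800–1,000 characters per minute range quoted above; `billable_characters` and `estimate_audio_minutes` are illustrative helper names, not part of any ElevenLabs SDK.

```python
# Speaking-rate range taken from the reference list (characters per minute).
CHARS_PER_MINUTE_LOW = 800
CHARS_PER_MINUTE_HIGH = 1000


def billable_characters(text: str) -> int:
    """Count every character, including spaces and punctuation."""
    return len(text)


def estimate_audio_minutes(char_count: int) -> tuple[float, float]:
    """Return a (shortest, longest) estimate of audio length in minutes."""
    shortest = char_count / CHARS_PER_MINUTE_HIGH  # dense text, fast delivery
    longest = char_count / CHARS_PER_MINUTE_LOW    # sparse text, slow delivery
    return shortest, longest


low, high = estimate_audio_minutes(100_000)
print(f"100K characters: roughly {low:.0f} to {high:.0f} minutes of audio")
```

Real durations depend on the voice, language, and punctuation, so treat the result as a planning range rather than a guarantee.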
Characters do not roll over between billing periods. If you consistently use less than your quota, you should consider downgrading — unused characters are money left on the table. If you're regularly hitting overages, upgrading to the next tier almost always has a lower cost-per-character than paying overage rates.
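Whether upgrading beats paying overages depends on the overage rate, which this guide does not list. The sketch below shows the comparison under that caveat: `overage_per_1k` is an input you would need to look up for your plan, and the $0.25 value in the usage lines is a hypothetical placeholder, not an ElevenLabs price. It also assumes your expected volume fits inside the next tier's quota.

```python
def cheaper_to_upgrade(
    current_price: float,
    current_quota: int,
    next_price: float,
    expected_chars: int,
    overage_per_1k: float,
) -> bool:
    """True when the next tier's flat fee is lower than staying and paying overage."""
    extra_chars = max(0, expected_chars - current_quota)
    cost_if_staying = current_price + (extra_chars / 1000) * overage_per_1k
    return next_price < cost_if_staying


def break_even_chars(
    current_price: float,
    current_quota: int,
    next_price: float,
    overage_per_1k: float,
) -> int:
    """Monthly character volume above which upgrading becomes the cheaper option."""
    extra_chars = (next_price - current_price) / overage_per_1k * 1000
    return current_quota + int(extra_chars)


# Creator ($22, 100K) vs Pro ($99, 500K), with a HYPOTHETICAL overage rate.
HYPOTHETICAL_OVERAGE_PER_1K = 0.25
print(cheaper_to_upgrade(22, 100_000, 99, 450_000, HYPOTHETICAL_OVERAGE_PER_1K))
print(break_even_chars(22, 100_000, 99, HYPOTHETICAL_OVERAGE_PER_1K))
```

Plugging in your real overage rate shows the monthly volume at which the next tier starts to pay for itself.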
3. Cost-Per-Character Math at Each Tier
To compare tiers on efficiency, here's the effective cost per 1,000 characters at each tier's base quota. The per-hour column assumes about 54,000 characters per hour of audio (900 characters per minute, the midpoint of the 800–1,000 range in section 2):
| Plan | Price/mo | Characters | $/1K chars | $/hour audio |
|---|---|---|---|---|
| Starter | $5 | 30K | $0.167 | ~$9.00 |
| Creator | $22 | 100K | $0.220 | ~$11.88 |
| Pro | $99 | 500K | $0.198 | ~$10.69 |
| Scale | $330 | 2M | $0.165 | ~$8.91 |
The Creator tier is the most expensive per character, but it's the entry point for WebSocket streaming and Professional Voice Cloning, features that are essential for production applications. Per-character cost does not fall steadily as you move up: Starter and Scale are the cheapest at roughly $0.165–0.167 per 1K characters, with Pro in between. For most serious developers and production apps, Pro still offers the best balance of per-character cost, feature unlocks, and quota depth.
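The per-character figures can be reproduced with a few lines of arithmetic. This is a minimal sketch using the prices and quotas from this guide; the per-hour conversion assumes 900 characters per minute (the midpoint of the 800–1,000 range in section 2), and the function names are illustrative.

```python
# (monthly price in USD, monthly character quota), as listed in this guide.
TIERS = {
    "Starter": (5, 30_000),
    "Creator": (22, 100_000),
    "Pro": (99, 500_000),
    "Scale": (330, 2_000_000),
}

# Assumption: 900 characters per minute of speech.
CHARS_PER_HOUR = 900 * 60


def cost_per_1k(price: float, quota: int) -> float:
    """Effective USD cost per 1,000 characters when the full quota is used."""
    return price / quota * 1000


def cost_per_audio_hour(price: float, quota: int) -> float:
    """Effective USD cost per hour of audio under the assumed speaking rate."""
    return cost_per_1k(price, quota) * CHARS_PER_HOUR / 1000


for name, (price, quota) in TIERS.items():
    per_1k = cost_per_1k(price, quota)
    per_hour = cost_per_audio_hour(price, quota)
    print(f"{name:8} ${per_1k:.3f}/1K chars  ~${per_hour:.2f}/hour")
```

These are best-case rates: if you use only half of a quota, the effective cost per character doubles.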
4. WebSocket Streaming — Which Tiers Support It and Why It Matters
WebSocket streaming lets your application receive audio data as it's generated, chunk by chunk in real time, rather than waiting for the entire audio file to be synthesized before playback begins. This is the difference between a conversational AI that responds with sub-second audio latency and one that makes users wait 3–5 seconds per response.
WebSocket streaming is available from Creator tier ($22/mo) and above. The Free and Starter tiers only support REST API, which uses a request-response model — you send text, wait for synthesis, receive the audio file. REST is fine for batch audio generation (podcast production, audiobook creation) but is not suitable for real-time conversational applications.
Use cases that need WebSocket streaming:
- Voice chatbots and conversational AI assistants
- Real-time phone call voice AI (IVR, sales automation)
- Live streaming with AI voice narration
- Interactive games and virtual characters that respond to player input
- Accessibility tools that read content aloud as it's generated
Use cases where REST is sufficient:
- Batch podcast production (synthesize full scripts offline)
- Audiobook creation from manuscript text
- E-learning content narration (pre-recorded)
- Marketing video voiceovers
- Content localization (translating and voicing articles)
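To make the streaming flow described in this section more concrete, here is a rough sketch of the frames a client exchanges over the socket. It opens no network connection. The endpoint path and the JSON field names (`text`, `xi_api_key`, `audio`) reflect my understanding of the ElevenLabs stream-input protocol and should be treated as assumptions to verify against the current API reference; the helper names are hypothetical.

```python
import base64
import json

# Assumed endpoint shape; confirm against the current ElevenLabs API reference.
WS_URL_TEMPLATE = (
    "wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input"
    "?model_id={model_id}"
)


def build_stream_url(voice_id: str, model_id: str = "eleven_flash_v2_5") -> str:
    """Fill in the voice and model for the streaming endpoint."""
    return WS_URL_TEMPLATE.format(voice_id=voice_id, model_id=model_id)


def build_outgoing_frames(text_chunks: list[str], api_key: str) -> list[str]:
    """Serialize the frames a client would send, in order."""
    frames = [{"text": " ", "xi_api_key": api_key}]        # opening frame (assumed shape)
    frames += [{"text": chunk} for chunk in text_chunks]   # text sent incrementally
    frames.append({"text": ""})                            # empty text marks end of input
    return [json.dumps(frame) for frame in frames]


def extract_audio(server_frame: str) -> bytes:
    """Decode the base64 audio payload from one server frame, if present."""
    payload = json.loads(server_frame)
    audio_b64 = payload.get("audio")
    return base64.b64decode(audio_b64) if audio_b64 else b""
```

In a real client, each outgoing frame would be sent over a WebSocket connection and each decoded chunk handed to the audio player as soon as it arrives, which is where the latency advantage over REST comes from.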
5. Voice Cloning Tiers and Limits
ElevenLabs offers two types of voice cloning: Instant Voice Cloning (IVC) and Professional Voice Cloning (PVC). Understanding the difference — and which tier you need — is essential if custom voices are part of your use case.
Instant Voice Cloning (IVC):
- Create a voice from as little as 1 minute of audio
- Processing is near-instant (seconds)
- Quality: good but perceptibly different from the source speaker at close inspection
- Use case: quick prototyping, consistent character voices, internal tools
- Available from: Starter tier (1 voice)
Professional Voice Cloning (PVC):
- Requires 30+ minutes of clean, high-quality audio samples
- Processing takes 1–24 hours
- Quality: near-indistinguishable from original speaker in many cases
- Use case: brand voice, celebrity partnerships, publisher content monetization, executive communications
- Available from: Creator tier ($22/mo) and above
| Plan | Custom Voices | Cloning Type |
|---|---|---|
| Free | 0 | No cloning |
| Starter | 1 | IVC only |
| Creator | 10 | IVC + PVC |
| Pro | 30 | IVC + PVC |
| Scale | 160 | IVC + PVC |
| Business | Custom | IVC + PVC + Custom Models |
6. ElevenLabs vs OpenAI TTS vs Google Cloud TTS vs Amazon Polly
ElevenLabs is not the cheapest TTS API available — it's the highest quality. Here's an honest comparison against the main alternatives at equivalent character volumes:
| Provider | Cost/1M chars | Voice Quality | Voice Cloning | Streaming |
|---|---|---|---|---|
| ElevenLabs Pro | ~$198 | Best-in-class | Yes (IVC + PVC) | WebSocket |
| OpenAI TTS (gpt-4o-mini-tts) | ~$60 | Good, natural | Limited | Streaming |
| Google Cloud TTS Neural2 | ~$16 | Good, robotic edge | No | Streaming |
| Amazon Polly Neural | ~$16 | Decent, flat delivery | No | Streaming |
When to choose ElevenLabs over cheaper alternatives:
- You need emotionally expressive voice — joy, urgency, sadness — not just neutral reading
- You need voice cloning (custom speaker voices)
- Your use case involves long-form listening (audiobooks, podcasts) where listener fatigue from robotic voices is a real concern
- You need high-quality multilingual support — ElevenLabs supports 29+ languages with genuine naturalness
- The end product's quality directly affects your revenue (premium content, brand voice)
When Google TTS or Amazon Polly is sufficient: Internal tools, notifications, accessibility features where cost matters more than quality, high-volume utility applications where listeners won't spend more than 30 seconds listening at a time.
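To see what the quality premium costs at your own volume, you can scale the per-million rates from the comparison table. This is a minimal sketch that uses the approximate figures from this guide and treats every provider as linear pay-as-you-go, which is a simplification: ElevenLabs bills a flat subscription, so its real cost moves in tier-sized steps.

```python
# Approximate USD cost per 1M characters, copied from the comparison table.
COST_PER_MILLION = {
    "ElevenLabs Pro": 198.0,
    "OpenAI TTS (gpt-4o-mini-tts)": 60.0,
    "Google Cloud TTS Neural2": 16.0,
    "Amazon Polly Neural": 16.0,
}


def monthly_cost(provider: str, chars_per_month: int) -> float:
    """Linear estimate of monthly spend for a given character volume."""
    return COST_PER_MILLION[provider] * chars_per_month / 1_000_000


def premium_over(baseline: str, chars_per_month: int,
                 provider: str = "ElevenLabs Pro") -> float:
    """Extra monthly spend for `provider` compared with `baseline`."""
    return monthly_cost(provider, chars_per_month) - monthly_cost(baseline, chars_per_month)


volume = 500_000  # characters per month
for name in COST_PER_MILLION:
    print(f"{name:30} ${monthly_cost(name, volume):.2f}/mo")
```

The useful question is then whether the monthly premium is small relative to the revenue that depends on voice quality.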
7. Which Tier Is Right for You?
Free ($0): 10K characters is roughly 10–12 minutes of audio. This is enough to evaluate voice quality and test your API integration, but not for any production use case. The Free tier has rate limits and no voice cloning.
Starter ($5/mo): 30K chars/mo is roughly 30–37 minutes of audio. Good for a solo developer building personal tools, small-scale content production (1–2 short podcast episodes/mo), or experimentation. The lack of streaming support limits its use in production apps.
Creator ($22/mo): 100K chars/mo (roughly 2 hours of audio), WebSocket streaming, 10 custom voices, PVC access. This is the minimum viable tier for production voice AI applications: real-time chatbots, small SaaS products, early-stage startups.
Pro ($99/mo): 500K chars/mo (roughly 8–10 hours of audio), 30 custom voices, all streaming features. This is the sweet spot for most production applications: enough volume to run a real product, sufficient voice slots for most use cases, and a better per-character rate than Creator.
Scale ($330/mo): 2M chars/mo (roughly 33–42 hours of audio), 160 custom voices. For audio-first products with significant user bases, content publishers generating large volumes of spoken content, or platforms offering voice features as a core product component.
Business (custom pricing): 10M+ chars/mo, custom voice limits, dedicated SLA, priority support, and potentially custom model training. Contact ElevenLabs sales directly for pricing. For media companies, enterprise software, and platforms where voice is mission-critical infrastructure.
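The selection logic in this section can be written down as a small lookup. The sketch below encodes the tier table from this guide and returns the cheapest self-serve tier that meets a set of requirements; `Tier` and `pick_tier` are illustrative names, and the data is only as current as the figures quoted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    price: int       # USD per month
    chars: int       # monthly character quota
    voices: int      # custom voice slots
    websocket: bool  # WebSocket streaming available
    pvc: bool        # Professional Voice Cloning available


# Ordered from cheapest to most expensive, per the tables in this guide.
TIERS = [
    Tier("Free", 0, 10_000, 0, False, False),
    Tier("Starter", 5, 30_000, 1, False, False),
    Tier("Creator", 22, 100_000, 10, True, True),
    Tier("Pro", 99, 500_000, 30, True, True),
    Tier("Scale", 330, 2_000_000, 160, True, True),
]


def pick_tier(chars_per_month: int, needs_streaming: bool = False,
              custom_voices: int = 0, needs_pvc: bool = False) -> Optional[Tier]:
    """Return the cheapest tier meeting every requirement, or None for Business."""
    for tier in TIERS:
        if (tier.chars >= chars_per_month
                and tier.voices >= custom_voices
                and (tier.websocket or not needs_streaming)
                and (tier.pvc or not needs_pvc)):
            return tier
    return None  # beyond Scale: custom Business pricing


choice = pick_tier(20_000, needs_streaming=True)
print(choice.name if choice else "Business (contact sales)")
```

A feature requirement such as streaming can push you to a higher tier than your character volume alone would suggest.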
Frequently asked questions
How much does the ElevenLabs API cost?
ElevenLabs API pricing tiers: Free ($0, 10K chars/mo — test only), Starter ($5/mo, 30K chars), Creator ($22/mo, 100K chars), Pro ($99/mo, 500K chars), Scale ($330/mo, 2M chars), Business (custom, 10M+ chars). All paid tiers include the full voice library and REST API. WebSocket streaming starts at Creator tier.
Is ElevenLabs API cheaper than OpenAI TTS?
No — ElevenLabs is more expensive per character than OpenAI TTS (gpt-4o-mini-tts at ~$60/1M chars vs ElevenLabs Pro at ~$198/1M chars). However, ElevenLabs delivers significantly higher voice quality, emotional range, and voice cloning capabilities that OpenAI TTS doesn't match. For applications where voice quality is a core product differentiator, the premium is usually justified.
Does ElevenLabs API support WebSocket streaming?
Yes — WebSocket streaming is available from Creator tier ($22/mo) and above. Streaming allows your application to receive audio data as it's generated in real time, enabling sub-second latency for conversational AI, voice chatbots, and interactive applications. Free and Starter tiers support REST API only (request-response), which has higher latency and is not suitable for real-time applications.
How many custom voices can I clone with the ElevenLabs API?
Voice cloning limits by tier: Free (0 custom voices), Starter (1 voice, IVC only), Creator (10 voices, IVC + PVC), Pro (30 voices, IVC + PVC), Scale (160 voices, IVC + PVC), Business (custom). Instant Voice Cloning creates a voice from ~1 minute of audio in seconds. Professional Voice Cloning requires 30+ minutes of clean samples and produces near-indistinguishable voice replicas with 1–24 hours of processing time.
What is ElevenLabs' character quota and how does it work?
ElevenLabs measures usage in characters: every character you send to the API (including spaces and punctuation) counts toward your monthly quota. A typical spoken word averages ~5 characters, so 100K characters is roughly 20,000 spoken words (about 2 hours of audio). Unused characters do not roll over at the end of each billing cycle. Exceeding your quota triggers per-character overage charges, which are more expensive than the bundled rate, so upgrading to the next tier is usually more cost-effective than paying overages.
Written by Shash
Founder of Infinfy Solutions · Real tools, real client work.