Learn about the models that power the Hamsa API.
Flagship models
Text to Speech
Jobs API
Async TTS via /v1/jobs/text-to-speech
- Natural-sounding output optimized for Arabic dialects
- Multiple Arabic dialects + English
- Async job-based, with the result delivered via webhook
Realtime API
Sync TTS via /v1/realtime/tts
- Low latency: returns WAV audio directly
- Arabic dialects + English
- Optimized for conversational AI and voice agents
Speech to Text
Batch API
Async STT via /v1/jobs/transcribe
- High-accuracy transcription for Arabic dialects
- Word-level timestamps
- Speaker diarization support
- Async job-based, with the result delivered via webhook
Realtime API
Sync STT via /v1/realtime/stt
- Arabic dialects + English
- Base64-encoded audio input
- Returns transcription directly
- End-of-speech detection
Models overview
The Hamsa API offers audio processing optimized for Arabic, with support for multiple dialects and English.

| Endpoint | Description | Languages |
|---|---|---|
| /v1/jobs/text-to-speech | Async TTS: job-based with webhook delivery | Arabic dialects, English |
| /v1/realtime/tts | Sync TTS: returns WAV audio directly | Arabic dialects, English |
| /v1/jobs/transcribe | Async STT: job-based with webhook delivery | Arabic, English |
| /v1/realtime/stt | Sync STT: returns transcription directly | Arabic, English |
Hamsa TTS — Jobs API
The Jobs API (/v1/jobs/text-to-speech) is an async TTS endpoint. It creates a job and delivers the audio result via webhook. Best for batch processing and media content generation.
Use cases:
- Content Creation: Generate Arabic audio content, podcasts, and videos
- Accessibility: Audio versions of written Arabic content
- E-Learning: Educational content in Arabic with natural pronunciation
- Media Production: Professional-quality voiceovers
Key parameters: text, voiceId, webhookUrl, webhookAuth
→ See the TTS Quickstart for examples.
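As a rough sketch of how these parameters fit together, the following Python assembles a request body for the Jobs API without sending it. Only the field names (text, voiceId, webhookUrl, webhookAuth) and the endpoint path come from this page. The helper name, the placeholder values, and the pass-through handling of webhookAuth are assumptions for illustration; the TTS Quickstart remains the reference for the actual schema and authentication.

```python
import json

JOBS_TTS_PATH = "/v1/jobs/text-to-speech"  # endpoint path from this page


def build_tts_job_body(text, voice_id, webhook_url, webhook_auth=None):
    """Assemble a JSON-serializable body for an async TTS job.

    Hypothetical helper: only the field names are taken from the docs.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {"text": text, "voiceId": voice_id, "webhookUrl": webhook_url}
    if webhook_auth is not None:
        # The structure of webhookAuth is an assumption; it is passed through as-is.
        body["webhookAuth"] = webhook_auth
    return body


body = build_tts_job_body(
    text="مرحبا بالعالم",
    voice_id="YOUR_VOICE_ID",                         # placeholder value
    webhook_url="https://example.com/hamsa/webhook",  # placeholder URL
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because the job is asynchronous, the audio would arrive at the webhook URL rather than in the response to this request.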
Hamsa TTS — Realtime API
The Realtime API (/v1/realtime/tts) returns WAV audio directly in the response. Designed for real-time applications and voice agents.
Use cases:
- Voice Agents: Real-time voice agents and phone calls
- Interactive Applications: Chatbots requiring immediate voice response
- Live Conversations: Conversational AI applications
Key parameters: text, speaker, dialect, mulaw
Supported dialects
| Code | Dialect | Example voices |
|---|---|---|
| pls | Palestinian | Amjad, Layan |
| egy | Egyptian | Mariam, Samir |
| syr | Syrian | Dalal, Mais |
| irq | Iraqi | Lyali, Fatma |
| jor | Jordanian | Lana, Jasem |
| leb | Lebanese | Carla, Majd |
| ksa | Saudi | Hiba, Fahd |
| uae | Emirati | Salma, Dima |
| bah | Bahraini | Mazen, Ruba |
| qat | Qatari | Deema, Faisal |
| kuw | Kuwaiti | Mai, Hatem |
| oma | Omani | Aisha, Jaber |
| msa | Modern Standard Arabic | Salem, Tamim |
| ar-sa | Arabic – Gulf | Khalid, Rahma |
| en | English | Emma, James |
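A minimal sketch of a Realtime TTS request body that checks the dialect code against the table above before anything is sent. The field names (text, speaker, dialect, mulaw) come from this page. Treating mulaw as a boolean and passing an example voice name as speaker are assumptions, and the helper name is hypothetical.

```python
# Dialect codes copied from the "Supported dialects" table.
SUPPORTED_DIALECTS = {
    "pls", "egy", "syr", "irq", "jor", "leb", "ksa", "uae",
    "bah", "qat", "kuw", "oma", "msa", "ar-sa", "en",
}


def build_realtime_tts_body(text, speaker, dialect, mulaw=False):
    """Build a body for /v1/realtime/tts (hypothetical helper)."""
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"unsupported dialect code: {dialect!r}")
    # Assumption: mulaw is a boolean flag selecting mu-law encoded output.
    return {"text": text, "speaker": speaker, "dialect": dialect, "mulaw": mulaw}


tts_body = build_realtime_tts_body("أهلا وسهلا", speaker="Mariam", dialect="egy")
```

Validating the code locally avoids spending a low-latency request on a value the table does not list.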
Hamsa STT — Batch API
The Batch API (/v1/jobs/transcribe) is an async STT endpoint. Submit a media URL and receive the transcription via webhook or polling. Choose from two models:
| Model ID | Best for |
|---|---|
| Hamsa-General-V2.0 | General-purpose: media, podcasts, pre-recorded content |
| Hamsa-Conversational-V1.0 | Conversational audio: meetings, calls, dialogues |
Use cases:
- Transcription Services: Convert Arabic audio/video content to text
- Meeting Documentation: Capture and document Arabic conversations with speaker identification
- Media Subtitling: Generate SRT subtitles for Arabic media content
- Content Analysis: Process and index Arabic audio content
Key features:
- Word-level timestamps for each transcribed segment
- Speaker diarization for multi-speaker audio
- Automatic Arabic dialect detection (set language to ar)
- SRT subtitle export with configurable formatting
- Automatic punctuation and formatting
Key parameters: mediaUrl, model, language, webhookUrl, returnSrtFormat, srtOptions
→ See the STT Quickstart for examples.
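The model choice and parameters for the Batch API can be sketched as a small body builder. The model IDs and field names come from this page. Treating returnSrtFormat as a boolean is an assumption, srtOptions is left out because its structure is not described here, and the helper name and the media URL are placeholders.

```python
GENERAL_MODEL = "Hamsa-General-V2.0"                # media, podcasts, pre-recorded content
CONVERSATIONAL_MODEL = "Hamsa-Conversational-V1.0"  # meetings, calls, dialogues


def build_transcription_job_body(media_url, conversational=False, language="ar",
                                 webhook_url=None, return_srt=False):
    """Build a body for /v1/jobs/transcribe (hypothetical helper)."""
    body = {
        "mediaUrl": media_url,
        "model": CONVERSATIONAL_MODEL if conversational else GENERAL_MODEL,
        "language": language,  # "ar" enables automatic dialect detection per this page
    }
    if webhook_url is not None:
        body["webhookUrl"] = webhook_url
    if return_srt:
        # Assumption: returnSrtFormat is a boolean switch.
        body["returnSrtFormat"] = True
    return body


stt_job = build_transcription_job_body(
    "https://example.com/audio/meeting.mp3",  # placeholder media URL
    conversational=True,
    return_srt=True,
)
```

Leaving webhookUrl out corresponds to the polling path mentioned above; how a job is polled is not covered on this page.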
Hamsa STT — Realtime API
The Realtime API (/v1/realtime/stt) accepts base64-encoded audio and returns the transcription directly. For streaming, use the WebSocket API.
Use cases:
- Voice Agents: Real-time speech recognition for conversational AI
- Live call transcription: Transcribe Arabic calls in real time
- Interactive applications: Immediate transcription for chatbots and voice interfaces
Key features:
- Synchronous: returns transcription in the response
- End-of-speech detection with configurable threshold
- Arabic and English language support
Key parameters: audioBase64, language, isEosEnabled, eosThreshold
→ See the STT Quickstart for examples.
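Since the Realtime API takes base64-encoded audio, a request body could be prepared roughly as follows. The field names come from this page. The helper name is hypothetical, and treating isEosEnabled as a boolean and eosThreshold as a number are assumptions, as are the sample bytes and threshold value.

```python
import base64


def build_realtime_stt_body(audio_bytes, language="ar", eos_threshold=None):
    """Build a body for /v1/realtime/stt (hypothetical helper)."""
    body = {
        # Base64 output is ASCII-safe, so it can be embedded in a JSON string.
        "audioBase64": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        # Assumption: end-of-speech detection is toggled by a boolean flag.
        "isEosEnabled": eos_threshold is not None,
    }
    if eos_threshold is not None:
        body["eosThreshold"] = eos_threshold  # units and range are not stated on this page
    return body


sample_audio = b"RIFF....WAVEfmt "  # stand-in bytes, not a real recording
stt_body = build_realtime_stt_body(sample_audio, eos_threshold=0.5)
```

In practice `audio_bytes` would come from reading an audio file in binary mode.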
Model selection guide
Requirements
- Batch / media content: Use the Jobs API (/v1/jobs/text-to-speech) for async processing with webhook delivery.
- Real-time / voice agents: Use the Realtime API (/v1/realtime/tts) or WebSocket for low-latency streaming.
- Arabic dialects: Both TTS endpoints support 15 Arabic dialects + English. Choose based on latency requirements.
Use case
- Content creation: Use the Jobs API for professional Arabic content, media, and video narration.
- Voice agents: Use the Realtime API / WebSocket for real-time conversational applications.
- Transcription: Use the Batch API (/v1/jobs/transcribe) with Hamsa-General-V2.0 for media transcription or Hamsa-Conversational-V1.0 for conversational audio.
Character limits
| Endpoint | Character limit |
|---|---|
| WebSocket TTS | 2,000 characters per message |
For longer content, consider splitting the input into multiple requests.
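One way to split longer input is to pack whole words into chunks that stay within the 2,000-character limit. This is a sketch rather than a recommended algorithm: the function name is hypothetical, runs of whitespace are collapsed to single spaces, and sentence boundaries are not considered.

```python
MAX_CHARS = 2000  # WebSocket TTS limit from the table above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters on word boundaries."""
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into limit-sized pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


long_text = "word " * 1000
chunks = split_text(long_text)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk can then be sent as its own message, in order.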
Audio duration limits
| Endpoint | Audio duration limit | File size limit |
|---|---|---|
| Batch API (/v1/jobs/transcribe) | 60 minutes | 500 MB |
| Realtime API (/v1/realtime/stt) | Per-request | N/A |
| WebSocket (/v1/realtime/ws) | Streaming | N/A |
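The Batch API limits above can be checked before a job is submitted. In this sketch the function name is hypothetical, and interpreting 500 MB as 500 × 1024 × 1024 bytes is an assumption, since the page does not say whether decimal or binary megabytes are meant.

```python
MAX_DURATION_SECONDS = 60 * 60      # 60 minutes, from the table above
MAX_FILE_BYTES = 500 * 1024 * 1024  # 500 MB, assuming binary megabytes


def batch_limit_violations(duration_seconds, size_bytes):
    """Return human-readable problems; an empty list means within limits."""
    problems = []
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 60 minutes")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 500 MB")
    return problems


print(batch_limit_violations(duration_seconds=45 * 60, size_bytes=120 * 1024 * 1024))
print(batch_limit_violations(duration_seconds=90 * 60, size_bytes=600 * 1024 * 1024))
```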
Plans and Usage Limits
Your subscription plan determines your monthly usage limits and concurrent call capacity.
Plan Comparison
| Plan | Price | Credits | Voice Agent | Speech to Text | Text to Speech | Concurrency | KB Storage |
|---|---|---|---|---|---|---|---|
| Free | $0/mo | 50 | 9 min | 50 min | 25 min | 1 | 1 MB |
| Starter | $5/mo | 100 | 17 min | 100 min | 50 min | 1 | 5 MB |
| Creator | $15/mo | 500 | 84 min | 500 min | 250 min | 2 | 10 MB |
| Pro | $100/mo | 5,000 | 834 min | 5,000 min | 2,500 min | 5 | 50 MB |
| Business | $320/mo | 20,000 | 3,334 min | 20,000 min | 10,000 min | 10 | 100 MB |
| Enterprise | Custom | Custom | Unlimited | Unlimited | Unlimited | Unlimited | 300 MB |
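The minute allowances in the table appear to follow a simple pattern in the credit count: Speech to Text minutes equal the credits, Text to Speech minutes are half the credits, and Voice Agent minutes match the credits divided by six, rounded up. These ratios are inferred from the rows above and are not a documented pricing rule, so the sketch below is only a way to sanity-check the table.

```python
import math

# Credits per plan, copied from the table above (Enterprise is custom).
PLAN_CREDITS = {"Free": 50, "Starter": 100, "Creator": 500, "Pro": 5000, "Business": 20000}


def minutes_from_credits(credits):
    """Estimate per-product minutes using ratios inferred from the plan table."""
    return {
        "speech_to_text": credits,               # inferred: 1 credit per minute
        "text_to_speech": credits // 2,          # inferred: 2 credits per minute
        "voice_agent": math.ceil(credits / 6),   # inferred: 6 credits per minute, rounded up
    }


for plan, credits in PLAN_CREDITS.items():
    print(plan, minutes_from_credits(credits))
```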
Plan Features
| Feature | Free | Starter | Creator | Pro | Business | Enterprise |
|---|---|---|---|---|---|---|
| Access to All Models | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Fine-tuned AI Models | - | - | - | - | ✓ | ✓ |
| Basic Cloud Support | - | - | - | ✓ | - | - |
| Full Cloud Support | - | - | - | - | ✓ | ✓ |
| On-Premise Solution | - | - | - | - | - | ✓ |
To increase your usage limits and concurrent calls, upgrade your subscription plan.
Enterprise customers can request custom limits by contacting sales.