Learn about the models that power the Hamsa API.

Flagship models

Text to Speech

Jobs API

Async TTS via /v1/jobs/text-to-speech
Natural-sounding output optimized for Arabic dialects
Multiple Arabic dialects + English
Async job-based — result delivered via webhook

Realtime API

Sync TTS via /v1/realtime/tts
Low latency — returns WAV audio directly
Arabic dialects + English
Optimized for conversational AI and voice agents

Speech to Text

Batch API

Async STT via /v1/jobs/transcribe
High accuracy transcription for Arabic dialects
Word-level timestamps
Speaker diarization support
Async job-based — result delivered via webhook

Realtime API

Sync STT via /v1/realtime/stt
Arabic dialects + English
Base64-encoded audio input
Returns transcription directly
End-of-speech detection

Models overview

The Hamsa API offers audio processing optimized for Arabic, with support for multiple dialects and English.
Endpoint | Description | Languages
/v1/jobs/text-to-speech | Async TTS: job-based with webhook delivery | Arabic dialects, English
/v1/realtime/tts | Sync TTS: returns WAV audio directly | Arabic dialects, English
/v1/jobs/transcribe | Async STT: job-based with webhook delivery | Arabic, English
/v1/realtime/stt | Sync STT: returns transcription directly | Arabic, English

Hamsa TTS — Jobs API

The Jobs API (/v1/jobs/text-to-speech) is an async TTS endpoint. It creates a job and delivers the audio result via webhook. It is best suited to batch processing and media content generation.
Use cases:
  • Content Creation: Generate Arabic audio content, podcasts, and videos
  • Accessibility: Audio versions of written Arabic content
  • E-Learning: Educational content in Arabic with natural pronunciation
  • Media Production: Professional-quality voiceovers
Parameters: text, voiceId, webhookUrl, webhookAuth
See the TTS Quickstart for examples.
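As a rough sketch, a job request could be assembled in Python like this. Only the path and the four parameter names come from this page. The API host, the POST method, the JSON body, and the `Authorization` header format are assumptions, and the voice ID and webhook URL are placeholders.

```python
import json
import urllib.request

# Assumption: the API host is not stated on this page; replace it with the real one.
API_BASE = "https://api.tryhamsa.com"


def build_tts_job_request(api_key, text, voice_id, webhook_url, webhook_auth=None):
    """Build (but do not send) a request for /v1/jobs/text-to-speech."""
    payload = {"text": text, "voiceId": voice_id, "webhookUrl": webhook_url}
    if webhook_auth is not None:
        # The shape of webhookAuth is not described here; pass it through unchanged.
        payload["webhookAuth"] = webhook_auth
    return urllib.request.Request(
        url=f"{API_BASE}/v1/jobs/text-to-speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth header format may differ; check the API reference.
            "Authorization": f"Token {api_key}",
        },
        method="POST",  # Assumption: job creation uses POST.
    )


request = build_tts_job_request(
    api_key="YOUR_API_KEY",
    text="مرحبا بكم",
    voice_id="YOUR_VOICE_ID",
    webhook_url="https://example.com/hamsa/webhook",
)
# To submit the job, call urllib.request.urlopen(request).
# The audio result is then delivered to the webhook rather than in the response.
```

Building the request separately from sending it makes the payload easy to inspect or log before a job is created.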

Hamsa TTS — Realtime API

The Realtime API (/v1/realtime/tts) returns WAV audio directly in the response. Designed for real-time applications and voice agents.
Use cases:
  • Voice Agents: Real-time voice agents and phone calls
  • Interactive Applications: Chatbots requiring immediate voice response
  • Live Conversations: Conversational AI applications
Parameters: text, speaker, dialect, mulaw

Supported dialects

Code | Dialect | Example voices
pls | Palestinian | Amjad, Layan
egy | Egyptian | Mariam, Samir
syr | Syrian | Dalal, Mais
irq | Iraqi | Lyali, Fatma
jor | Jordanian | Lana, Jasem
leb | Lebanese | Carla, Majd
ksa | Saudi | Hiba, Fahd
uae | Emirati | Salma, Dima
bah | Bahraini | Mazen, Ruba
qat | Qatari | Deema, Faisal
kuw | Kuwaiti | Mai, Hatem
oma | Omani | Aisha, Jaber
msa | Modern Standard Arabic | Salem, Tamim
ar-sa | Arabic – Gulf | Khalid, Rahma
en | English | Emma, James
See the TTS Quickstart for examples.
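A minimal sketch of checking a dialect code before calling /v1/realtime/tts. The codes are copied from the table above. It is an assumption that `dialect` takes these codes, that `speaker` takes a voice name such as those in the table, and that `mulaw` is a boolean; this page lists the parameter names only.

```python
# Dialect codes copied from the "Supported dialects" table.
SUPPORTED_DIALECTS = {
    "pls", "egy", "syr", "irq", "jor", "leb", "ksa", "uae",
    "bah", "qat", "kuw", "oma", "msa", "ar-sa", "en",
}


def build_realtime_tts_payload(text, speaker, dialect, mulaw=False):
    """Return a request body for /v1/realtime/tts after checking the dialect code."""
    if dialect not in SUPPORTED_DIALECTS:
        raise ValueError(f"Unsupported dialect code: {dialect!r}")
    # Assumption: mulaw is a boolean flag; its type is not described on this page.
    return {"text": text, "speaker": speaker, "dialect": dialect, "mulaw": mulaw}


payload = build_realtime_tts_payload("صباح الخير", speaker="Mariam", dialect="egy")
```

Rejecting an unknown code locally avoids spending a low-latency round trip on a request that cannot succeed.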

Hamsa STT — Batch API

The Batch API (/v1/jobs/transcribe) is an async STT endpoint. Submit a media URL and receive the transcription via webhook or polling.
Choose from two models:
Model ID | Best for
Hamsa-General-V2.0 | General-purpose: media, podcasts, pre-recorded content
Hamsa-Conversational-V1.0 | Conversational audio: meetings, calls, dialogues
Use cases:
  • Transcription Services: Convert Arabic audio/video content to text
  • Meeting Documentation: Capture and document Arabic conversations with speaker identification
  • Media Subtitling: Generate SRT subtitles for Arabic media content
  • Content Analysis: Process and index Arabic audio content
Key features:
  • Word-level timestamps for each transcribed segment
  • Speaker diarization for multi-speaker audio
  • Automatic Arabic dialect detection (set language to ar)
  • SRT subtitle export with configurable formatting
  • Automatic punctuation and formatting
Parameters: mediaUrl, model, language, webhookUrl, returnSrtFormat, srtOptions
See the STT Quickstart for examples.
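The sketch below shows one way to assemble a transcription job body and pick between the two model IDs. The parameter names and model IDs come from this section. Treating `returnSrtFormat` as a boolean is an assumption, and `srtOptions` is left out because its fields are not listed here.

```python
MODEL_GENERAL = "Hamsa-General-V2.0"
MODEL_CONVERSATIONAL = "Hamsa-Conversational-V1.0"


def build_transcribe_payload(media_url, conversational=False, language="ar",
                             webhook_url=None, return_srt=False):
    """Return a request body for /v1/jobs/transcribe."""
    payload = {
        "mediaUrl": media_url,
        # Conversational audio (meetings, calls) uses the conversational model.
        "model": MODEL_CONVERSATIONAL if conversational else MODEL_GENERAL,
        # "ar" enables automatic Arabic dialect detection, per the feature list.
        "language": language,
    }
    if webhook_url:
        payload["webhookUrl"] = webhook_url
    if return_srt:
        # Assumption: returnSrtFormat is a boolean flag.
        payload["returnSrtFormat"] = True
    return payload


podcast_job = build_transcribe_payload("https://example.com/episode.mp3", return_srt=True)
meeting_job = build_transcribe_payload(
    "https://example.com/meeting.mp4",
    conversational=True,
    webhook_url="https://example.com/hamsa/webhook",
)
```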

Hamsa STT — Realtime API

The Realtime API (/v1/realtime/stt) accepts base64-encoded audio and returns the transcription directly. For streaming, use the WebSocket API.
Use cases:
  • Voice Agents: Real-time speech recognition for conversational AI
  • Live Call Transcription: Transcribe Arabic calls in real time
  • Interactive Applications: Immediate transcription for chatbots and voice interfaces
Key features:
  • Synchronous — returns transcription in the response
  • End-of-speech detection with configurable threshold
  • Arabic and English language support
Parameters: audioBase64, language, isEosEnabled, eosThreshold
See the STT Quickstart for examples.
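A short sketch of preparing the base64 input in Python. The parameter names come from this section. The language value "ar" for this endpoint and the type, units, and range of `eosThreshold` are assumptions, and the audio bytes are a stand-in for real file contents.

```python
import base64


def build_realtime_stt_payload(audio_bytes, language="ar", eos_threshold=None):
    """Return a request body for /v1/realtime/stt from raw audio bytes."""
    payload = {
        # The endpoint expects the audio as a base64 string, not raw bytes.
        "audioBase64": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        # End-of-speech detection is enabled only when a threshold is supplied.
        "isEosEnabled": eos_threshold is not None,
    }
    if eos_threshold is not None:
        # Assumption: the units and valid range of eosThreshold are not given here.
        payload["eosThreshold"] = eos_threshold
    return payload


fake_audio = b"RIFF-placeholder-audio-bytes"
payload = build_realtime_stt_payload(fake_audio)
payload_with_eos = build_realtime_stt_payload(fake_audio, eos_threshold=0.5)
```

In practice `audio_bytes` would come from reading an audio file in binary mode, for example `open(path, "rb").read()`.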

Model selection guide

Batch / media content

Use the Jobs API (/v1/jobs/text-to-speech) for async processing with webhook delivery.

Real-time / voice agents

Use the Realtime API (/v1/realtime/tts) or WebSocket for low-latency streaming.

Arabic Dialects

Both TTS endpoints support Arabic dialects plus English; the Supported dialects table lists 14 Arabic dialect codes and en. Choose based on latency requirements.

Content creation

Use the Jobs API for professional Arabic content, media, and video narration.

Voice Agents

Use the Realtime API / WebSocket for real-time conversational applications.

Transcription

Use the Batch API (/v1/jobs/transcribe) with Hamsa-General-V2.0 for media transcription or Hamsa-Conversational-V1.0 for conversational audio.

Character limits

Endpoint | Character limit
WebSocket TTS | 2,000 characters per message

For longer content, consider splitting the input into multiple requests.
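One possible way to split long input is sketched below. It breaks at spaces where it can and falls back to a hard cut for very long unbroken runs. It assumes the limit counts Python string characters (code points); the API may count differently.

```python
MAX_CHARS = 2000  # WebSocket TTS limit per message, from the table above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring spaces."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Look for the last space at or before the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # No space found: fall back to a hard cut.
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


chunks = split_text("word " * 1000)
```

Each chunk can then be sent as its own message, in order. Splitting on sentence boundaries instead of spaces would likely give more natural prosody, at the cost of a slightly more involved splitter.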

Audio duration limits

Endpoint | Audio duration limit | File size limit
Batch API (/v1/jobs/transcribe) | 60 minutes | 500 MB
Realtime API (/v1/realtime/stt) | Per-request | N/A
WebSocket (/v1/realtime/ws) | Streaming | N/A
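A pre-flight check against the Batch API limits might look like the sketch below. It assumes "500 MB" means 500 × 10^6 bytes and that both limits are inclusive; neither detail is stated on this page.

```python
MAX_DURATION_SECONDS = 60 * 60  # 60 minutes, from the table above
# Assumption: "500 MB" is read as 500 * 10^6 bytes.
MAX_FILE_BYTES = 500 * 1000 * 1000


def batch_limit_problems(duration_seconds, file_bytes):
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 60 minutes")
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 500 MB")
    return problems


ok = batch_limit_problems(duration_seconds=45 * 60, file_bytes=120_000_000)
too_big = batch_limit_problems(duration_seconds=90 * 60, file_bytes=700_000_000)
```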

Plans and Usage Limits

Your subscription plan determines your monthly usage limits and concurrent call capacity.

Plan Comparison

Plan | Price | Credits | Voice Agent | Speech to Text | Text to Speech | Concurrency | KB Storage
Free | $0/mo | 50 | 9 min | 50 min | 25 min | 1 | 1 MB
Starter | $5/mo | 100 | 17 min | 100 min | 50 min | 1 | 5 MB
Creator | $15/mo | 500 | 84 min | 500 min | 250 min | 2 | 10 MB
Pro | $100/mo | 5,000 | 834 min | 5,000 min | 2,500 min | 5 | 50 MB
Business | $320/mo | 20,000 | 3,334 min | 20,000 min | 10,000 min | 10 | 100 MB
Enterprise | Custom | Custom | Unlimited | Unlimited | Unlimited | Unlimited | 300 MB

Plan Features

FeatureFreeStarterCreatorProBusinessEnterprise
Access to All Models
Fine-tuned AI Models----
Basic Cloud Support-----
Full Cloud Support----
On-Premise Solution-----
To increase your usage limits and concurrent calls, upgrade your subscription plan. Enterprise customers can request custom limits by contacting sales.

API requests per minute vs concurrent requests

API requests per minute and concurrent requests are different metrics. The two can diverge because concurrency depends on how long each request takes and on how the requests are spaced or batched.
Example 1: Spaced requests. If you send 60 requests per minute, each taking 1 second to complete, and send them 1 second apart, the maximum number of concurrent requests is 1 and the average is 1.
Example 2: Batched requests. If you send 60 requests per minute, each taking 3 seconds to complete, and fire them all at once, the maximum number of concurrent requests is 60 and the average over the minute is 3.
Since our system cares about concurrency, requests per minute matter less than how long each request takes and the pattern in which requests are sent.
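The two examples can be reproduced with a small calculation. The sketch below is illustrative only and is not how Hamsa measures concurrency. It computes peak concurrency with a sweep over start and end events, and average concurrency as total request time divided by the window length, assuming every request falls entirely within the window.

```python
def concurrency_stats(requests, window_seconds=60.0):
    """requests: list of (start_time, duration) pairs in seconds.

    Returns (max_concurrent, average_concurrent) over the window.
    """
    events = []
    for start, duration in requests:
        events.append((start, 1))              # request begins
        events.append((start + duration, -1))  # request ends
    # Tuple sorting puts -1 before +1 at equal times, so a request that ends
    # exactly when another starts is not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    # Average concurrency = total busy time / window length.
    average = sum(duration for _, duration in requests) / window_seconds
    return peak, average


spaced = [(float(i), 1.0) for i in range(60)]  # Example 1: one request per second
batched = [(0.0, 3.0) for _ in range(60)]      # Example 2: all fired at once

print(concurrency_stats(spaced))   # (1, 1.0)
print(concurrency_stats(batched))  # (60, 3.0)
```

Both workloads are 60 requests per minute, yet the batched one needs 60 concurrent slots while the spaced one needs a single slot.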