Hamsa Speech to Text (STT) transcribes Arabic speech across multiple dialects into text with word-level timestamps and speaker identification. Whether you’re transcribing media content, building voice applications, or documenting conversations, Hamsa STT delivers high-accuracy Arabic speech recognition.
Overview
- API Reference: Technical API documentation for developers
- Quickstart: Get started with STT in minutes
Key features
Arabic dialect recognition
Hamsa STT is optimized for Arabic speech:
- Automatic dialect detection: Set `language` to `ar` and the model detects the dialect automatically
- Code-switching: Natural handling of mixed Arabic-English speech
- Colloquial expressions: Recognition of dialect-specific idioms and expressions
Advanced transcription features
- Word-level timestamps: Precise timing for each transcribed word — each segment includes word text plus start/end times
- Word highlight during playback: In the Media Platform, the current word highlights in sync with playback; click any word to seek
- Speaker diarization: Identification of different speakers in multi-speaker audio
- Automatic punctuation: Natural punctuation and formatting
- SRT subtitle export: Generate formatted subtitles with configurable line/duration options
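To illustrate how word-level timestamps can feed subtitle generation, here is a minimal Python sketch that groups timed words into SRT cues. The input shape (a list of dicts with `text`, `start`, and `end` in seconds) and the `max_words` grouping rule are assumptions made for illustration. They are not taken from the Hamsa response schema or from the built-in SRT export, so adapt the field names to the actual API response.

```python
from typing import Dict, List, Union

# Hypothetical word shape: {"text": str, "start": seconds, "end": seconds}
Word = Dict[str, Union[str, float]]


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from the first word's start to the last word's end.
        start = format_srt_time(float(chunk[0]["start"]))
        end = format_srt_time(float(chunk[-1]["end"]))
        text = " ".join(str(w["text"]) for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Calling `words_to_srt(words, max_words=5)` on such a list returns a string that can be written to a `.srt` file. For production subtitles, the SRT export built into the API is likely the better choice, since it exposes its own line and duration options.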
Flexible integration
- Batch API (`/v1/jobs/transcribe`): async transcription from media URLs with webhook delivery
- Realtime API (`/v1/realtime/stt`): synchronous transcription from base64-encoded audio
- WebSocket (`/v1/realtime/ws`): streaming transcription for real-time applications
- Media Platform: web interface for uploading, transcribing, and reviewing audio
API endpoints
Batch API
Async: `/v1/jobs/transcribe`
Submit a media URL for transcription. Results are delivered via webhook.
Parameters: `mediaUrl`, `model`, `language`, `webhookUrl`
Realtime API
Sync: `/v1/realtime/stt`
Send base64-encoded audio and get the transcription back directly.
Parameters: `audioBase64`, `language`, `isEosEnabled`
Models
| Model ID | Best for |
|---|---|
| `Hamsa-General-V2.0` | General-purpose: media, podcasts, pre-recorded content |
| `Hamsa-Conversational-V1.0` | Conversational audio: meetings, calls, dialogues |
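
As a rough illustration of how the Batch API parameters and the model IDs above fit together, the following Python sketch builds (but does not send) a request to `/v1/jobs/transcribe`. The base URL, the `Authorization` header format, the HTTP method, and the JSON body encoding are assumptions that this page does not state; check the API Reference for the real values.

```python
import json
import urllib.request

# Assumption for illustration only: replace with the real API host.
BASE_URL = "https://api.example.com"
MODELS = {"Hamsa-General-V2.0", "Hamsa-Conversational-V1.0"}


def build_batch_request(media_url: str, webhook_url: str, api_key: str,
                        model: str = "Hamsa-General-V2.0",
                        language: str = "ar") -> urllib.request.Request:
    """Build a POST request carrying the documented Batch API parameters."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    body = {
        "mediaUrl": media_url,
        "model": model,
        "language": language,
        "webhookUrl": webhook_url,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/jobs/transcribe",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header scheme; the real one may differ.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job. Because the API is asynchronous, the transcription itself arrives later at the webhook URL rather than in the HTTP response.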
Supported languages
The API accepts two language codes:

| Code | Language |
|---|---|
| `ar` | Arabic (all dialects, auto-detected) |
| `en` | English |

Set `language` to `ar` and the model handles Egyptian, Gulf, Levantine, Iraqi, and other dialects.
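
A minimal Python sketch of a Realtime API request body may help here: it base64-encodes raw audio bytes and rejects any language code other than the two listed above. It assumes the body is JSON and that `isEosEnabled` is a boolean flag; neither detail is spelled out on this page, and the function name is hypothetical.

```python
import base64
import json

SUPPORTED_LANGUAGES = {"ar", "en"}  # the two codes listed above


def build_realtime_body(audio: bytes, language: str = "ar",
                        is_eos_enabled: bool = False) -> str:
    """Return a JSON body with the documented Realtime API parameters."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    payload = {
        "audioBase64": base64.b64encode(audio).decode("ascii"),
        "language": language,
        # Assumed to be a boolean; its exact semantics are not described here.
        "isEosEnabled": is_eos_enabled,
    }
    return json.dumps(payload)
```

The resulting string would be sent as the body of a request to `/v1/realtime/stt`. There is no dialect parameter: `ar` covers every Arabic dialect.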
Use cases
Media transcription
Transcribe Arabic podcasts, videos, and media content:
- Generate subtitles for videos (with SRT export)
- Create searchable transcripts
- Content analysis and indexing
Voice agents
Power real-time conversational AI:
- Customer service voice agents
- Live call transcription
- Conversation analytics
Meeting documentation
Document Arabic meetings and interviews:
- Automatic meeting minutes with speaker identification
- Searchable archives
- Compliance and record-keeping
Content accessibility
Make Arabic audio content accessible:
- Closed captions for videos
- Transcripts for audio content
- Translation preparation
Getting started
Choose your integration
Use the Batch API for pre-recorded media, the Realtime API for direct transcription, or the WebSocket API for streaming.
Select a model
Use `Hamsa-General-V2.0` for general transcription or `Hamsa-Conversational-V1.0` for conversational audio.
Next steps
- Quickstart Guide: Build your first STT integration
- WebSocket API: Real-time streaming transcription
- Improving Accuracy: Tips for better transcription accuracy
- Media Platform: Use STT via web interface
FAQ
What's the difference between the Batch API and Realtime API?
The Batch API (`/v1/jobs/transcribe`) is async: submit a media URL and receive results via webhook. Use it for pre-recorded files. The Realtime API (`/v1/realtime/stt`) accepts base64-encoded audio and returns the transcription directly. For streaming, use the WebSocket API.
Do I need to specify the Arabic dialect?
No. Set `language` to `ar` and the model automatically detects the specific dialect (Egyptian, Gulf, Levantine, etc.) and transcribes accordingly.
Can the model handle Arabic-English code-switching?
Yes, the models handle speech that switches between Arabic and English, which is common in many Arabic-speaking regions.
Which model should I use?
Use `Hamsa-General-V2.0` for general-purpose transcription of media and pre-recorded content. Use `Hamsa-Conversational-V1.0` for conversational audio like calls and meetings.
Can I get SRT subtitles?
Yes. Set `returnSrtFormat` to `true` in the Batch API request. You can customize subtitle formatting with `srtOptions`. See the Quickstart for details.
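
As a hedged sketch of the SRT answer, the Python helper below adds `returnSrtFormat` to an existing Batch API request body. It assumes both fields sit at the top level of a JSON body and that `srtOptions` is a JSON object. The keys inside `srtOptions` are not listed on this page, so the helper (a hypothetical name) passes through whatever the caller supplies instead of guessing them.

```python
from typing import Any, Dict, Optional


def with_srt(body: Dict[str, Any],
             srt_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of a Batch API body that also requests SRT output."""
    result = dict(body)  # shallow copy so the caller's dict is not mutated
    result["returnSrtFormat"] = True
    if srt_options:
        # Keys are passed through unchanged; see the Quickstart for valid options.
        result["srtOptions"] = dict(srt_options)
    return result
```

For example, `with_srt({"mediaUrl": "...", "language": "ar"})` returns the same body with `returnSrtFormat` set to `True`.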