Hamsa Speech to Text (STT) transcribes Arabic speech across multiple dialects into text with word-level timestamps and speaker identification. Whether you’re transcribing media content, building voice applications, or documenting conversations, Hamsa STT delivers high-accuracy Arabic speech recognition.

Overview

  • API Reference: Technical API documentation for developers
  • Quickstart: Get started with STT in minutes

Key features

Arabic dialect recognition

Hamsa STT is optimized for Arabic speech:
  • Automatic dialect detection: Set language to ar and the model detects the dialect automatically
  • Code-switching: Natural handling of mixed Arabic-English speech
  • Colloquial expressions: Recognition of dialect-specific idioms and expressions

Advanced transcription features

  • Word-level timestamps: Precise timing for each transcribed word, with each segment listing its words and their start/end times
  • Word highlight during playback: In the Media Platform, the current word highlights in sync with playback; click any word to seek
  • Speaker diarization: Identification of different speakers in multi-speaker audio
  • Automatic punctuation: Natural punctuation and formatting
  • SRT subtitle export: Generate formatted subtitles with configurable line/duration options
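The word-highlight behavior comes down to one lookup: given the current playback time, find the word whose start/end interval contains it. The sketch below is a minimal client-side illustration, not the Media Platform's implementation. It assumes each word arrives as a dict with hypothetical keys word, start, and end (in seconds), sorted by start time and non-overlapping.

```python
from bisect import bisect_right
from typing import Optional, Sequence, TypedDict


class Word(TypedDict):
    # Hypothetical shape: "word", "start", "end" (seconds) are assumed
    # field names, not the documented response schema.
    word: str
    start: float
    end: float


def active_word_index(words: Sequence[Word], t: float) -> Optional[int]:
    """Return the index of the word being spoken at playback time t, or None.

    Assumes words are sorted by start time and do not overlap.
    """
    starts = [w["start"] for w in words]
    # Index of the rightmost word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i]["end"]:
        return i
    # t is before the first word, after the last, or in a silence gap.
    return None


def seek_time(words: Sequence[Word], index: int) -> float:
    """Playback position to jump to when the user clicks word `index`."""
    return words[index]["start"]
```

Clicking a word then maps to seeking the player to seek_time(words, i). For long transcripts, compute the list of start times once instead of on every call.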

Flexible integration

  • Batch API (/v1/jobs/transcribe): Asynchronous transcription of media URLs, with results delivered by webhook
  • Realtime API (/v1/realtime/stt): Synchronous transcription of base64-encoded audio
  • WebSocket (/v1/realtime/ws): Streaming transcription for real-time applications
  • Media Platform: Web interface for uploading, transcribing, and reviewing media

API endpoints

Batch API

Async endpoint: /v1/jobs/transcribe

Submit a media URL for transcription. Results are delivered via webhook.

Parameters: mediaUrl, model, language, webhookUrl
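A minimal sketch of a Batch API request using only the Python standard library. The body uses the four parameter names listed above. The base URL (API_BASE is a placeholder), the POST method, and the Bearer authorization header are assumptions for illustration; check the API Reference for the real values.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder: the real base URL is not given on this page.
API_BASE = "https://api.example.com"

SUPPORTED_LANGUAGES = {"ar", "en"}  # from the Supported languages table


def build_batch_request(media_url: str, webhook_url: str,
                        model: str = "Hamsa-General-V2.0",
                        language: str = "ar") -> Dict[str, Any]:
    """JSON body for /v1/jobs/transcribe, using the documented parameter names."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return {
        "mediaUrl": media_url,
        "model": model,
        "language": language,
        "webhookUrl": webhook_url,
    }


def submit_batch_job(body: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    """POST the job. Method, auth header, and response shape are assumptions."""
    req = urllib.request.Request(
        API_BASE + "/v1/jobs/transcribe",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )
    # Because the endpoint is async, this response is presumably a job
    # acknowledgement; the transcript itself is sent later to webhookUrl.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The language check rejects dialect-specific codes such as ar-EG up front, since the API auto-detects the dialect from plain ar.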

Realtime API

Sync endpoint: /v1/realtime/stt

Send base64-encoded audio and get the transcription back directly.

Parameters: audioBase64, language, isEosEnabled
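The realtime request carries the audio inline, so the main client-side step is base64-encoding the raw bytes into a JSON string field. This sketch builds that body from the parameters listed above. The default of False for isEosEnabled and reading it as an end-of-speech flag are assumptions; sending the body is an ordinary JSON POST.

```python
import base64
import json
from pathlib import Path
from typing import Any, Dict


def build_realtime_request(audio: bytes, language: str = "ar",
                           is_eos_enabled: bool = False) -> Dict[str, Any]:
    """Body for /v1/realtime/stt: audio is sent inline as base64 text."""
    return {
        # b64encode returns ASCII bytes; JSON needs a str.
        "audioBase64": base64.b64encode(audio).decode("ascii"),
        "language": language,
        # Assumed default; check the API Reference for what this flag does.
        "isEosEnabled": is_eos_enabled,
    }


def request_from_file(path: str, language: str = "ar") -> str:
    """Read an audio file and return the JSON request body as a string."""
    audio = Path(path).read_bytes()
    return json.dumps(build_realtime_request(audio, language))
```

Base64 inflates the payload by about a third, which is one reason pre-recorded files are better suited to the Batch API.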

Models

Model ID                     Best for
Hamsa-General-V2.0           General-purpose: media, podcasts, pre-recorded content
Hamsa-Conversational-V1.0    Conversational audio: meetings, calls, dialogues

Supported languages

The API accepts two language codes:
Code    Language
ar      Arabic (all dialects, auto-detected)
en      English
Arabic dialect detection is automatic, so you do not need to specify which dialect is spoken. Set language to ar and the model handles Egyptian, Gulf, Levantine, Iraqi, and other dialects.

Use cases

Media transcription

Transcribe Arabic podcasts, videos, and media content:
  • Generate subtitles for videos (with SRT export)
  • Create searchable transcripts
  • Content analysis and indexing

Voice agents

Power real-time conversational AI:
  • Customer service voice agents
  • Live call transcription
  • Conversation analytics

Meeting documentation

Document Arabic meetings and interviews:
  • Automatic meeting minutes with speaker identification
  • Searchable archives
  • Compliance and record-keeping

Content accessibility

Make Arabic audio content accessible:
  • Closed captions for videos
  • Transcripts for audio content
  • Translation preparation

Getting started

1. Choose your integration
   Use the Batch API for pre-recorded media, the Realtime API for synchronous transcription of base64-encoded audio, or the WebSocket API for streaming.

2. Select a model
   Use Hamsa-General-V2.0 for general transcription or Hamsa-Conversational-V1.0 for conversational audio.

3. Submit your audio
   Provide a media URL (batch) or base64-encoded audio (realtime) and receive your transcription with timestamps and speaker information.
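Batch results arrive at the webhookUrl you supply, so step 3 needs something listening there. Below is a minimal standard-library receiver, assuming the job result is POSTed as a JSON object. The payload fields are not documented on this page, so the handler only validates the body and logs its top-level keys. The names parse_webhook_body, TranscriptionWebhook, and serve are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def parse_webhook_body(raw: bytes) -> Dict[str, Any]:
    """Decode a webhook delivery; assumes the result is a JSON object."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class TranscriptionWebhook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = parse_webhook_body(self.rfile.read(length))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            self.send_response(400)
            self.end_headers()
            return
        # Hand off to your own storage or queue; printing keeps this short.
        print("received transcription payload with keys:", sorted(result))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver; expose this port at the URL you pass as webhookUrl."""
    HTTPServer(("0.0.0.0", port), TranscriptionWebhook).serve_forever()
```

Responding quickly with 200 and doing heavy processing elsewhere keeps the delivery from timing out; in production the endpoint should also be served over HTTPS.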

Next steps

  • Quickstart Guide: Build your first STT integration
  • WebSocket API: Real-time streaming transcription
  • Improving Accuracy: Tips for better transcription accuracy
  • Media Platform: Use STT via the web interface

FAQ

What is the difference between the Batch API and the Realtime API?
The Batch API (/v1/jobs/transcribe) is asynchronous: submit a media URL and receive the results via webhook. Use it for pre-recorded files. The Realtime API (/v1/realtime/stt) accepts base64-encoded audio and returns the transcription directly. For streaming, use the WebSocket API.

Do I need to specify the Arabic dialect?
No. Set language to ar and the model automatically detects the specific dialect (Egyptian, Gulf, Levantine, etc.) and transcribes accordingly.

Can the models handle mixed Arabic and English speech?
Yes. The models handle speech that switches between Arabic and English, which is common in many Arabic-speaking regions.

Which model should I use?
Use Hamsa-General-V2.0 for general-purpose transcription of media and pre-recorded content. Use Hamsa-Conversational-V1.0 for conversational audio such as calls and meetings.

Can I export subtitles in SRT format?
Yes. Set returnSrtFormat to true in the Batch API request. You can customize subtitle formatting with srtOptions. See the Quickstart for details.
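Requesting subtitles only adds fields to an existing Batch API body. This sketch copies a request body and sets returnSrtFormat; srtOptions is passed through untouched because its exact keys are described in the Quickstart rather than here. The helper name with_srt is hypothetical.

```python
from typing import Any, Dict, Optional


def with_srt(body: Dict[str, Any],
             srt_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of a Batch API body that also requests SRT output."""
    out = dict(body)  # leave the caller's dict untouched
    out["returnSrtFormat"] = True
    if srt_options:
        # Keys are not listed on this page; supply them from the Quickstart.
        out["srtOptions"] = dict(srt_options)
    return out
```

Copying rather than mutating lets the same base body be reused for jobs with and without subtitles.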