Speech to Text Quickstart
Hamsa offers two HTTP endpoints for speech-to-text (STT):
- Batch API (/v1/jobs/transcribe): asynchronous and job-based. It accepts a media URL and delivers the transcription via webhook or polling.
- Realtime API (/v1/realtime/stt): synchronous. It accepts base64-encoded audio and returns the transcription in the response.
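Prerequisites
A Hamsa API key. Every request below sends it as a Bearer token in the Authorization header; replace YOUR_API_KEY with your own key.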
Batch Transcription
Submit a media URL for transcription. The job runs asynchronously: the result is sent to webhookUrl when it is ready, or you can retrieve it by polling.
```bash
curl -X POST https://api.tryhamsa.com/v1/jobs/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mediaUrl": "https://your-storage.com/audio.mp3",
    "model": "Hamsa-General-V2.0",
    "language": "ar",
    "webhookUrl": "https://your-server.com/webhook"
  }'
```
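To call the batch endpoint from a script, you can wrap the request above in a shell function. This is a minimal sketch, not an official client. The names submit_batch_job and json_escape and the HAMSA_API_KEY environment variable are hypothetical. The escaping only handles backslashes and double quotes, so values that contain newlines or other control characters would need a real JSON encoder.

```bash
# Sketch only: submit_batch_job, json_escape and HAMSA_API_KEY are
# hypothetical names, not part of the Hamsa API.

# Escape backslashes and double quotes so a value can sit inside a JSON
# string. Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: submit_batch_job MEDIA_URL MODEL [LANGUAGE] [WEBHOOK_URL]
submit_batch_job() {
  media_url=$1
  model=$2
  language=${3:-ar}     # ar is the documented default
  webhook_url=${4:-}

  if [ -z "$HAMSA_API_KEY" ]; then
    echo "HAMSA_API_KEY is not set" >&2
    return 1
  fi

  body=$(printf '{"mediaUrl":"%s","model":"%s","language":"%s"' \
    "$(json_escape "$media_url")" "$(json_escape "$model")" \
    "$(json_escape "$language")")
  # webhookUrl is optional, so include it only when one was given.
  if [ -n "$webhook_url" ]; then
    body="$body,\"webhookUrl\":\"$(json_escape "$webhook_url")\""
  fi
  body="$body}"

  curl -sS -X POST https://api.tryhamsa.com/v1/jobs/transcribe \
    -H "Authorization: Bearer $HAMSA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

For example, submit_batch_job https://your-storage.com/audio.mp3 Hamsa-General-V2.0 ar https://your-server.com/webhook sends the same fields as the curl command above.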
Realtime Transcription (synchronous)
Send base64-encoded audio and receive the transcription directly.
```bash
curl -X POST https://api.tryhamsa.com/v1/realtime/stt \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audioBase64": "<base64-encoded-wav-audio>",
    "language": "ar"
  }'
```
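You must first produce the audioBase64 value from a WAV file. Below is a minimal sketch that assumes a POSIX shell with the base64 and tr utilities and a hypothetical HAMSA_API_KEY variable. The function names are also hypothetical. The sketch encodes the file on a single line and pipes the JSON body into curl through stdin (--data-binary @-), which avoids command-line length limits for longer recordings. It does not verify that the file is actually WAV.

```bash
# Sketch only: build_realtime_body, realtime_stt_from_file and
# HAMSA_API_KEY are hypothetical names.

# Print the Realtime API request body for a WAV file.
# Usage: build_realtime_body FILE.wav [LANGUAGE]
build_realtime_body() {
  # Encode the whole file on one line; some base64 implementations wrap
  # their output, so strip any newlines. Base64 text never needs JSON
  # escaping.
  audio_b64=$(base64 < "$1" | tr -d '\n')
  printf '{"audioBase64":"%s","language":"%s"}' "$audio_b64" "${2:-ar}"
}

# Send a WAV file to the Realtime API and print the raw response.
# Usage: realtime_stt_from_file FILE.wav [LANGUAGE]
realtime_stt_from_file() {
  if [ ! -r "$1" ]; then
    echo "cannot read file: $1" >&2
    return 1
  fi
  # Feed the body through stdin (--data-binary @-) rather than -d "...",
  # so long recordings do not run into command-line length limits.
  build_realtime_body "$1" "${2:-ar}" |
    curl -sS -X POST https://api.tryhamsa.com/v1/realtime/stt \
      -H "Authorization: Bearer $HAMSA_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

For example, realtime_stt_from_file recording.wav en transcribes English audio and prints the response exactly as the API returns it.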
Parameters
Batch API
| Parameter | Type | Required | Description |
|---|---|---|---|
| mediaUrl | string (URI) | Yes | URL of the audio/video file to transcribe |
| model | string | Yes | Model to use: Hamsa-General-V2.0 or Hamsa-Conversational-V1.0 |
| language | string | No | Language code: ar (default) or en |
| webhookUrl | string (URI) | No | URL to receive the completed transcription |
| webhookAuth | object | No | Authentication for the webhook |
| title | string | No | Optional title for the transcription job |
| processingType | string | No | Processing type (default: async) |
| returnSrtFormat | boolean | No | Return SRT subtitle format (default: false) |
| srtOptions | object | No | SRT formatting options (see below) |
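mediaUrl and model are required, and model and language accept only the values listed above. A client can therefore reject bad input before making a request. Here is a sketch of such a check. The function name is hypothetical. The sketch checks only these three fields and leaves the optional ones for the API to validate.

```bash
# Sketch only: validate_batch_params is a hypothetical helper name.
# Usage: validate_batch_params MEDIA_URL MODEL [LANGUAGE]
# Returns 0 when the values are acceptable, 1 otherwise.
validate_batch_params() {
  media_url=$1
  model=$2
  language=${3:-ar}

  # mediaUrl and model are the only required fields.
  if [ -z "$media_url" ] || [ -z "$model" ]; then
    echo "mediaUrl and model are required" >&2
    return 1
  fi

  # Only the two documented model IDs are accepted.
  case $model in
    Hamsa-General-V2.0|Hamsa-Conversational-V1.0) ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac

  # Only the two documented language codes are accepted.
  case $language in
    ar|en) ;;
    *) echo "unsupported language: $language" >&2; return 1 ;;
  esac
}
```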
SRT Options
When returnSrtFormat is true, you can customize the subtitle formatting:
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxLinesPerSubtitle | integer | 2 | Maximum lines per subtitle block |
| singleSpeakerPerSubtitle | boolean | true | Keep one speaker per subtitle |
| maxCharsPerLine | integer | 42 | Maximum characters per line |
| maxMergeableGap | number | 0.3 | Max gap (seconds) to merge segments |
| minDuration | number | 0.7 | Minimum subtitle duration (seconds) |
| maxDuration | number | 7 | Maximum subtitle duration (seconds) |
| minGap | number | 0.04 | Minimum gap between subtitles (seconds) |
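The SRT options are nested under srtOptions. This sketch assumes that any option you omit keeps its default from the table. It builds a request body that turns on SRT output and overrides maxCharsPerLine and maxLinesPerSubtitle. The function name is hypothetical. Choosing Hamsa-General-V2.0 is also an assumption: subtitles are usually generated for pre-recorded media, which is that model's stated strength in the Models table.

```bash
# Sketch only: srt_body is a hypothetical helper name.
# Usage: srt_body MEDIA_URL [MAX_CHARS_PER_LINE] [MAX_LINES_PER_SUBTITLE]
# MEDIA_URL is inserted as-is and must not contain quotes or backslashes.
srt_body() {
  media_url=$1
  max_chars=${2:-42}   # maxCharsPerLine default from the table
  max_lines=${3:-2}    # maxLinesPerSubtitle default from the table

  # Both options are integers. Reject non-digits and leading zeros
  # (which also rejects 0 and keeps the JSON numbers valid).
  for n in "$max_chars" "$max_lines"; do
    case $n in
      ''|*[!0-9]*|0*)
        echo "expected a positive integer, got: $n" >&2
        return 1 ;;
    esac
  done

  # Options not listed under srtOptions are assumed to keep their defaults.
  printf '{"mediaUrl":"%s","model":"Hamsa-General-V2.0","returnSrtFormat":true,"srtOptions":{"maxCharsPerLine":%s,"maxLinesPerSubtitle":%s}}' \
    "$media_url" "$max_chars" "$max_lines"
}
```

To send it, pipe the output into the batch curl command and replace its -d argument with --data-binary @-.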
Realtime API
| Parameter | Type | Required | Description |
|---|---|---|---|
| audioBase64 | string | Yes | Base64-encoded audio data (WAV format) |
| language | string | No | Language code: ar (default) or en |
| isEosEnabled | boolean | No | Enable end-of-speech detection (default: false) |
| eosThreshold | number | No | End-of-speech detection threshold, 0.0–1.0 (default: 0.3) |
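Here is a sketch of a request body that enables end-of-speech detection. Before sending, it checks that eosThreshold lies in the documented 0.0 to 1.0 range. A POSIX shell cannot compare decimals, so awk performs the range check. The function name is hypothetical. Sending eosThreshold together with isEosEnabled set to true is an assumption about how the two fields are meant to be combined.

```bash
# Sketch only: realtime_eos_body is a hypothetical helper name.
# Usage: realtime_eos_body AUDIO_BASE64 [EOS_THRESHOLD] [LANGUAGE]
realtime_eos_body() {
  audio_b64=$1
  threshold=${2:-0.3}   # documented default
  language=${3:-ar}

  # Accept only plain decimals such as 0.3 or 1 (no sign, no exponent).
  case $threshold in
    ''|.|*[!0-9.]*|*.*.*)
      echo "not a decimal number: $threshold" >&2
      return 1 ;;
  esac

  # POSIX sh has no floating-point comparison, so awk checks the range
  # and prints a normalized JSON number (e.g. .25 becomes 0.25).
  # LC_ALL=C keeps "." as the decimal separator.
  eos=$(LC_ALL=C awk -v t="$threshold" 'BEGIN { v = t + 0; if (v < 0 || v > 1) exit 1; printf "%g", v }') || {
    echo "eosThreshold must be between 0.0 and 1.0: $threshold" >&2
    return 1
  }

  printf '{"audioBase64":"%s","language":"%s","isEosEnabled":true,"eosThreshold":%s}' \
    "$audio_b64" "$language" "$eos"
}
```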
Models
| Model ID | Description |
|---|---|
| Hamsa-General-V2.0 | General-purpose transcription; best for media, podcasts, and pre-recorded content |
| Hamsa-Conversational-V1.0 | Optimized for conversational audio; best for meetings, calls, and dialogues |
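If your application picks the model automatically, you can encode the guidance in this table as a small lookup. The content-type labels (meeting, call, dialogue, media, podcast, prerecorded) are hypothetical names for this sketch, not values the API accepts.

```bash
# Sketch only: pick_model and its content-type labels are hypothetical.
# Usage: pick_model CONTENT_TYPE   (prints a model ID)
pick_model() {
  case $1 in
    # Conversational audio: meetings, calls, and dialogues.
    meeting|call|dialogue) echo Hamsa-Conversational-V1.0 ;;
    # Media, podcasts, and pre-recorded content.
    media|podcast|prerecorded) echo Hamsa-General-V2.0 ;;
    *) echo "unknown content type: $1" >&2; return 1 ;;
  esac
}
```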
Streaming via WebSocket
For real-time streaming transcription, use the WebSocket API. See the WebSocket STT documentation for details.