Speech to Text Quickstart

Hamsa offers two STT endpoints:

Batch API (/v1/jobs/transcribe): asynchronous and job-based. Accepts a media URL and delivers the transcription via webhook or polling.
Realtime API (/v1/realtime/stt): synchronous. Accepts base64-encoded audio and returns the transcription in the response.

Prerequisites

A Hamsa API key, sent as a Bearer token in the Authorization header of each request.

Batch Transcription

Submit a media URL for transcription. Results are delivered asynchronously.

curl -X POST https://api.tryhamsa.com/v1/jobs/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mediaUrl": "https://your-storage.com/audio.mp3",
    "model": "Hamsa-General-V2.0",
    "language": "ar",
    "webhookUrl": "https://your-server.com/webhook"
  }'
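As a rough sketch, the request above can be wrapped in a small shell helper that keeps the API key out of the command line and makes the webhook optional. The function names `build_batch_payload` and `submit_batch_job` and the environment variable `HAMSA_API_KEY` are assumptions made for illustration and are not part of the Hamsa API. The sketch also assumes the URLs contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print a batch request body for a media URL,
# adding webhookUrl only when a second argument is given.
build_batch_payload() {
  local media_url="$1"
  local webhook_url="${2:-}"
  printf '{"mediaUrl":"%s","model":"Hamsa-General-V2.0","language":"ar"' "$media_url"
  if [ -n "$webhook_url" ]; then
    printf ',"webhookUrl":"%s"' "$webhook_url"
  fi
  printf '}'
}

# Hypothetical helper: POST the body to the batch endpoint.
# HAMSA_API_KEY is an assumed variable name; the function refuses to run without it.
submit_batch_job() {
  if [ -z "${HAMSA_API_KEY:-}" ]; then
    echo "HAMSA_API_KEY is not set" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP error responses; -d @- reads the body from stdin.
  build_batch_payload "$@" | curl --fail --silent --show-error \
    -X POST https://api.tryhamsa.com/v1/jobs/transcribe \
    -H "Authorization: Bearer $HAMSA_API_KEY" \
    -H "Content-Type: application/json" \
    -d @-
}

# Usage (sends a real request, so it is left commented out):
# export HAMSA_API_KEY="YOUR_API_KEY"
# submit_batch_job "https://your-storage.com/audio.mp3" "https://your-server.com/webhook"
```

Calling `submit_batch_job` with only a media URL omits `webhookUrl`, which the parameter table lists as optional.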

Realtime Transcription (synchronous)

Send base64-encoded audio and receive the transcription directly.

curl -X POST https://api.tryhamsa.com/v1/realtime/stt \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audioBase64": "<base64-encoded-wav-audio>",
    "language": "ar"
  }'
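The `audioBase64` placeholder above has to be produced from a real file. One possible way to do that in the shell is sketched below. The helper name `build_realtime_payload` and the file name `sample.wav` are hypothetical, and the sketch assumes the API accepts standard (not URL-safe) base64 without line breaks.

```shell
# Hypothetical helper: print the JSON body for /v1/realtime/stt from a local WAV file.
# $1 is the path to the WAV file, $2 is the language code (defaults to "ar").
build_realtime_payload() {
  local wav_file="$1"
  local language="${2:-ar}"
  local audio_b64
  # Reading from stdin and stripping newlines gives single-line output
  # with both GNU and BSD versions of base64.
  audio_b64="$(base64 < "$wav_file" | tr -d '\n')"
  # Standard base64 text and the language codes need no JSON escaping.
  printf '{"audioBase64":"%s","language":"%s"}' "$audio_b64" "$language"
}

# Usage (sends a real request, so it is left commented out):
# build_realtime_payload sample.wav ar | curl -X POST https://api.tryhamsa.com/v1/realtime/stt \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d @-
```

Piping the body into `curl -d @-` avoids placing a long base64 string on the command line, where it could exceed the shell's argument length limit.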

Parameters

Batch API

Parameter	Type	Required	Description
`mediaUrl`	string (URI)	Yes	URL of the audio/video file to transcribe
`model`	string	Yes	Model to use: `Hamsa-General-V2.0` or `Hamsa-Conversational-V1.0`
`language`	string	No	Language code: `ar` (default) or `en`
`webhookUrl`	string (URI)	No	URL to receive the completed transcription
`webhookAuth`	object	No	Authentication for the webhook
`title`	string	No	Optional title for the transcription job
`processingType`	string	No	Processing type (default: `async`)
`returnSrtFormat`	boolean	No	Return SRT subtitle format (default: `false`)
`srtOptions`	object	No	SRT formatting options (see below)

SRT Options

When returnSrtFormat is true, you can customize the subtitle formatting:

Parameter	Type	Default	Description
`maxLinesPerSubtitle`	integer	2	Maximum lines per subtitle block
`singleSpeakerPerSubtitle`	boolean	true	Keep one speaker per subtitle
`maxCharsPerLine`	integer	42	Maximum characters per line
`maxMergeableGap`	number	0.3	Max gap (seconds) to merge segments
`minDuration`	number	0.7	Minimum subtitle duration (seconds)
`maxDuration`	number	7	Maximum subtitle duration (seconds)
`minGap`	number	0.04	Minimum gap between subtitles (seconds)
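To illustrate how these options might fit into a batch request, the sketch below prints a body that enables SRT output and overrides two of the defaults. It assumes the options are nested under `srtOptions` using the names from the table, and that any option left out keeps its default. The helper name `build_srt_payload` and the chosen values are illustrative only.

```shell
# Hypothetical helper: print a batch request body with SRT output enabled.
# $1 is the media URL; it is assumed to contain no characters that need JSON escaping.
build_srt_payload() {
  local media_url="$1"
  cat <<EOF
{
  "mediaUrl": "$media_url",
  "model": "Hamsa-General-V2.0",
  "language": "ar",
  "returnSrtFormat": true,
  "srtOptions": {
    "maxLinesPerSubtitle": 1,
    "maxCharsPerLine": 36
  }
}
EOF
}

# Usage (sends a real request, so it is left commented out):
# build_srt_payload "https://your-storage.com/audio.mp3" | curl -X POST https://api.tryhamsa.com/v1/jobs/transcribe \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d @-
```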

Realtime API

Parameter	Type	Required	Description
`audioBase64`	string	Yes	Base64-encoded audio data (WAV format)
`language`	string	No	Language code: `ar` (default) or `en`
`isEosEnabled`	boolean	No	Enable end-of-speech detection (default: `false`)
`eosThreshold`	number	No	End-of-speech detection threshold, 0.0–1.0 (default: `0.3`)
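Because `eosThreshold` is limited to the range 0.0 to 1.0, it can be worth checking the value on the client before sending it. The following is a minimal sketch of such a check, combined with a request body that turns on end-of-speech detection. Both helper names are hypothetical, and the first argument is expected to be audio that is already base64-encoded.

```shell
# Hypothetical helper: succeed only if $1 is a plain decimal between 0.0 and 1.0.
valid_eos_threshold() {
  awk -v t="$1" 'BEGIN { exit !(t ~ /^(0|1)(\.[0-9]+)?$/ && t + 0 >= 0 && t + 0 <= 1) }'
}

# Hypothetical helper: print a realtime request body with end-of-speech detection on.
# $1 is base64-encoded audio, $2 is the threshold (defaults to the documented 0.3).
build_eos_payload() {
  local audio_b64="$1"
  local threshold="${2:-0.3}"
  if ! valid_eos_threshold "$threshold"; then
    echo "eosThreshold must be between 0.0 and 1.0, got: $threshold" >&2
    return 1
  fi
  printf '{"audioBase64":"%s","language":"ar","isEosEnabled":true,"eosThreshold":%s}' \
    "$audio_b64" "$threshold"
}
```

Rejecting an out-of-range value locally avoids sending a request the API is likely to refuse, although how the API responds to an invalid threshold is not described here.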

Models

Model ID	Description
`Hamsa-General-V2.0`	General-purpose transcription — best for media, podcasts, and pre-recorded content
`Hamsa-Conversational-V1.0`	Optimized for conversational audio — best for meetings, calls, and dialogues

Streaming via WebSocket

For real-time streaming transcription, use the WebSocket API. See the WebSocket STT documentation for details.
