Speech to Text Quickstart

Hamsa offers two STT endpoints:
  • Batch API (/v1/jobs/transcribe) — Async job-based. Accepts a media URL and delivers the transcription via webhook or polling.
  • Realtime API (/v1/realtime/stt) — Synchronous. Accepts base64-encoded audio and returns the transcription directly.

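Prerequisites

An API key, sent as a Bearer token in the Authorization header (shown as YOUR_API_KEY in the examples below).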

Batch Transcription

Submit a media URL for transcription. Results are delivered asynchronously.
curl -X POST https://api.tryhamsa.com/v1/jobs/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mediaUrl": "https://your-storage.com/audio.mp3",
    "model": "Hamsa-General-V2.0",
    "language": "ar",
    "webhookUrl": "https://your-server.com/webhook"
  }'
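
The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name build_batch_request is hypothetical, and only the endpoint, headers, and field names are taken from the curl example above. The response format is not described on this page, so the sketch builds the request and leaves sending it as a commented-out step.

```python
import json
import urllib.request

BATCH_URL = "https://api.tryhamsa.com/v1/jobs/transcribe"


def build_batch_request(api_key, media_url, model="Hamsa-General-V2.0",
                        language="ar", webhook_url=None):
    """Build (but do not send) a POST request for the Batch API.

    Hypothetical helper: only the URL, headers, and field names come
    from the curl example in this quickstart.
    """
    payload = {"mediaUrl": media_url, "model": model, "language": language}
    if webhook_url is not None:
        # webhookUrl is optional; leave it out when you plan to poll instead.
        payload["webhookUrl"] = webhook_url
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_batch_request(
        "YOUR_API_KEY",
        "https://your-storage.com/audio.mp3",
        webhook_url="https://your-server.com/webhook",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the job (requires a valid key and network access):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect or log before any network call is made.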

Realtime Transcription (synchronous)

Send base64-encoded audio and receive the transcription directly.
curl -X POST https://api.tryhamsa.com/v1/realtime/stt \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audioBase64": "<base64-encoded-wav-audio>",
    "language": "ar"
  }'
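
The audioBase64 value is the raw bytes of a WAV file, base64-encoded into an ASCII string. A minimal Python sketch of that step follows. The helper names are hypothetical, and the silent 16 kHz mono clip is only placeholder audio: this page does not state which sample rates or channel layouts the API accepts.

```python
import base64
import io
import json
import wave


def make_silent_wav(seconds=0.1, rate=16000):
    """Return the bytes of a short silent 16-bit mono WAV (placeholder audio).

    The 16 kHz rate is an assumption for illustration, not a documented
    requirement of the API.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def build_realtime_payload(wav_bytes, language="ar"):
    """Base64-encode WAV bytes into the JSON body shown in the curl example."""
    return {
        "audioBase64": base64.b64encode(wav_bytes).decode("ascii"),
        "language": language,
    }


if __name__ == "__main__":
    payload = build_realtime_payload(make_silent_wav())
    body = json.dumps(payload)
    # Show only the start of the body; the base64 string is long.
    print(body[:60] + "...")
```

In practice you would read the bytes from a file with open(path, "rb").read() instead of generating silence.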

Parameters

Batch API

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mediaUrl | string (URI) | Yes | URL of the audio/video file to transcribe |
| model | string | Yes | Model to use: Hamsa-General-V2.0 or Hamsa-Conversational-V1.0 |
| language | string | No | Language code: ar (default) or en |
| webhookUrl | string (URI) | No | URL to receive the completed transcription |
| webhookAuth | object | No | Authentication for the webhook |
| title | string | No | Optional title for the transcription job |
| processingType | string | No | Processing type (default: async) |
| returnSrtFormat | boolean | No | Return SRT subtitle format (default: false) |
| srtOptions | object | No | SRT formatting options (see below) |

SRT Options

When returnSrtFormat is true, you can customize the subtitle formatting:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxLinesPerSubtitle | integer | 2 | Maximum lines per subtitle block |
| singleSpeakerPerSubtitle | boolean | true | Keep one speaker per subtitle |
| maxCharsPerLine | integer | 42 | Maximum characters per line |
| maxMergeableGap | number | 0.3 | Maximum gap (seconds) between segments that may be merged |
| minDuration | number | 0.7 | Minimum subtitle duration (seconds) |
| maxDuration | number | 7 | Maximum subtitle duration (seconds) |
| minGap | number | 0.04 | Minimum gap between subtitles (seconds) |
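
As a sketch of how these options fit into a batch request body, the Python helper below sets returnSrtFormat and passes only the options you override. The helper name build_srt_job is hypothetical, and the sketch assumes the API accepts a partial srtOptions object and applies the defaults listed above to any omitted key, which this page does not state explicitly. The client-side checks are a convenience, not documented server behavior.

```python
import json

# Defaults copied from the SRT Options table in this quickstart.
SRT_DEFAULTS = {
    "maxLinesPerSubtitle": 2,
    "singleSpeakerPerSubtitle": True,
    "maxCharsPerLine": 42,
    "maxMergeableGap": 0.3,
    "minDuration": 0.7,
    "maxDuration": 7,
    "minGap": 0.04,
}


def build_srt_job(media_url, model, **overrides):
    """Return a Batch API body that requests SRT output (hypothetical helper)."""
    unknown = set(overrides) - set(SRT_DEFAULTS)
    if unknown:
        # Catch misspelled option names before the request is sent.
        raise ValueError(f"unknown srtOptions: {sorted(unknown)}")
    merged = {**SRT_DEFAULTS, **overrides}
    if merged["minDuration"] > merged["maxDuration"]:
        raise ValueError("minDuration must not exceed maxDuration")
    return {
        "mediaUrl": media_url,
        "model": model,
        "returnSrtFormat": True,
        # Assumption: omitted keys fall back to the documented defaults.
        "srtOptions": dict(overrides),
    }


if __name__ == "__main__":
    job = build_srt_job(
        "https://your-storage.com/audio.mp3",
        "Hamsa-General-V2.0",
        maxCharsPerLine=32,
        maxLinesPerSubtitle=1,
    )
    print(json.dumps(job, indent=2))
```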

Realtime API

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioBase64 | string | Yes | Base64-encoded audio data (WAV format) |
| language | string | No | Language code: ar (default) or en |
| isEosEnabled | boolean | No | Enable end-of-speech detection (default: false) |
| eosThreshold | number | No | End-of-speech detection threshold, 0.0–1.0 (default: 0.3) |
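
A small Python sketch of a Realtime API body with end-of-speech detection turned on is shown below. The helper name build_eos_payload is hypothetical; the field names, the 0.0–1.0 range, and the 0.3 default come from the table above. The range check runs on the client only, since this page does not describe how the server treats out-of-range values.

```python
import json


def build_eos_payload(audio_base64, language="ar", eos_threshold=0.3):
    """Return a Realtime API body with end-of-speech detection enabled."""
    if not 0.0 <= eos_threshold <= 1.0:
        # Documented range for eosThreshold is 0.0 to 1.0.
        raise ValueError("eosThreshold must be between 0.0 and 1.0")
    return {
        "audioBase64": audio_base64,
        "language": language,
        "isEosEnabled": True,
        "eosThreshold": eos_threshold,
    }


if __name__ == "__main__":
    body = build_eos_payload("<base64-encoded-wav-audio>", eos_threshold=0.5)
    print(json.dumps(body, indent=2))
```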

Models

| Model ID | Description |
| --- | --- |
| Hamsa-General-V2.0 | General-purpose transcription; best for media, podcasts, and pre-recorded content |
| Hamsa-Conversational-V1.0 | Optimized for conversational audio; best for meetings, calls, and dialogues |

Streaming via WebSocket

For real-time streaming transcription, use the WebSocket API. See the WebSocket STT documentation for details.