
Hamsa provides two options for real-time speech-to-text:
  • Realtime API (POST /v1/realtime/stt) — Send base64-encoded audio and receive the transcription directly in the response. Best for short audio clips.
  • WebSocket (wss://api.tryhamsa.com/v1/realtime/ws) — Persistent bidirectional connection for streaming audio. Best for live conversations and continuous transcription.

Realtime API

The Realtime API accepts base64-encoded audio and returns the transcription synchronously. See the Quickstart for usage examples.

Parameters

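| Parameter    | Type    | Required | Description                                                   |
|--------------|---------|----------|---------------------------------------------------------------|
| audioBase64  | string  | Yes      | Base64-encoded audio data (WAV format)                        |
| language     | string  | No       | Language code: ar (default) or en                             |
| isEosEnabled | boolean | No       | Enable end-of-speech detection (default: false)               |
| eosThreshold | number  | No       | End-of-speech detection threshold, 0.0 to 1.0 (default: 0.3) |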
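The request described by the parameters above can be assembled with Python's standard library. This is a minimal sketch, not an official client: the function name `build_realtime_request` is hypothetical, the host `https://api.tryhamsa.com` is an assumption (borrowed from the WebSocket endpoint), and because this section does not name the authentication header, the caller passes it in explicitly.

```python
import base64
import json
import urllib.request
from typing import Dict, Optional

# Assumption: the REST endpoint lives on the same host as the WebSocket endpoint.
BASE_URL = "https://api.tryhamsa.com"


def build_realtime_request(
    wav_bytes: bytes,
    language: str = "ar",
    is_eos_enabled: bool = False,
    eos_threshold: float = 0.3,
    auth_headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/realtime/stt request (hypothetical helper)."""
    if language not in ("ar", "en"):
        raise ValueError("language must be 'ar' or 'en'")
    if not 0.0 <= eos_threshold <= 1.0:
        raise ValueError("eosThreshold must be between 0.0 and 1.0")
    body = {
        "audioBase64": base64.b64encode(wav_bytes).decode("ascii"),
        "language": language,
        "isEosEnabled": is_eos_enabled,
        "eosThreshold": eos_threshold,
    }
    headers = {"Content-Type": "application/json"}
    # The auth header name is not given in this section, so it is supplied by the caller.
    headers.update(auth_headers or {})
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/realtime/stt",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)`. The exact shape of the response body is not described in this section, so reading it raw (`resp.read().decode("utf-8")`) is the safest starting point.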

WebSocket Streaming

The WebSocket API provides a persistent connection for streaming audio in real time. Send audio chunks as you record and receive transcription results as they become available.

Endpoint

wss://api.tryhamsa.com/v1/realtime/ws

Authentication

Authenticate with your API key, passed either as a query parameter or in a request header. Using the query parameter:
wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY
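If the key is added to the URL programmatically, it should be percent-encoded rather than concatenated by hand. A minimal Python sketch; the helper name `ws_url` is hypothetical.

```python
from urllib.parse import urlencode

WS_ENDPOINT = "wss://api.tryhamsa.com/v1/realtime/ws"


def ws_url(api_key: str) -> str:
    """Return the WebSocket URL with the API key as the api_key query parameter."""
    # urlencode escapes characters that are not safe inside a query string.
    return f"{WS_ENDPOINT}?{urlencode({'api_key': api_key})}"
```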

Request Format

Send a JSON message with "type": "stt" and a payload containing the same fields as the Realtime API:
{
  "type": "stt",
  "payload": {
    "audioBase64": "//NExAAAAAANIAcAPABEAEQAQABEAEQARABEA...",
    "language": "ar",
    "isEosEnabled": true,
    "eosThreshold": 0.3
  }
}
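A message in this format can be produced from raw audio bytes as follows. This is a minimal Python sketch: the function name `stt_message` is hypothetical, and the default of `isEosEnabled` mirrors the Realtime API default (false) rather than anything stated for the WebSocket, so the example above, which sets it to true, corresponds to passing `is_eos_enabled=True`.

```python
import base64
import json


def stt_message(audio: bytes, language: str = "ar",
                is_eos_enabled: bool = False, eos_threshold: float = 0.3) -> str:
    """Serialize one chunk of audio into a {"type": "stt", ...} WebSocket message."""
    if not 0.0 <= eos_threshold <= 1.0:
        raise ValueError("eosThreshold must be between 0.0 and 1.0")
    return json.dumps({
        "type": "stt",
        "payload": {
            # Raw audio bytes are base64-encoded into an ASCII string.
            "audioBase64": base64.b64encode(audio).decode("ascii"),
            "language": language,
            "isEosEnabled": is_eos_enabled,
            "eosThreshold": eos_threshold,
        },
    })
```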

Response

The server sends the transcribed text directly as a plain string (not JSON). For full WebSocket documentation including connection handling, error codes, and code examples, see the WebSocket API reference.
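Because replies are plain strings, a client should use them as-is instead of passing them to a JSON parser. The sketch below shows a send/receive loop over any connection object with async `send` and `recv` methods. It rests on assumptions not stated in this section: that the server sends exactly one reply per message, and that each chunk already meets the `audioBase64` format requirements (whether every chunk needs its own WAV header is not specified here). The names `transcribe_chunks` and `TextConnection` are hypothetical.

```python
import asyncio
import base64
import json
from typing import Iterable, List, Protocol, Union


class TextConnection(Protocol):
    """Any object with async send/recv, such as a `websockets` client connection."""

    async def send(self, message: str) -> None: ...

    async def recv(self) -> Union[str, bytes]: ...


async def transcribe_chunks(conn: TextConnection, chunks: Iterable[bytes],
                            language: str = "ar") -> List[str]:
    """Send each audio chunk as an "stt" message and collect the text replies.

    Assumption: one reply per message; see the WebSocket API reference
    for the actual protocol.
    """
    transcripts: List[str] = []
    for chunk in chunks:
        message = {
            "type": "stt",
            "payload": {
                "audioBase64": base64.b64encode(chunk).decode("ascii"),
                "language": language,
            },
        }
        await conn.send(json.dumps(message))
        reply = await conn.recv()
        # Replies are plain text, not JSON; decode only if the frame arrived as bytes.
        transcripts.append(reply.decode("utf-8") if isinstance(reply, bytes) else reply)
    return transcripts
```

With the third-party `websockets` package, a connection opened via `async with websockets.connect(url) as ws:` can be passed directly as `conn`. Keeping the transport abstract also makes the loop easy to exercise with a fake connection in tests.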