Real-time Transcription

Hamsa provides two options for real-time speech-to-text:

Realtime API (POST /v1/realtime/stt): Send base64-encoded audio and receive the transcription directly in the response. Best for short audio clips.
WebSocket (wss://api.tryhamsa.com/v1/realtime/ws): Persistent bidirectional connection for streaming audio. Best for live conversations and continuous transcription.

Realtime API

The Realtime API accepts base64-encoded audio and returns the transcription synchronously. See the Quickstart for usage examples.

Parameters

Parameter	Type	Required	Description
`audioBase64`	string	Yes	Base64-encoded audio data (WAV format)
`language`	string	No	Language code: `ar` (default) or `en`
`isEosEnabled`	boolean	No	Enable end-of-speech detection (default: `false`)
`eosThreshold`	number	No	End-of-speech detection threshold, 0.0-1.0 (default: `0.3`)
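
As a rough illustration of the table above, the following Python sketch assembles a request body from raw WAV bytes. The helper name `build_stt_payload` and its validation rules are assumptions made for this example and are not part of the Hamsa API; only the four field names, their defaults, and the allowed values come from the table. Sending the body (including whatever authentication the endpoint expects) is left out because it is not described in this section.

```python
import base64
import json

# Language codes listed in the parameter table.
SUPPORTED_LANGUAGES = {"ar", "en"}


def build_stt_payload(wav_bytes, language="ar", is_eos_enabled=False, eos_threshold=0.3):
    """Hypothetical helper: build a JSON-serializable body from WAV bytes.

    Defaults mirror the defaults documented in the parameter table.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language must be one of {sorted(SUPPORTED_LANGUAGES)}")
    if not 0.0 <= eos_threshold <= 1.0:
        raise ValueError("eos_threshold must be between 0.0 and 1.0")
    return {
        # The API expects the audio as a base64 string, not raw bytes.
        "audioBase64": base64.b64encode(wav_bytes).decode("ascii"),
        "language": language,
        "isEosEnabled": is_eos_enabled,
        "eosThreshold": eos_threshold,
    }


# Placeholder bytes stand in for a real WAV file here.
body = json.dumps(build_stt_payload(b"placeholder-wav-bytes", language="en"))
```

The resulting `body` string would be the JSON sent to POST /v1/realtime/stt; see the Quickstart for the complete request.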

WebSocket Streaming

The WebSocket API provides a persistent connection for streaming audio in real time. Send audio chunks as you record and receive transcription results as they become available.

Endpoint

wss://api.tryhamsa.com/v1/realtime/ws

Authentication

Authenticate with either a query parameter or a header. With a query parameter, append `api_key` to the endpoint URL:

wss://api.tryhamsa.com/v1/realtime/ws?api_key=YOUR_API_KEY
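
A minimal Python sketch of building that URL safely is shown here. The function name `build_ws_url` is hypothetical; the endpoint and the `api_key` parameter name are taken from the line above. URL-encoding the key is a defensive choice for this example, in case a key ever contains characters that are not URL-safe.

```python
from urllib.parse import urlencode

# Endpoint as documented above.
WS_ENDPOINT = "wss://api.tryhamsa.com/v1/realtime/ws"


def build_ws_url(api_key):
    """Hypothetical helper: append the API key as the `api_key` query parameter."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    # urlencode escapes characters such as '&' or spaces inside the key.
    return f"{WS_ENDPOINT}?{urlencode({'api_key': api_key})}"


url = build_ws_url("YOUR_API_KEY")
```

Because the key travels in the URL, it may end up in proxy or server logs; the header option can be preferable where that matters.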

Request Format

Send a JSON message whose `type` field is `"stt"`. The `payload` object carries the same fields listed under Parameters:

{
  "type": "stt",
  "payload": {
    "audioBase64": "//NExAAAAAANIAcAPABEAEQAQABEAEQARABEA...",
    "language": "ar",
    "isEosEnabled": true,
    "eosThreshold": 0.3
  }
}

Response

The server sends the transcribed text directly as a plain string (not JSON). For full WebSocket documentation including connection handling, error codes, and code examples, see the WebSocket API reference.
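
The request and response shapes described above can be sketched in Python as a single send-and-receive round trip. This is an illustration under assumptions, not the reference client: `transcribe_once` and `FakeConnection` are hypothetical names, and `ws` is assumed to be any object with async `send` and `recv` methods. The fake connection exists only so the sketch runs offline. The point to notice is that the reply is used as plain text and never passed through a JSON parser.

```python
import asyncio
import base64
import json


async def transcribe_once(ws, wav_bytes, language="ar"):
    """Hypothetical helper: send one "stt" message and return the reply text."""
    message = {
        "type": "stt",
        "payload": {
            "audioBase64": base64.b64encode(wav_bytes).decode("ascii"),
            "language": language,
            "isEosEnabled": True,
            "eosThreshold": 0.3,
        },
    }
    await ws.send(json.dumps(message))
    reply = await ws.recv()
    # The server replies with a plain string, so do not call json.loads on it.
    return reply.decode("utf-8") if isinstance(reply, bytes) else reply


class FakeConnection:
    """Offline stand-in for a WebSocket connection (assumption for this sketch)."""

    def __init__(self, reply):
        self.reply = reply
        self.sent = []

    async def send(self, data):
        self.sent.append(data)

    async def recv(self):
        return self.reply


fake = FakeConnection("hello world")
text = asyncio.run(transcribe_once(fake, b"placeholder-wav-bytes"))
```

With a real client, for example the third-party `websockets` package, the object yielded by `async with websockets.connect(url) as ws:` should be usable in place of `FakeConnection`, since it exposes the same `send` and `recv` coroutines.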
