Hamsa Speech to Text (STT) accurately transcribes Arabic speech across multiple dialects into text with word-level timestamps and speaker identification.

What you can do

  • Transcribe Arabic media content, podcasts, and videos
  • Generate subtitles for Arabic video content
  • Create searchable text from Arabic audio recordings
  • Enable real-time transcription for voice agents and live calls
  • Document Arabic meetings and interviews

Models

Model          Best for                               Latency
STT Standard   Batch transcription, high accuracy     Optimized for quality
STT Realtime   Live calls, voice agents, streaming    ~150-250 ms

View all models

Compare models and see detailed specifications

Key features

  • Dialect recognition: Automatic detection and transcription of Arabic dialects
  • Word-level timestamps: Precise timing for each transcribed word. Each transcript segment returned by the transcription API includes word-level data (the word text plus its start and end times), so you can build word-highlight or karaoke-style experiences
  • Word highlight during playback: In the Media Platform, the word currently being spoken is highlighted in sync with the audio or video; you can also click a word to jump to that point in the media
  • Speaker diarization: Identify different speakers in multi-speaker audio
  • Code-switching: Handle mixed Arabic-English speech naturally
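
As an illustration of what word-level timestamps make possible, the sketch below groups words into SRT subtitle cues. It is a minimal example, not the Hamsa API: the word field names (`text`, `start`, `end`), the sample data, and the helper names `srt_timestamp` and `words_to_srt` are assumptions made for this example. Check the API Reference for the actual response shape.

```python
# Sketch: turn word-level timestamps into SRT subtitle cues.
# The word dictionaries below are hypothetical; the real response
# fields may be named differently.

def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into cues of at most max_words and render them as SRT."""
    cues = []
    for number, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0]["start"])   # cue starts with its first word
        end = srt_timestamp(chunk[-1]["end"])      # and ends with its last word
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{number}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Assumed sample data, timestamps in seconds.
    sample_words = [
        {"text": "marhaban", "start": 0.00, "end": 0.42},
        {"text": "bikum", "start": 0.48, "end": 0.80},
        {"text": "fi", "start": 0.95, "end": 1.10},
    ]
    print(words_to_srt(sample_words, max_words=2))
```

Splitting on a fixed word count keeps the sketch short; a production subtitle generator would more likely break cues on pauses, speaker changes, or a maximum line length.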

Word highlight during playback

The transcription API returns word-level data in each transcript segment: each word includes its text plus start and end timestamps (in seconds). This enables two things:
  • In the Media Platform: When you open a transcription and play the audio or video, the word currently being spoken is highlighted in sync with playback. You can also click any word in the transcript to seek the media to that position.
  • Via the API: Your application receives the same word-level timestamps in the transcript/segment response, so you can build karaoke-style highlighting, click-to-seek, or other experiences that follow the speech.
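
The two behaviours described above (highlighting the current word and click-to-seek) can be sketched on the client side as follows. This is a minimal illustration under stated assumptions: the segment structure and the field names `words`, `text`, `start`, and `end` are hypothetical, as are the helper names `word_index_at` and `seek_time_for`. Only the fact that each word carries start and end times in seconds comes from the description above.

```python
from bisect import bisect_right

# Hypothetical transcript segment; the real field names may differ.
segment = {
    "words": [
        {"text": "مرحبا", "start": 0.00, "end": 0.42},
        {"text": "بكم", "start": 0.48, "end": 0.80},
        {"text": "في", "start": 0.95, "end": 1.10},
        {"text": "Hamsa", "start": 1.10, "end": 1.60},
    ]
}


def word_index_at(words, playback_time):
    """Return the index of the word being spoken at playback_time (seconds).

    Returns None when the time falls in a gap between words, before the
    first word, or after the last one. Assumes words are sorted by start.
    """
    starts = [w["start"] for w in words]
    # Last word whose start time is <= playback_time.
    i = bisect_right(starts, playback_time) - 1
    if i >= 0 and playback_time < words[i]["end"]:
        return i
    return None


def seek_time_for(words, index):
    """Return the media position (seconds) to seek to when a word is clicked."""
    return words[index]["start"]


if __name__ == "__main__":
    words = segment["words"]
    for t in (0.20, 0.45, 1.10):
        print(t, word_index_at(words, t))
    print("seek to", seek_time_for(words, 2))
```

In a player, `word_index_at` would be called on each time-update event to move the highlight, and `seek_time_for` would supply the new playback position when a word is clicked. For long transcripts, computing the list of start times once rather than on every call avoids repeated work.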

Supported languages

  • Arabic dialects: Egyptian, Gulf, Levantine, North African, Iraqi, Yemeni, Modern Standard Arabic
  • English: US English

Get started

STT Documentation

Complete guide to Speech to Text features and integration

Quickstart

Get started with STT in minutes

Media Platform

Use STT through the web interface

API Reference

Technical API documentation