Transcribes (or translates) audio/video to text.
Pass the file URL via input.audio_url.
Optional parameters:
- language: BCP-47 code (e.g. "en", "zh"). Omit for auto-detect.
- task: "transcribe" (default) or "translate" (output in English)
- timestamps: "word" or "segment"
Workflow
1. GET /hub/v1/models?capability=transcribe - browse available models
2. POST /hub/v1/transcribe - create the transcription task (this endpoint)
3. GET /hub/v1/tasks/:task_id - poll until ready=true
Documentation Index
Fetch the complete documentation index at: https://docs.mountsea.ai/llms.txt
Use this file to discover all available pages before exploring further.
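Step 3 of the workflow (poll until ready=true) is a loop around GET /hub/v1/tasks/:task_id. The following is a minimal sketch, not an official client: `poll_until_ready`, its `fetch_task` callable, the `interval` and `timeout` values, and the `ready` key in the decoded response are all assumptions made for illustration. The HTTP call is passed in as a function so the loop can be exercised without network access.

```python
import time


def poll_until_ready(fetch_task, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_task(task_id) until the returned dict has ready == True.

    fetch_task is any callable that performs GET /hub/v1/tasks/:task_id and
    returns the decoded JSON object. The "ready" key name is assumed from
    the "poll until ready=true" workflow step.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        if task.get("ready") is True:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not ready after {timeout} seconds")
        sleep(interval)


# Stand-in for the real HTTP call: the task becomes ready on the third poll.
_responses = iter([{"ready": False}, {"ready": False}, {"ready": True}])


def fake_fetch(task_id):
    return next(_responses)


result = poll_until_ready(
    fake_fetch,
    "hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

In real use, `fetch_task` would issue the authenticated GET request and decode the JSON response; a bounded timeout keeps a stuck task from blocking the caller indefinitely.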
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
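As an illustration of how the Bearer header combines with the request body, here is a hedged sketch that builds (but does not send) a POST /hub/v1/transcribe request with Python's standard library. Several details are assumptions rather than documented facts: `BASE_URL` is a placeholder host, `make_transcribe_request` is a hypothetical helper, the top-level `model` key and the `Content-Type` header are guesses based on the JSON example, and the client-side validation simply mirrors the allowed values listed for task and timestamps.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: the real API host is not given here


def make_transcribe_request(token, model, audio_url, language=None, task=None, timestamps=None):
    """Build (but do not send) an authenticated POST /hub/v1/transcribe request."""
    if task not in (None, "transcribe", "translate"):
        raise ValueError('task must be "transcribe" or "translate"')
    if timestamps not in (None, "word", "segment"):
        raise ValueError('timestamps must be "word" or "segment"')

    # Only audio_url is required; unset options are omitted so the service
    # can apply its defaults (language auto-detect, task "transcribe").
    params = {"audio_url": audio_url}
    for key, value in (("language", language), ("task", task), ("timestamps", timestamps)):
        if value is not None:
            params[key] = value

    # Assumed body shape: a top-level "model" key next to "input".
    body = {"model": model, "input": params}
    return urllib.request.Request(
        BASE_URL + "/hub/v1/transcribe",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, since the body is JSON
        },
        method="POST",
    )


request = make_transcribe_request(
    "YOUR_TOKEN", "whisper-v3", "https://example.com/audio.mp3", language="en"
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the call once `BASE_URL` and the token are replaced with real values.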
Model ID for transcription / translation.
See GET /hub/v1/models?capability=transcribe for the full list.
"whisper-v3"
Transcription input parameters.
- audio_url (required): URL of the audio or video file
- language: BCP-47 language code (e.g. "en", "zh"). Omit for auto-detect.
- task: "transcribe" (default) or "translate" (translate to English)
- timestamps: "word" or "segment" for timestamped output
{
  "audio_url": "https://example.com/audio.mp3",
  "language": "en"
}
Unique task ID. Use this to poll GET /hub/v1/tasks/:task_id
"hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Task status at creation time (usually pending)
"pending"
Capability: image | video | audio | transcribe
"video"
Model ID
"veo-3.1-fast"
Model vendor
"Google"
Generation mode (e.g. text-to-video, image-to-image)
"text-to-video"
ISO 8601 creation timestamp
"2026-05-18T09:00:00.000Z"