# Submit a transcription task

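> Submits an asynchronous task that transcribes an audio or video file to text, or translates it into English.

Pass the URL of the audio or video file in `input.audio_url` (required).

**Optional parameters** (set inside `input`):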
- `language` — BCP-47 code (e.g. `"en"`, `"zh"`). Omit for auto-detect.
- `task` — `"transcribe"` (default) or `"translate"` (output in English)
- `timestamps` — `"word"` or `"segment"`
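
As a rough illustration of how these parameters fit together, the sketch below builds a request body in Python. The helper name `build_transcribe_payload` is hypothetical and not part of the API; the model ID `whisper-v3` and the sample URL are taken from the schema examples. The client-side check of `task` and `timestamps` values is an assumption added for convenience, and the server remains the authority on what it accepts.

```python
import json

# Values listed on this page for the optional parameters.
ALLOWED_TASKS = {"transcribe", "translate"}
ALLOWED_TIMESTAMPS = {"word", "segment"}


def build_transcribe_payload(model, audio_url, language=None, task=None, timestamps=None):
    """Build a JSON-serializable body for POST /hub/v1/transcribe (hypothetical helper)."""
    if task is not None and task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    if timestamps is not None and timestamps not in ALLOWED_TIMESTAMPS:
        raise ValueError(f"timestamps must be one of {sorted(ALLOWED_TIMESTAMPS)}")

    # audio_url is the only required key inside `input`.
    input_params = {"audio_url": audio_url}
    # Optional keys are omitted when unset, so the documented defaults apply
    # (language auto-detect, task = transcribe).
    if language is not None:
        input_params["language"] = language
    if task is not None:
        input_params["task"] = task
    if timestamps is not None:
        input_params["timestamps"] = timestamps

    return {"model": model, "input": input_params}


payload = build_transcribe_payload(
    "whisper-v3",
    "https://example.com/audio.mp3",
    language="en",
)
print(json.dumps(payload, indent=2))
```

Leaving unset options out of the body, rather than sending `null`, keeps the request aligned with the "omit for auto-detect" wording above.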

---

**Workflow**
1. `GET /hub/v1/models?capability=transcribe` — browse available models
2. `POST /hub/v1/transcribe` ← you are here
3. `GET /hub/v1/tasks/:task_id` — poll until `ready=true`
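
Steps 2 and 3 of the workflow might be sketched as follows, using only the Python standard library. Several details are assumptions rather than documented behavior: the function names `http_json` and `transcribe_and_wait` are hypothetical, the polling interval and poll limit are arbitrary, and the only field read from the task response is `ready`, since this page does not describe the rest of that payload. The base URL and bearer authentication come from the OpenAPI definition.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.mountsea.ai"  # from the `servers` entry in the spec


def http_json(method, path, token, body=None):
    """Send one request with bearer auth and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def transcribe_and_wait(token, payload, request=http_json,
                        interval=2.0, max_polls=150, sleep=time.sleep):
    """Submit a transcription task, then poll until the task reports ready."""
    submitted = request("POST", "/hub/v1/transcribe", token, payload)
    task_id = submitted["task_id"]

    for _ in range(max_polls):
        task = request("GET", f"/hub/v1/tasks/{task_id}", token)
        if task.get("ready"):
            return task
        sleep(interval)  # assumed interval; tune to your workload

    raise TimeoutError(f"task {task_id} was not ready after {max_polls} polls")
```

A call would look like `transcribe_and_wait(api_token, {"model": "whisper-v3", "input": {"audio_url": "https://example.com/audio.mp3"}})`, where `api_token` is your own credential. The `request` and `sleep` parameters exist so the loop can be exercised with stubs instead of live network calls.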


## OpenAPI

````yaml POST /hub/v1/transcribe
openapi: 3.0.0
info:
  title: Hub - Premium AI Gateway (Image / Video / Audio / Transcribe)
  description: >-
    Hub is a unified AI Gateway offering **flagship-quality, production-stable,
    and cheaper-than-official** access to the world's best AI models across
    image, video, audio (music) and transcription capabilities.


    **Why Hub?**

    - 🏆 **Flagship quality** — only official flagship model weights (Veo 3.1,
    Nano Banana Pro, GPT Image 2, Kling v3 Pro, WAN 2.7, Seedance 2.0,
    ElevenLabs Music). No knock-offs, no distillations — same outputs as going
    direct.

    - 🛡️ **Production stable** — multi-region routing, automatic failover,
    transparent retries on transient errors, queue-aware load balancing. Built
    for 24/7 production traffic.

    - 💰 **Cheaper than official** — pay only on `status=completed` (failed
    tasks are free), billed in unified credits at a meaningful discount versus
    going direct to the model provider. No per-provider minimums, no monthly
    subscriptions.


    Each endpoint accepts a `model` + `input` payload — switch models without
    changing the endpoint shape.
  version: 1.0.0
  contact: {}
servers:
  - url: https://api.mountsea.ai
    description: API Gateway
security: []
tags:
  - name: hub
    description: >-
      Model discovery — list and inspect schemas/examples for every available
      model across all capabilities.
  - name: Image
    description: >-
      Image generation & editing models (Nano Banana, GPT Image 2 and edit
      variants).
  - name: Video
    description: >-
      Video generation models — text-to-video, image-to-video, multi-reference,
      first-last frame and edit (Veo 3.1, Kling v3, WAN 2.7, Seedance 2.0).
  - name: Audio
    description: >-
      Audio capabilities — music generation (ElevenLabs Music) and audio/video
      transcription.
  - name: Tasks
    description: Poll the status / result of any submitted Hub task.
paths:
  /hub/v1/transcribe:
    post:
      tags:
        - Audio
      summary: Submit a transcription task
      description: |-
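        Submits an asynchronous task that transcribes an audio or video file to text, or translates it into English.

        Pass the URL of the audio or video file in `input.audio_url` (required).

        **Optional parameters** (set inside `input`):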
        - `language` — BCP-47 code (e.g. `"en"`, `"zh"`). Omit for auto-detect.
        - `task` — `"transcribe"` (default) or `"translate"` (output in English)
        - `timestamps` — `"word"` or `"segment"`

        ---

        **Workflow**
        1. `GET /hub/v1/models?capability=transcribe` — browse available models
        2. `POST /hub/v1/transcribe` ← you are here
        3. `GET /hub/v1/tasks/:task_id` — poll until `ready=true`
      operationId: HubAudioPublicController_submitTranscribe
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SubmitTranscribeDto'
      responses:
        '200':
          description: Task accepted. Returns the task ID and its initial status.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SubmitResponseDto'
      security:
        - bearerAuth: []
components:
  schemas:
    SubmitTranscribeDto:
      type: object
      properties:
        model:
          type: string
          description: |-
            Model ID for transcription / translation.

            See **GET /hub/v1/models?capability=transcribe** for the full list.
          example: whisper-v3
        input:
          type: object
          description: >-
            Transcription input parameters.


            - `audio_url` *(required)* — URL of the audio or video file

            - `language` — BCP-47 language code (e.g. `"en"`, `"zh"`). Omit for
            auto-detect.

            - `task` — `"transcribe"` (default) or `"translate"` (translate to
            English)

            - `timestamps` — `"word"` or `"segment"` for timestamped output
          example:
            audio_url: https://example.com/audio.mp3
            language: en
      required:
        - model
        - input
    SubmitResponseDto:
      type: object
      properties:
        task_id:
          type: string
          description: Unique task ID — use this to poll GET /hub/v1/tasks/:task_id
          example: hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        status:
          type: string
          description: Task status at creation time (usually `pending`)
          example: pending
        capability:
          type: string
          description: 'Capability: `image` | `video` | `audio` | `transcribe`'
          example: video
        model:
          type: string
          description: Model ID
          example: veo-3.1-fast
        vendor:
          type: string
          description: Model vendor
          example: Google
        mode:
          type: string
          description: Generation mode (e.g. `text-to-video`, `image-to-image`)
          example: text-to-video
        created_at:
          type: string
          description: ISO 8601 creation timestamp
          example: '2026-05-18T09:00:00.000Z'
      required:
        - task_id
        - status
        - capability
        - model
        - vendor
        - mode
        - created_at
  securitySchemes:
    bearerAuth:
      scheme: bearer
      bearerFormat: JWT
      type: http

````
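
The `required` list in `SubmitResponseDto` can be checked on the client before the response is used for polling. The following is a minimal sketch; the helper name `missing_submit_fields` is hypothetical, and the field names are copied from the schema above.

```python
# Field names copied from the `required` list of SubmitResponseDto.
REQUIRED_FIELDS = (
    "task_id",
    "status",
    "capability",
    "model",
    "vendor",
    "mode",
    "created_at",
)


def missing_submit_fields(response):
    """Return the required field names that are absent from a decoded response."""
    return [name for name in REQUIRED_FIELDS if name not in response]


# A deliberately incomplete response, for illustration only.
partial = {"task_id": "hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "status": "pending"}
print(missing_submit_fields(partial))
# prints ['capability', 'model', 'vendor', 'mode', 'created_at']
```

An empty list means every required field is present; the check says nothing about whether the values themselves are valid.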