# Submit Audio Task

> Generates audio using the selected model.

**Music generation**: `elevenlabs-music` — creates music from a text description.
Key parameter: `music_length_ms` (milliseconds, e.g. `30000` = 30 s). Billed per output minute.

---

**Workflow**
1. `GET /hub/v1/models?capability=audio` — browse available models
2. `GET /hub/v1/models/:model` — copy the `example` as your `input`
3. `POST /hub/v1/audio` ← you are here
4. `GET /hub/v1/tasks/:task_id` — poll until `ready=true`
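
The submit and poll steps above can be sketched in Python as follows. This is a minimal sketch, not an official client: the helper names `build_submit_request` and `poll_until_ready`, the polling interval, and the attempt limit are illustrative assumptions. The base URL, the bearer token header, and the `ready` flag come from this page; the rest of the task response is not described here, so the poller takes the fetch function as a parameter instead of guessing its shape.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.mountsea.ai"  # server URL listed in the OpenAPI spec


def build_submit_request(api_key: str, model: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /hub/v1/audio request."""
    body = json.dumps({"model": model, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/hub/v1/audio",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # bearerAuth scheme from the spec
            "Content-Type": "application/json",
        },
    )


def poll_until_ready(fetch_task, task_id: str, interval_s: float = 5.0, max_attempts: int = 60) -> dict:
    """Call fetch_task(task_id) until the returned dict has ready == True.

    fetch_task is any callable that returns the decoded JSON body of
    GET /hub/v1/tasks/:task_id. Only the `ready` flag is taken from the docs;
    interval_s and max_attempts are arbitrary illustrative defaults.
    """
    for _ in range(max_attempts):
        task = fetch_task(task_id)
        if task.get("ready") is True:
            return task
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} not ready after {max_attempts} polls")
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body; the `task_id` field of that response is the value `poll_until_ready` expects.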

---

## Model Reference

> **Tip:** Click **Try it out** → select a model from the dropdown below → the parameter schema auto-populates with an example.

### `elevenlabs-music` — ElevenLabs Music (ElevenLabs) · `music-generate`

ElevenLabs Music: AI music generation from a text prompt or a structured composition plan.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `prompt` | `string` |  | – | – | The text prompt describing the music to generate. Use this for simple text-to-music generation. Mutually exclusive with composition_plan. |
| `output_format` | `string` |  | `mp3_44100_128` | `mp3_22050_32` `mp3_44100_32` `mp3_44100_64` `mp3_44100_96` `mp3_44100_128` `mp3_44100_192` `pcm_8000` `pcm_16000` `pcm_22050` `pcm_24000` `pcm_44100` `pcm_48000` `ulaw_8000` `alaw_8000` `opus_48000_32` `opus_48000_64` `opus_48000_96` `opus_48000_128` `opus_48000_192` | Output audio format, encoded as codec_sampleRate[_bitrate] (e.g. mp3_44100_128 = MP3 at 44.1 kHz / 128 kbps). The pcm, ulaw and alaw values carry no bitrate suffix. Note: mp3_44100_192 requires Creator tier; pcm_44100 requires Pro tier. |
| `music_length_ms` | `integer` | ✓ | – | `3000`–`600000` | Duration of the generated music in milliseconds, from 3000 ms (3 s) to 600000 ms (10 min). Required for billing. Applies to prompt-based generation; with composition_plan, the total duration is the sum of the section duration_ms values. |
| `composition_plan` | `object` |  | – | – | Advanced: structured composition plan with sections, styles and lyrics. The top level requires positive_global_styles[] and negative_global_styles[]. Each section requires section_name, positive_local_styles[], negative_local_styles[], duration_ms (3000–120000 ms), and lines[]. Mutually exclusive with prompt. |
| `force_instrumental` | `boolean` |  | – | – | If true, guarantees the generated song is instrumental (no vocals). Can only be used with prompt. |
| `respect_sections_durations` | `boolean` |  | `true` | – | Controls how strictly section durations in the composition_plan are enforced. Only effective with composition_plan. When true, each section's duration_ms is precisely respected; when false, the model may adjust durations for better quality while preserving total song length. |
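
The constraints in the table can be checked on the client before a task is submitted. The sketch below is one possible pre-flight check, not the server's actual validation: `validate_music_input` and `parse_output_format` are hypothetical helper names, and the error messages are illustrative. Only the rules stated in the table are encoded (mutual exclusivity, the `music_length_ms` range, and `force_instrumental` being prompt-only).

```python
def validate_music_input(inp: dict) -> list[str]:
    """Return a list of problems found in an elevenlabs-music `input` object."""
    errors = []
    has_prompt = "prompt" in inp
    has_plan = "composition_plan" in inp

    if has_prompt and has_plan:
        errors.append("prompt and composition_plan are mutually exclusive")

    length = inp.get("music_length_ms")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(length, bool) or not isinstance(length, int):
        errors.append("music_length_ms must be an integer")
    elif not 3000 <= length <= 600000:
        errors.append("music_length_ms must be between 3000 and 600000")

    if "force_instrumental" in inp and not has_prompt:
        errors.append("force_instrumental can only be used with prompt")

    return errors


def parse_output_format(fmt: str) -> dict:
    """Split an output_format value such as 'mp3_44100_128' into its parts."""
    parts = fmt.split("_")
    return {
        "codec": parts[0],
        "sample_rate_hz": int(parts[1]),
        # pcm/ulaw/alaw values have no bitrate component.
        "bitrate_kbps": int(parts[2]) if len(parts) > 2 else None,
    }
```

For example, `parse_output_format("pcm_16000")` yields a 16000 Hz sample rate and no bitrate, while an input containing both `prompt` and `composition_plan` produces one error from `validate_music_input`.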

<details>
<summary>Example request body</summary>

```json
{
  "model": "elevenlabs-music",
  "input": {
    "prompt": "Mysterious original soundtrack, themes of jungle, rainforest, nature, woodwinds, busy rhythmic tribal percussion.",
    "output_format": "mp3_44100_128",
    "music_length_ms": 60000
  }
}
```
</details>
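
The example above uses `prompt`. For the `composition_plan` alternative, the following Python sketch assembles a plan from the fields listed in the parameter table. It is an illustration under assumptions: the helper names are hypothetical, and the `sections` key used to hold the section list is an assumption, since this page names the per-section fields but not the key that contains them. The per-section range check mirrors the documented 3000 to 120000 ms limit.

```python
def make_section(name, duration_ms, lines, positive_styles=(), negative_styles=()):
    """Build one section with every field the table lists as required."""
    if not 3000 <= duration_ms <= 120000:
        raise ValueError("section duration_ms must be between 3000 and 120000")
    return {
        "section_name": name,
        "positive_local_styles": list(positive_styles),
        "negative_local_styles": list(negative_styles),
        "duration_ms": duration_ms,
        "lines": list(lines),
    }


def make_plan(sections, positive_global=(), negative_global=()):
    """Wrap sections with the top-level global style lists."""
    return {
        "positive_global_styles": list(positive_global),
        "negative_global_styles": list(negative_global),
        "sections": list(sections),  # key name is an assumption, see lead-in
    }


def total_duration_ms(plan):
    """Total song length: the sum of the section durations."""
    return sum(section["duration_ms"] for section in plan["sections"])
```

The schema lists `music_length_ms` as required even though a plan determines its own total, so one option is to send the value returned by `total_duration_ms`; the schema returned by `GET /hub/v1/models/:model` is the place to confirm this.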

---


## OpenAPI

````yaml POST /hub/v1/audio
openapi: 3.0.0
info:
  title: Hub - Premium AI Gateway (Image / Video / Audio / Transcribe)
  description: >-
    Hub is a unified AI Gateway offering **flagship-quality, production-stable,
    and cheaper-than-official** access to the world's best AI models across
    image, video, audio (music) and transcription capabilities.


    **Why Hub?**

    - 🏆 **Flagship quality** — only official flagship model weights (Veo 3.1,
    Nano Banana Pro, GPT Image 2, Kling v3 Pro, WAN 2.7, Seedance 2.0,
    ElevenLabs Music). No knock-offs, no distillations — same outputs as going
    direct.

    - 🛡️ **Production stable** — multi-region routing, automatic failover,
    transparent retries on transient errors, queue-aware load balancing. Built
    for 24/7 production traffic.

    - 💰 **Cheaper than official** — pay only on `status=completed` (failed
    tasks are free), billed in unified credits at a meaningful discount versus
    going direct to the model provider. No per-provider minimums, no monthly
    subscriptions.


    Each endpoint accepts a `model` + `input` payload — switch models without
    changing the endpoint shape.
  version: 1.0.0
  contact: {}
servers:
  - url: https://api.mountsea.ai
    description: API Gateway
security: []
tags:
  - name: hub
    description: >-
      Model discovery — list and inspect schemas/examples for every available
      model across all capabilities.
  - name: Image
    description: >-
      Image generation & editing models (Nano Banana, GPT Image 2 and edit
      variants).
  - name: Video
    description: >-
      Video generation models — text-to-video, image-to-video, multi-reference,
      first-last frame and edit (Veo 3.1, Kling v3, WAN 2.7, Seedance 2.0).
  - name: Audio
    description: >-
      Audio capabilities — music generation (ElevenLabs Music) and audio/video
      transcription.
  - name: Tasks
    description: Poll the status / result of any submitted Hub task.
paths:
  /hub/v1/audio:
    post:
      tags:
        - Audio
      summary: Submit an audio generation task
      description: >
        Generates audio using the selected model.


        **Music generation**: `elevenlabs-music` — creates music from a text
        description.

        Key parameter: `music_length_ms` (milliseconds, e.g. `30000` = 30 s).
        Billed per output minute.


        ---


        **Workflow**

        1. `GET /hub/v1/models?capability=audio` — browse available models

        2. `GET /hub/v1/models/:model` — copy the `example` as your `input`

        3. `POST /hub/v1/audio` ← you are here

        4. `GET /hub/v1/tasks/:task_id` — poll until `ready=true`


        ---


        ## Model Reference


        > **Tip:** Click **Try it out** → select a model from the dropdown below
        → the parameter schema auto-populates with an example.


        ### `elevenlabs-music` — ElevenLabs Music (ElevenLabs) ·
        `music-generate`


        ElevenLabs Music: AI music generation from text description.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `prompt` | `string` |  | – | – | The text prompt describing the music
        to generate. Use this for simple text-to-music generation. Mutually
        exclusive with composition_plan. |

        | `output_format` | `string` |  | `mp3_44100_128` | `mp3_22050_32`
        `mp3_44100_32` `mp3_44100_64` `mp3_44100_96` `mp3_44100_128`
        `mp3_44100_192` `pcm_8000` `pcm_16000` `pcm_22050` `pcm_24000`
        `pcm_44100` `pcm_48000` `ulaw_8000` `alaw_8000` `opus_48000_32`
        `opus_48000_64` `opus_48000_96` `opus_48000_128` `opus_48000_192` |
        Output audio format, encoded as codec_sampleRate[_bitrate] (e.g.
        mp3_44100_128 = MP3 at 44.1 kHz / 128 kbps). The pcm, ulaw and alaw
        values carry no bitrate suffix. Note: mp3_44100_192 requires Creator
        tier; pcm_44100 requires Pro tier. |

        | `music_length_ms` | `integer` | ✓ | – | `3000`–`600000` | Duration of
        the generated music in milliseconds, from 3000 ms (3 s) to 600000 ms
        (10 min). Required for billing. Applies to prompt-based generation; with
        composition_plan, the total duration is the sum of the section
        duration_ms values. |

        | `composition_plan` | `object` |  | – | – | Advanced: structured
        composition plan with sections, styles and lyrics. The top level
        requires positive_global_styles[] and negative_global_styles[]. Each
        section requires section_name, positive_local_styles[],
        negative_local_styles[], duration_ms (3000–120000 ms), and lines[].
        Mutually exclusive with prompt. |

        | `force_instrumental` | `boolean` |  | – | – | If true, guarantees the
        generated song is instrumental (no vocals). Can only be used with
        prompt. |

        | `respect_sections_durations` | `boolean` |  | `true` | – | Controls
        how strictly section durations in the composition_plan are enforced.
        Only effective with composition_plan. When true, each section's
        duration_ms is precisely respected; when false, the model may adjust
        durations for better quality while preserving total song length. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "elevenlabs-music",
          "input": {
            "prompt": "Mysterious original soundtrack, themes of jungle, rainforest, nature, woodwinds, busy rhythmic tribal percussion.",
            "output_format": "mp3_44100_128",
            "music_length_ms": 60000
          }
        }

        ```

        </details>


        ---
      operationId: HubAudioPublicController_submitAudio
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              oneOf:
                - $ref: '#/components/schemas/ElevenlabsMusicRequest'
              discriminator:
                propertyName: model
                mapping:
                  elevenlabs-music:
                    $ref: '#/components/schemas/ElevenlabsMusicRequest'
      responses:
        '200':
          description: 'Task created; the body carries the task ID and its initial status.'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SubmitResponseDto'
      security:
        - bearerAuth: []
components:
  schemas:
    ElevenlabsMusicRequest:
      type: object
      title: ElevenLabs Music (ElevenLabs) [music-generate]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - elevenlabs-music
          description: 'Fixed value: `"elevenlabs-music"`'
        input:
          $ref: '#/components/schemas/ElevenlabsMusicInput'
      example:
        model: elevenlabs-music
        input:
          prompt: >-
            Mysterious original soundtrack, themes of jungle, rainforest,
            nature, woodwinds, busy rhythmic tribal percussion.
          output_format: mp3_44100_128
          music_length_ms: 60000
    SubmitResponseDto:
      type: object
      properties:
        task_id:
          type: string
          description: Unique task ID — use this to poll GET /hub/v1/tasks/:task_id
          example: hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        status:
          type: string
          description: Task status at creation time (usually `pending`)
          example: pending
        capability:
          type: string
          description: 'Capability: `image` | `video` | `audio` | `transcribe`'
          example: video
        model:
          type: string
          description: Model ID
          example: veo-3.1-fast
        vendor:
          type: string
          description: Model vendor
          example: Google
        mode:
          type: string
          description: Generation mode (e.g. `text-to-video`, `image-to-image`)
          example: text-to-video
        created_at:
          type: string
          description: ISO 8601 creation timestamp
          example: '2026-05-18T09:00:00.000Z'
      required:
        - task_id
        - status
        - capability
        - model
        - vendor
        - mode
        - created_at
    ElevenlabsMusicInput:
      type: object
      title: ElevenLabs Music — input
      description: 'ElevenLabs Music: AI music generation from text description.'
      properties:
        prompt:
          type: string
          description: >-
            The text prompt describing the music to generate. Use this for
            simple text-to-music generation. Mutually exclusive with
            composition_plan.
        output_format:
          type: string
          description: >-
            Output audio format, encoded as codec_sampleRate[_bitrate] (e.g.
            mp3_44100_128 = MP3 at 44.1 kHz / 128 kbps). The pcm, ulaw and alaw
            values carry no bitrate suffix. Note: mp3_44100_192 requires Creator
            tier; pcm_44100 requires Pro tier.
          enum:
            - mp3_22050_32
            - mp3_44100_32
            - mp3_44100_64
            - mp3_44100_96
            - mp3_44100_128
            - mp3_44100_192
            - pcm_8000
            - pcm_16000
            - pcm_22050
            - pcm_24000
            - pcm_44100
            - pcm_48000
            - ulaw_8000
            - alaw_8000
            - opus_48000_32
            - opus_48000_64
            - opus_48000_96
            - opus_48000_128
            - opus_48000_192
          default: mp3_44100_128
        music_length_ms:
          type: integer
          description: >-
            Duration of the generated music in milliseconds, from 3000 ms (3 s)
            to 600000 ms (10 min). Required for billing. Applies to prompt-based
            generation; with composition_plan, the total duration is the sum of
            the section duration_ms values.
          minimum: 3000
          maximum: 600000
        composition_plan:
          type: object
          description: >-
            Advanced: structured composition plan with sections, styles and
            lyrics. The top level requires positive_global_styles[] and
            negative_global_styles[]. Each section requires section_name,
            positive_local_styles[], negative_local_styles[], duration_ms
            (3000–120000 ms), and lines[]. Mutually exclusive with prompt.
        force_instrumental:
          type: boolean
          description: >-
            If true, guarantees the generated song is instrumental (no vocals).
            Can only be used with prompt.
        respect_sections_durations:
          type: boolean
          description: >-
            Controls how strictly section durations in the composition_plan are
            enforced. Only effective with composition_plan. When true, each
            section's duration_ms is precisely respected; when false, the model
            may adjust durations for better quality while preserving total song
            length.
          default: true
      required:
        - music_length_ms
  securitySchemes:
    bearerAuth:
      scheme: bearer
      bearerFormat: JWT
      type: http

````
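
The `SubmitResponseDto` schema marks seven fields as required. As a hedged illustration of consuming that response, the Python sketch below checks for those fields and converts `created_at` into a timezone-aware `datetime`. The function name `parse_submit_response` is hypothetical, and the sample values in the usage note are placeholders built from the schema's examples and the model title, not captured API output.

```python
from datetime import datetime

# Required fields of SubmitResponseDto, as listed in the spec.
REQUIRED_FIELDS = (
    "task_id", "status", "capability", "model", "vendor", "mode", "created_at",
)


def parse_submit_response(payload: dict) -> dict:
    """Check the required fields and convert created_at to a datetime."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"submit response is missing: {', '.join(missing)}")
    result = dict(payload)
    # Normalise the trailing "Z" so fromisoformat also works before Python 3.11.
    result["created_at"] = datetime.fromisoformat(
        payload["created_at"].replace("Z", "+00:00")
    )
    return result
```

A payload such as `{"task_id": "hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "status": "pending", "capability": "audio", "model": "elevenlabs-music", "vendor": "ElevenLabs", "mode": "music-generate", "created_at": "2026-05-18T09:00:00.000Z"}` passes the check; the returned `task_id` is then used with `GET /hub/v1/tasks/:task_id`.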