
Endpoint

POST https://api.mountsea.ai/hub/v1/audio
Authorization: Bearer <your-api-key>
Content-Type: application/json
Request structure
{
  "model": "<model-id>",   // required; choose from the Model Reference below
  "input": { ... }         // model-specific parameters; see the parameter list for the chosen model
}
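As a rough illustration of the request structure above, the following Python sketch builds the POST request with only the standard library. The helper names `build_audio_request` and `submit_audio_task` are hypothetical, and the model ID must be replaced with a real one from the Model Reference; the sketch assumes the endpoint returns a JSON body.

```python
import json
import urllib.request

API_URL = "https://api.mountsea.ai/hub/v1/audio"


def build_audio_request(api_key: str, model: str, input_params: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described above."""
    body = json.dumps({"model": model, "input": input_params}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_audio_task(api_key: str, model: str, input_params: dict) -> dict:
    """Send the request and return the parsed JSON response (assumed to be JSON)."""
    req = build_audio_request(api_key, model, input_params)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.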
Response
{ "taskId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
Poll GET /hub/v1/tasks/:taskId until ready = true.
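A minimal polling sketch in Python follows. It assumes the task endpoint lives on the same host as the POST endpoint and accepts the same Bearer token; only the `ready` field comes from this page, so the rest of the task payload is returned untouched. The names `fetch_task` and `wait_until_ready`, the 2-second interval, and the 10-minute timeout are illustrative choices, not documented values.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.mountsea.ai"  # assumption: same host as the POST endpoint


def fetch_task(api_key: str, task_id: str) -> dict:
    """GET /hub/v1/tasks/:taskId and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/hub/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: same auth as POST
        method="GET",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(
    fetch: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() repeatedly until the task reports ready = true."""
    deadline = clock() + timeout_s
    while True:
        task = fetch()
        if task.get("ready") is True:
            return task
        if clock() >= deadline:
            raise TimeoutError("task was not ready before the timeout")
        sleep(interval_s)
```

A typical call would be `wait_until_ready(lambda: fetch_task(api_key, task_id))`. Passing `fetch`, `sleep`, and `clock` as parameters lets the loop be exercised without network access or real delays.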

Model Reference

Music generation powered by ElevenLabs.
ElevenLabs Music: AI music generation from text description.
Parameters

prompt
  Type: string
  Description: The text prompt describing the music to generate. Use this for simple text-to-music generation. Mutually exclusive with composition_plan.

output_format
  Type: string
  Default: mp3_44100_128
  Values: mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192, pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_44100, pcm_48000, ulaw_8000, alaw_8000, opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192
  Description: Output audio format, encoded as codec_sampleRate_bitrate (e.g. mp3_44100_128 = MP3 at 44.1 kHz / 128 kbps). Note: mp3_44100_192 requires Creator tier; pcm_44100 requires Pro tier.

music_length_ms
  Type: integer
  Range: 3000 to 600000
  Description: Duration of the generated music in milliseconds. Required for billing. Range: 3000 ms (3 s) to 600000 ms (10 min). Use with prompt only; when using composition_plan, the total duration is the sum of the sections' duration_ms values.

composition_plan
  Type: object
  Description: Advanced: a structured composition plan with sections, styles, and lyrics. Each section requires section_name, positive_local_styles[], negative_local_styles[], duration_ms (3000 to 120000 ms), and lines[]. The plan also requires positive_global_styles[] and negative_global_styles[] at the top level. Mutually exclusive with prompt.

force_instrumental
  Type: boolean
  Description: If true, guarantees that the generated song is instrumental (no vocals). Can only be used with prompt.

respect_sections_durations
  Type: boolean
  Default: true
  Description: Controls how strictly the section durations in composition_plan are enforced. Only effective with composition_plan. When true, each section's duration_ms is respected exactly; when false, the model may adjust section durations for better quality while preserving the total song length.
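The constraints in the parameter descriptions can be checked on the client before submitting a task. The sketch below is one possible pre-flight check, not part of the API: `parse_output_format` and `validate_input` are hypothetical helpers, and the `sections` key used to reach the plan's sections is an assumption, since this page does not name that field.

```python
from typing import List, Optional, Tuple

OUTPUT_FORMATS = {
    "mp3_22050_32", "mp3_44100_32", "mp3_44100_64", "mp3_44100_96",
    "mp3_44100_128", "mp3_44100_192", "pcm_8000", "pcm_16000", "pcm_22050",
    "pcm_24000", "pcm_44100", "pcm_48000", "ulaw_8000", "alaw_8000",
    "opus_48000_32", "opus_48000_64", "opus_48000_96", "opus_48000_128",
    "opus_48000_192",
}


def parse_output_format(fmt: str) -> Tuple[str, int, Optional[int]]:
    """Split codec_sampleRate_bitrate into (codec, sample rate in Hz, bitrate in kbps or None)."""
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {fmt}")
    parts = fmt.split("_")
    bitrate = int(parts[2]) if len(parts) == 3 else None  # pcm/ulaw/alaw carry no bitrate
    return parts[0], int(parts[1]), bitrate


def validate_input(params: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no rule was violated."""
    problems = []
    has_prompt = "prompt" in params
    has_plan = "composition_plan" in params
    if has_prompt and has_plan:
        problems.append("prompt and composition_plan are mutually exclusive")
    if "music_length_ms" in params:
        if has_plan:
            problems.append("music_length_ms is used with prompt only")
        if not 3000 <= params["music_length_ms"] <= 600000:
            problems.append("music_length_ms must be between 3000 and 600000")
    if "force_instrumental" in params and not has_prompt:
        problems.append("force_instrumental can only be used with prompt")
    if "output_format" in params and params["output_format"] not in OUTPUT_FORMATS:
        problems.append("output_format is not one of the listed values")
    if has_plan:
        # Assumption: sections are stored under a "sections" key.
        for i, section in enumerate(params["composition_plan"].get("sections", [])):
            if not 3000 <= section.get("duration_ms", 0) <= 120000:
                problems.append(f"section {i}: duration_ms must be between 3000 and 120000")
    return problems
```

For example, `validate_input({"prompt": "lo-fi piano", "music_length_ms": 30000})` returns an empty list, while passing both `prompt` and `composition_plan` reports the mutual-exclusivity violation.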