
Endpoint

POST https://api.mountsea.ai/hub/v1/video
Authorization: Bearer <your-api-key>
Content-Type: application/json
Request structure
{
  "model": "<model-id>",   // required; choose one from the Model Reference below
  "input": { ... }         // model-specific parameters; see the chosen model's table
}
Response
{ "taskId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
Poll GET /hub/v1/tasks/:taskId until ready = true.
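The submit-then-poll flow above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names, the polling interval, the timeout, and the assumption that the task endpoint accepts the same Bearer token are all illustrative. Only the URLs, the request shape, and the `ready` field come from the text above.

```python
import json
import time
import urllib.request

API_BASE = "https://api.mountsea.ai/hub/v1"


def build_submit_request(api_key, model, model_input):
    """Build (but do not send) the POST /video request."""
    body = json.dumps({"model": model, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/video",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_task(api_key, task_id):
    """GET /tasks/:taskId and return the decoded JSON body.

    Assumption: the task endpoint uses the same Bearer token.
    """
    req = urllib.request.Request(
        f"{API_BASE}/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until_ready(fetch, task_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch(task_id) repeatedly until the result has ready == True.

    interval and timeout are illustrative defaults, not documented values.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        if task.get("ready") is True:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not ready after {timeout}s")
        sleep(interval)
```

To use it, send the built request with `urllib.request.urlopen`, read `taskId` from the JSON response, and pass `lambda tid: fetch_task(api_key, tid)` as the `fetch` argument. Passing `fetch` and `sleep` as parameters keeps the polling loop testable without network access.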

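Model Reference

Select a capability category below, then select a model name to expand its parameter table and a copy-ready request example.
The generation mode (text-to-video, image-to-video, reference image, first/last frame, or video editing) is determined entirely by the model you choose. Pick a category, expand the model, and copy its example.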
Google Veo 3.1 text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| seed | integer | | | | The seed for the random number generator. |
| prompt | string | | | | The text prompt describing the video you want to generate. |
| auto_fix | boolean | | true | | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | | 16:9 | 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | | true | | Whether to generate audio for the video. |
| negative_prompt | string | | | | A negative prompt to guide the video generation. |
| safety_tolerance | string | | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

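As an illustration of the Veo 3.1 parameters above, the following sketch builds an `input` object and rejects values outside the listed sets before any request is sent. The function and constant names are hypothetical, and whether the server applies the same checks is an assumption; only the parameter names and allowed values come from the table.

```python
# Allowed values copied from the Veo 3.1 parameter table.
VEO_3_1_ALLOWED = {
    "duration": {"4s", "6s", "8s"},
    "resolution": {"720p", "1080p", "4k"},
    "aspect_ratio": {"16:9", "9:16"},
    "safety_tolerance": {"1", "2", "3", "4", "5", "6"},
}


def build_veo_input(prompt, **options):
    """Return an input dict, raising ValueError on an unlisted enum value."""
    for name, value in options.items():
        allowed = VEO_3_1_ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")
    # Parameters without a listed value set (seed, auto_fix, ...) pass through.
    return {"prompt": prompt, **options}
```

Omitted options are simply left out of the payload, so the documented defaults (for example `duration` of `8s`) would apply on the server side.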
Veo 3.1 Fast: lower-latency text-to-video at reduced cost.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| seed | integer | | | | The seed for the random number generator. |
| prompt | string | | | | The text prompt describing the video you want to generate. |
| auto_fix | boolean | | true | | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | | 16:9 | 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | | true | | Whether to generate audio for the video. |
| negative_prompt | string | | | | A negative prompt to guide the video generation. |
| safety_tolerance | string | | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

Veo 3.1 Lite: lowest-cost text-to-video (720p/1080p only).

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| seed | integer | | | | The seed for the random number generator. |
| prompt | string | | | | The text prompt describing the video you want to generate. |
| auto_fix | boolean | | true | | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | | 720p | 720p, 1080p | The resolution of the generated video. |
| aspect_ratio | string | | 16:9 | 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | | true | | Whether to generate audio for the video. |
| negative_prompt | string | | | | A negative prompt to guide the video generation. |
| safety_tolerance | string | | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

Google Veo 3 text-to-video with native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| seed | integer | | | | The seed for the random number generator. |
| prompt | string | | | | The text prompt describing the video you want to generate. |
| auto_fix | boolean | | true | | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | | 720p | 720p, 1080p | The resolution of the generated video. |
| aspect_ratio | string | | 16:9 | 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | | true | | Whether to generate audio for the video. |
| negative_prompt | string | | | | A negative prompt to guide the video generation. |
| safety_tolerance | string | | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

Kling v3 Standard text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| prompt | string | | | | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| duration | string | | 5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | The duration of the generated video in seconds. |
| cfg_scale | number | | 0.5 | | The CFG (Classifier Free Guidance) scale controls how closely the model sticks to your prompt. |
| shot_type | string | | customize | customize, intelligent | The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure. |
| aspect_ratio | string | | 16:9 | 16:9, 9:16, 1:1 | The aspect ratio of the generated video frame. |
| multi_prompt | array | | | | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations. |
| generate_audio | boolean | | true | | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| negative_prompt | string | | blur, distort, and low quality | | |

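The Kling rule that exactly one of `prompt` or `multi_prompt` must be supplied can be checked on the client before submitting. This is a minimal sketch with a hypothetical function name. The structure of each `multi_prompt` entry is not specified in the table, so entries are passed through unchanged.

```python
def build_kling_input(prompt=None, multi_prompt=None, **options):
    """Return an input dict containing exactly one of prompt / multi_prompt."""
    # True when both are missing or both are given; either case is invalid.
    if (prompt is None) == (multi_prompt is None):
        raise ValueError("provide exactly one of prompt or multi_prompt")
    payload = dict(options)
    if prompt is not None:
        payload["prompt"] = prompt
    else:
        # Entry shape is an assumption left to the caller; copied as-is.
        payload["multi_prompt"] = list(multi_prompt)
    return payload
```

For example, `build_kling_input(prompt="a red kite over a beach", duration="5")` yields a single-shot payload, while passing both arguments or neither raises `ValueError`.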
Kling v3 Pro text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| prompt | string | | | | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| duration | string | | 5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | The duration of the generated video in seconds. |
| cfg_scale | number | | 0.5 | | The CFG (Classifier Free Guidance) scale controls how closely the model sticks to your prompt. |
| shot_type | string | | customize | customize, intelligent | The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure. |
| aspect_ratio | string | | 16:9 | 16:9, 9:16, 1:1 | The aspect ratio of the generated video frame. |
| multi_prompt | array | | | | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations. |
| generate_audio | boolean | | true | | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| negative_prompt | string | | blur, distort, and low quality | | |

WAN 2.7 text-to-video: high-quality generation. Default resolution is 1080p.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| seed | integer | | | | Random seed for reproducibility (0-2147483647). |
| prompt | string | | | | Text prompt describing the desired video. Max 5000 characters. |
| duration | integer | | 5 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | Output video duration in seconds (2-15). |
| audio_url | string | | | | URL of driving audio. Supports WAV and MP3. Duration: 3-30s. Max 15 MB. If not provided, the model auto-generates matching background music. |
| resolution | string | | 1080p | 720p, 1080p | Output video resolution tier. |
| aspect_ratio | string | | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 | Aspect ratio of the generated video. |
| negative_prompt | string | | | | Content to avoid in the video. Max 500 characters. |
| enable_safety_checker | boolean | | true | | Enable content moderation for input and output. |
| enable_prompt_expansion | boolean | | true | | Enable intelligent prompt rewriting. |

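The numeric and length limits in the WAN 2.7 table lend themselves to a pre-flight check. The sketch below collects every violation instead of stopping at the first one. The function name and error wording are hypothetical, and counting characters with Python's `len` is an assumption about how the limits are measured.

```python
WAN_RESOLUTIONS = {"720p", "1080p"}
WAN_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def validate_wan_input(params):
    """Return a list of human-readable problems; empty means no issue found."""
    errors = []
    if len(params.get("prompt", "")) > 5000:
        errors.append("prompt exceeds 5000 characters")
    if len(params.get("negative_prompt", "")) > 500:
        errors.append("negative_prompt exceeds 500 characters")
    seed = params.get("seed")
    if seed is not None and not 0 <= seed <= 2147483647:
        errors.append("seed must be in 0-2147483647")
    duration = params.get("duration", 5)  # documented default
    if not (isinstance(duration, int) and 2 <= duration <= 15):
        errors.append("duration must be an integer in 2-15")
    if params.get("resolution", "1080p") not in WAN_RESOLUTIONS:
        errors.append("resolution must be 720p or 1080p")
    if params.get("aspect_ratio", "16:9") not in WAN_ASPECT_RATIOS:
        errors.append("aspect_ratio is not a listed value")
    return errors
```

Note that `duration` is an integer for this model, whereas the Veo and Kling tables list it as a string.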
ByteDance Seedance 2.0: cinematic text-to-video with native audio, physics, and camera control.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| seed | integer | | | | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | | | | The text prompt used to generate the video. |
| duration | string | | 4 | 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | Duration of the video in seconds (4-15). |
| resolution | string | | 720p | 480p, 720p, 1080p | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. |
| end_user_id | string | | | | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| generate_audio | boolean | | true | | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |

ByteDance Seedance 2.0 fast tier: lower-latency text-to-video with native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
| --- | --- | --- | --- | --- | --- |
| seed | integer | | | | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | | | | The text prompt used to generate the video. |
| duration | string | | 4 | 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | Duration of the video in seconds (4-15). |
| resolution | string | | 720p | 480p, 720p | Video resolution: 480p for faster generation, 720p for balance. |
| end_user_id | string | | | | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| generate_audio | boolean | | true | | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |
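The two Seedance 2.0 tiers differ in their supported resolutions, so a small lookup can show which tier accepts a requested resolution. The tier labels `standard` and `fast` below are hypothetical stand-ins, not model IDs. Only the resolution lists are taken from the two tables.

```python
# Resolution lists copied from the two Seedance 2.0 tables.
# The keys are illustrative labels, not real model identifiers.
SEEDANCE_RESOLUTIONS = {
    "standard": ("480p", "720p", "1080p"),
    "fast": ("480p", "720p"),
}


def tiers_supporting(resolution):
    """Return the tier labels whose table lists the given resolution."""
    return [
        tier
        for tier, supported in SEEDANCE_RESOLUTIONS.items()
        if resolution in supported
    ]
```

With these tables, a 1080p request is only valid on the non-fast tier, while 480p and 720p are accepted by both.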