Submit Video Task - MountSea API

Endpoint

POST https://api.mountsea.ai/hub/v1/video
Authorization: Bearer <your-api-key>
Content-Type: application/json

Request Structure

{
  "model": "<model-id>",   // required: choose an ID from the Model Reference below
  "input": { ... }         // model-specific parameters: see the chosen model's parameter table
}

Response

{ "taskId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
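
A minimal Python sketch of the submit call, using only the standard library. The URL, the two headers, the `model`/`input` body, and the `taskId` response field come from the sections above; the helper names `build_request` and `submit_video_task` are illustrative, and error handling (non-2xx responses, missing fields) is left out.

```python
import json
import urllib.request

API_URL = "https://api.mountsea.ai/hub/v1/video"


def build_request(api_key: str, model: str, model_input: dict) -> urllib.request.Request:
    """Build the POST request; nothing is sent until it is opened."""
    body = json.dumps({"model": model, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit_video_task(api_key: str, model: str, model_input: dict) -> str:
    """Send the request and return the taskId from the JSON response."""
    with urllib.request.urlopen(build_request(api_key, model, model_input)) as resp:
        return json.load(resp)["taskId"]
```

A call such as `submit_video_task(api_key, "veo-3.1", {"prompt": "..."})` would then return the task ID to poll.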

Poll GET /hub/v1/tasks/:taskId until ready = true.
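
The polling step could look roughly like the sketch below. Only the path `/hub/v1/tasks/:taskId` and the `ready` flag are taken from this page; the host, the reuse of the Bearer header, the 5-second interval, and the 10-minute timeout are assumptions, and the other fields of the task object are not documented here, so the sketch returns it untouched.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.mountsea.ai"  # assumption: same host as the submit endpoint


def fetch_task(api_key: str, task_id: str) -> dict:
    """GET /hub/v1/tasks/:taskId and decode the JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/hub/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: same auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_ready(
    fetch: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() repeatedly until the task reports ready = true."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch()
        if task.get("ready") is True:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task was not ready before the timeout")
        sleep(interval_s)
```

Typical use would be `wait_until_ready(lambda: fetch_task(api_key, task_id))`. Passing `fetch` and `sleep` as callables keeps the loop testable without network access.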

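Model Reference

Models are grouped by capability below. Each entry lists the parameters accepted in its input object.

The generation mode (text-to-video / image-to-video / reference images / first-last frame / video editing) is determined entirely by the model you choose. Pick a category, find the model, and fill in its parameters.

Text-to-video · 9
Image-to-video · 9
Reference-to-video · 5
First-last frame · 1
Video editing · 1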

veo-3.1 — Veo 3.1 (Google)

Google Veo 3.1 text-to-video with optional native audio.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		`true`	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`resolution`	`string`		`720p`	`720p` `1080p` `4k`	The resolution of the generated video.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16`	Aspect ratio of the generated video
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.
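
As an illustration of the table above, the following Python sketch assembles a veo-3.1 request body and rejects values outside the listed ranges before anything is sent. The helper name `veo_31_body` and the prompt text are made up for the example; the allowed values are copied from the table. Note that `duration` is a string with an `s` suffix, so `"8"` or the integer `8` would be rejected.

```python
import json

# Allowed values copied from the veo-3.1 table above.
VEO_31_ENUMS = {
    "duration": {"4s", "6s", "8s"},
    "resolution": {"720p", "1080p", "4k"},
    "aspect_ratio": {"16:9", "9:16"},
    "safety_tolerance": {"1", "2", "3", "4", "5", "6"},
}


def veo_31_body(prompt: str, **options) -> dict:
    """Return a request body for veo-3.1, rejecting out-of-range enum values."""
    if not prompt:
        raise ValueError("prompt is required")
    for name, value in options.items():
        allowed = VEO_31_ENUMS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return {"model": "veo-3.1", "input": {"prompt": prompt, **options}}


body = veo_31_body(
    "A paper boat drifting down a rain-filled gutter",  # illustrative prompt
    duration="6s",
    resolution="1080p",
    generate_audio=False,
)
print(json.dumps(body, indent=2))
```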

veo-3.1-fast — Veo 3.1 Fast (Google)

Veo 3.1 Fast: lower-latency text-to-video at reduced cost.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		`true`	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`resolution`	`string`		`720p`	`720p` `1080p` `4k`	The resolution of the generated video.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16`	Aspect ratio of the generated video
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

veo-3.1-lite — Veo 3.1 Lite (Google)

Veo 3.1 Lite: lowest cost text-to-video (720p/1080p only).

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		`true`	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`resolution`	`string`		`720p`	`720p` `1080p`	The resolution of the generated video.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16`	Aspect ratio of the generated video
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

veo-3 — Veo 3 (Google)

Google Veo 3 text-to-video with native audio.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		`true`	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`resolution`	`string`		`720p`	`720p` `1080p`	The resolution of the generated video.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16`	The aspect ratio of the generated video.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

kling-v3-standard — Kling v3 Standard (Kuaishou)

Kling v3 Standard text-to-video with optional native audio.

Parameter	Type	Default	Values / Range	Description
`prompt`	`string`	–	–	Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both.
`duration`	`string`	`5`	`3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	The duration of the generated video in seconds
`cfg_scale`	`number`	`0.5`	–	The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt.
`shot_type`	`string`	`customize`	`customize` `intelligent`	The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure.
`aspect_ratio`	`string`	`16:9`	`16:9` `9:16` `1:1`	The aspect ratio of the generated video frame
`multi_prompt`	`array`	–	–	List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations.
`generate_audio`	`boolean`	`true`	–	Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase.
`negative_prompt`	`string`	`blur, distort, and low quality`	–	–
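
The "either prompt or multi_prompt, but not both" rule can be checked client-side. The sketch below is one way to do it; `kling_v3_input` is a hypothetical helper, and because the table does not document the shape of `multi_prompt` items, the list is passed through as-is.

```python
KLING_DURATIONS = {str(n) for n in range(3, 16)}  # "3" .. "15", as strings


def kling_v3_input(prompt=None, multi_prompt=None, **options) -> dict:
    """Build the input object, enforcing exactly one of prompt / multi_prompt."""
    if (prompt is None) == (multi_prompt is None):
        raise ValueError("provide exactly one of prompt or multi_prompt")
    if "duration" in options and options["duration"] not in KLING_DURATIONS:
        raise ValueError("duration must be a string from '3' to '15'")
    chosen = {"prompt": prompt} if prompt is not None else {"multi_prompt": multi_prompt}
    return {**chosen, **options}


body = {
    "model": "kling-v3-standard",
    "input": kling_v3_input(prompt="a koi pond at dawn", duration="10", aspect_ratio="1:1"),
}
```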

kling-v3-pro — Kling v3 Pro (Kuaishou)

Kling v3 Pro text-to-video with optional native audio.

Parameter	Type	Default	Values / Range	Description
`prompt`	`string`	–	–	Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both.
`duration`	`string`	`5`	`3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	The duration of the generated video in seconds
`cfg_scale`	`number`	`0.5`	–	The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt.
`shot_type`	`string`	`customize`	`customize` `intelligent`	The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure.
`aspect_ratio`	`string`	`16:9`	`16:9` `9:16` `1:1`	The aspect ratio of the generated video frame
`multi_prompt`	`array`	–	–	List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations.
`generate_audio`	`boolean`	`true`	–	Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase.
`negative_prompt`	`string`	`blur, distort, and low quality`	–	–

wan-2.7 — WAN 2.7 (Alibaba)

WAN 2.7 text-to-video - high quality generation. Default resolution is 1080p.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility (0-2147483647).
`prompt`	`string`	✓	–	–	Text prompt describing the desired video. Max 5000 characters.
`duration`	`integer`		`5`	`2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Output video duration in seconds (2-15).
`audio_url`	`string`		–	–	URL of driving audio. Supports WAV and MP3. Duration: 3-30s. Max 15 MB. If not provided, the model auto-generates matching background music.
`resolution`	`string`		`1080p`	`720p` `1080p`	Output video resolution tier.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16` `1:1` `4:3` `3:4`	Aspect ratio of the generated video.
`negative_prompt`	`string`		–	–	Content to avoid in the video. Max 500 characters.
`enable_safety_checker`	`boolean`		`true`	–	Enable content moderation for input and output.
`enable_prompt_expansion`	`boolean`		`true`	–	Enable intelligent prompt rewriting.
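
The numeric limits in this table lend themselves to a pre-flight check. The following is a sketch with a hypothetical helper name, `wan_27_input`; the limits are the ones listed above. Unlike the Veo, Kling, and Seedance models, `duration` here is an integer, not a string.

```python
def wan_27_input(prompt: str, duration: int = 5, seed=None, negative_prompt=None, **options) -> dict:
    """Build the wan-2.7 input object after checking the documented limits."""
    if not prompt or len(prompt) > 5000:
        raise ValueError("prompt is required and limited to 5000 characters")
    if type(duration) is not int or not 2 <= duration <= 15:
        raise ValueError("duration must be an integer from 2 to 15")
    if seed is not None and not 0 <= seed <= 2147483647:
        raise ValueError("seed must be in 0..2147483647")
    if negative_prompt is not None and len(negative_prompt) > 500:
        raise ValueError("negative_prompt is limited to 500 characters")
    params = {"prompt": prompt, "duration": duration, **options}
    if seed is not None:
        params["seed"] = seed
    if negative_prompt is not None:
        params["negative_prompt"] = negative_prompt
    return params


body = {"model": "wan-2.7", "input": wan_27_input("a lighthouse in fog", duration=8, seed=42)}
```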

seedance-2.0 — Seedance 2.0 (ByteDance)

ByteDance Seedance 2.0: cinematic text-to-video with native audio, physics, and camera control.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility. Note that results may still vary slightly even with the same seed.
`prompt`	`string`	✓	–	–	The text prompt used to generate the video
`duration`	`string`		`4`	`4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Duration of the video in seconds (4-15).
`resolution`	`string`		`720p`	`480p` `720p` `1080p`	Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality.
`end_user_id`	`string`		–	–	The unique user ID of the end user.
`aspect_ratio`	`string`		`auto`	`auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide.
`generate_audio`	`boolean`		`true`	–	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not.

seedance-2.0-fast — Seedance 2.0 Fast (ByteDance)

ByteDance Seedance 2.0 fast tier: lower-latency text-to-video with native audio.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility. Note that results may still vary slightly even with the same seed.
`prompt`	`string`	✓	–	–	The text prompt used to generate the video
`duration`	`string`		`4`	`4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Duration of the video in seconds (4-15).
`resolution`	`string`		`720p`	`480p` `720p`	Video resolution - 480p for faster generation, 720p for balance.
`end_user_id`	`string`		–	–	The unique user ID of the end user.
`aspect_ratio`	`string`		`auto`	`auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide.
`generate_audio`	`boolean`		`true`	–	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not.

veo-3.1-image — Veo 3.1 Image-to-Video (Google)

Veo 3.1: animate a single reference image.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		–	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`image_url`	`string`	✓	–	–	URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit.
`resolution`	`string`		`720p`	`720p` `1080p` `4k`	The resolution of the generated video.
`aspect_ratio`	`string`		`auto`	`auto` `16:9` `9:16`	The aspect ratio of the generated video. Only 16:9 and 9:16 are supported.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.
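
Since images outside 16:9 or 9:16 are cropped, it can be useful to check the ratio locally before uploading. A rough sketch follows: `matches_veo_aspect` and `veo_31_image_body` are hypothetical helpers, the check is an exact integer comparison (near-miss ratios are treated as "will be cropped", which may be stricter than the service), and the URL is a placeholder.

```python
def matches_veo_aspect(width: int, height: int) -> bool:
    """True if the image is exactly 16:9 or 9:16, so no crop should be needed."""
    return width * 9 == height * 16 or width * 16 == height * 9


def veo_31_image_body(prompt: str, image_url: str, **options) -> dict:
    """Build a veo-3.1-image request body; prompt and image_url are required."""
    if not prompt or not image_url:
        raise ValueError("prompt and image_url are both required")
    return {
        "model": "veo-3.1-image",
        "input": {"prompt": prompt, "image_url": image_url, **options},
    }


body = veo_31_image_body(
    "The camera slowly pushes in",       # illustrative prompt
    "https://example.com/still.jpg",     # placeholder URL
    duration="4s",
)
```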

veo-3.1-fast-image — Veo 3.1 Fast Image-to-Video (Google)

Veo 3.1 Fast: animate a reference image at lower cost.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		–	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`image_url`	`string`	✓	–	–	URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit.
`resolution`	`string`		`720p`	`720p` `1080p` `4k`	The resolution of the generated video.
`aspect_ratio`	`string`		`auto`	`auto` `16:9` `9:16`	The aspect ratio of the generated video. Only 16:9 and 9:16 are supported.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

veo-3.1-lite-image — Veo 3.1 Lite Image-to-Video (Google)

Veo 3.1 Lite: animate a reference image at lowest cost.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		–	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`image_url`	`string`	✓	–	–	URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit.
`resolution`	`string`		`720p`	`720p` `1080p`	The resolution of the generated video.
`aspect_ratio`	`string`		`auto`	`auto` `16:9` `9:16`	The aspect ratio of the generated video. Only 16:9 and 9:16 are supported.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

veo-3-image — Veo 3 Image-to-Video (Google)

Google Veo 3: animate a single reference image.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing how the image should be animated
`auto_fix`	`boolean`		–	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`image_url`	`string`	✓	–	–	URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit.
`resolution`	`string`		`720p`	`720p` `1080p`	The resolution of the generated video.
`aspect_ratio`	`string`		`auto`	`auto` `16:9` `9:16`	The aspect ratio of the generated video.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

kling-v3-standard-image — Kling v3 Standard Image-to-Video (Kuaishou)

Kling v3 Standard image-to-video (3-15 seconds).

Parameter	Type	Req	Default	Values / Range	Description
`prompt`	`string`		–	–	Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both.
`duration`	`string`		`5`	`3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	The duration of the generated video in seconds
`elements`	`array`		–	–	Elements (characters/objects) to include in the video. Each element can either be an image set (frontal + reference images) or a video. Reference in prompt as @Element1, @Element2, etc.
`cfg_scale`	`number`		`0.5`	–	The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt.
`shot_type`	`string`		`customize`	`customize` `intelligent`	The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure.
`multi_prompt`	`array`		–	–	List of prompts for multi-shot video generation. If provided, divides the video into multiple shots.
`end_image_url`	`string`		–	–	URL of the image to be used for the end of the video
`generate_audio`	`boolean`		`true`	–	Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase.
`negative_prompt`	`string`		`blur, distort, and low quality`	–	–
`start_image_url`	`string`	✓	–	–	URL of the image to be used for the video

kling-v3-pro-image — Kling v3 Pro Image-to-Video (Kuaishou)

Kling v3 Pro image-to-video (3-15 seconds).

Parameter	Type	Req	Default	Values / Range	Description
`prompt`	`string`		–	–	Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both.
`duration`	`string`		`5`	`3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	The duration of the generated video in seconds
`elements`	`array`		–	–	Elements (characters/objects) to include in the video. Each element can either be an image set (frontal + reference images) or a video. Reference in prompt as @Element1, @Element2, etc.
`cfg_scale`	`number`		`0.5`	–	The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt.
`shot_type`	`string`		`customize`	`customize` `intelligent`	The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure.
`multi_prompt`	`array`		–	–	List of prompts for multi-shot video generation. If provided, divides the video into multiple shots.
`end_image_url`	`string`		–	–	URL of the image to be used for the end of the video
`generate_audio`	`boolean`		`true`	–	Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase.
`negative_prompt`	`string`		`blur, distort, and low quality`	–	–
`start_image_url`	`string`	✓	–	–	URL of the image to be used for the video

wan-2.7-image — WAN 2.7 Image-to-Video (Alibaba)

WAN 2.7 image-to-video (720p/1080p, duration 2-15s).

Parameter	Type	Default	Values / Range	Description
`seed`	`integer`	–	–	Random seed for reproducibility (0-2147483647).
`prompt`	`string`	–	–	Text prompt describing the desired video. Max 5000 characters.
`duration`	`integer`	`5`	`2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Output video duration in seconds (2-15).
`audio_url`	`string`	–	–	URL of driving audio. Supports WAV and MP3. Duration: 2-30s. Max 15 MB.
`image_url`	`string`	–	–	URL of the first frame image. Formats: JPEG, JPG, PNG, BMP, WEBP. Max 20 MB.
`video_url`	`string`	–	–	URL of a video clip to continue from. Format: MP4, MOV. Duration: 2-10s. Max 100 MB. Cannot be combined with image_url.
`resolution`	`string`	`1080p`	`720p` `1080p`	Output video resolution tier.
`end_image_url`	`string`	–	–	URL of the last frame image for first-and-last-frame-to-video. Same constraints as image_url.
`negative_prompt`	`string`	–	–	Content to avoid in the video. Max 500 characters.
`enable_safety_checker`	`boolean`	`true`	–	Enable content moderation for input and output.
`enable_prompt_expansion`	`boolean`	`true`	–	Enable intelligent prompt rewriting.
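
Because `video_url` cannot be combined with `image_url`, a client might guard against sending both. The sketch below uses a hypothetical helper, `wan_27_image_input`; the table has no required column for this model, so the sketch enforces only the stated exclusion and does not insist that a source is present.

```python
def wan_27_image_input(image_url=None, video_url=None, end_image_url=None, **options) -> dict:
    """Build the wan-2.7-image input object, rejecting image_url + video_url together."""
    if image_url is not None and video_url is not None:
        raise ValueError("video_url cannot be combined with image_url")
    sources = {"image_url": image_url, "video_url": video_url, "end_image_url": end_image_url}
    present = {name: url for name, url in sources.items() if url is not None}
    return {**present, **options}


body = {
    "model": "wan-2.7-image",
    "input": wan_27_image_input(
        image_url="https://example.com/first.png",  # placeholder URL
        prompt="petals start to fall",
        duration=6,
    ),
}
```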

seedance-2.0-image — Seedance 2.0 Image-to-Video (ByteDance)

ByteDance Seedance 2.0: animate images with cinematic quality and synchronized audio.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility. Note that results may still vary slightly even with the same seed.
`prompt`	`string`	✓	–	–	The text prompt describing the desired motion and action for the video.
`duration`	`string`		`4`	`4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Duration of the video in seconds (4-15).
`image_url`	`string`	✓	–	–	The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB.
`resolution`	`string`		`720p`	`480p` `720p` `1080p`	Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality.
`end_user_id`	`string`		–	–	The unique user ID of the end user.
`aspect_ratio`	`string`		`auto`	`auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image.
`end_image_url`	`string`		–	–	The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB.
`generate_audio`	`boolean`		`true`	–	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not.

seedance-2.0-fast-image — Seedance 2.0 Fast Image-to-Video (ByteDance)

ByteDance Seedance 2.0 fast tier: lower-latency image-to-video with synchronized audio.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility. Note that results may still vary slightly even with the same seed.
`prompt`	`string`	✓	–	–	The text prompt describing the desired motion and action for the video.
`duration`	`string`		`4`	`4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Duration of the video in seconds (4-15).
`image_url`	`string`	✓	–	–	The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB.
`resolution`	`string`		`720p`	`480p` `720p`	Video resolution - 480p for faster generation, 720p for balance.
`end_user_id`	`string`		–	–	The unique user ID of the end user.
`aspect_ratio`	`string`		`auto`	`auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image.
`end_image_url`	`string`		–	–	The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB.
`generate_audio`	`boolean`		`true`	–	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not.

veo-3.1-ref — Veo 3.1 Multi-Ref (Google)

Veo 3.1: generate video from reference images for consistent subject appearance.

Parameter	Type	Req	Default	Values / Range	Description
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		–	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`image_urls`	`array`	✓	–	–	URLs of the reference images to use for consistent subject appearance
`resolution`	`string`		`720p`	`720p` `1080p` `4k`	The resolution of the generated video.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16`	The aspect ratio of the generated video.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

veo-3.1-fast-ref — Veo 3.1 Fast Multi-Ref (Google)

Veo 3.1 Fast: multi-reference video at lower cost.

Parameter	Type	Req	Default	Values / Range	Description
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		–	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`image_urls`	`array`	✓	–	–	URLs of the reference images to use for consistent subject appearance
`resolution`	`string`		`720p`	`720p` `1080p` `4k`	The resolution of the generated video.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16`	The aspect ratio of the generated video.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.

wan-2.7-ref — WAN 2.7 Reference-to-Video (Alibaba)

WAN 2.7 reference-to-video using character/object reference images and videos (duration 2-10s).

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility (0-2147483647).
`prompt`	`string`	✓	–	–	Text prompt describing the desired video. Max 5000 characters.
`duration`	`integer`		`5`	`2` `3` `4` `5` `6` `7` `8` `9` `10`	Output video duration in seconds (2-10).
`resolution`	`string`		`1080p`	`720p` `1080p`	Output video resolution tier.
`multi_shots`	`boolean`		`false`	–	When true, enables intelligent multi-shot segmentation. When false (default), generates a single continuous shot.
`aspect_ratio`	`string`		`16:9`	`16:9` `9:16` `1:1` `4:3` `3:4`	Aspect ratio of the generated video.
`negative_prompt`	`string`		–	–	Content to avoid in the video. Max 500 characters.
`reference_image_urls`	`array`		–	–	Reference image URLs for character/object appearance. Pass multiple images for multi-subject generation. Max 20 MB each.
`reference_video_urls`	`array`		–	–	Reference video URLs for character/object appearance and motion. Pass multiple videos for multi-subject generation. Max 100 MB each. Note: when video inputs are provided, billing includes the total input video duration plus the output duration. Your charged credits will be higher than the output duration alone.
`enable_safety_checker`	`boolean`		`true`	–	Enable content moderation for input and output.
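
The billing note on `reference_video_urls` amounts to simple addition, illustrated below. `billable_seconds` is a hypothetical helper that computes only the billed duration; this page does not state a credit rate per second, so no conversion to credits is attempted.

```python
def billable_seconds(output_duration_s: float, reference_video_durations_s=()) -> float:
    """Billed duration = total input video duration + output duration."""
    return output_duration_s + sum(reference_video_durations_s)


# A 5-second output with two reference clips of 3 s and 4.5 s:
print(billable_seconds(5, [3.0, 4.5]))  # 12.5
# The same output with image references only:
print(billable_seconds(5))              # 5
```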

seedance-2.0-ref — Seedance 2.0 Reference-to-Video (ByteDance)

ByteDance Seedance 2.0: generate video from reference images, videos, and audio clips.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility. Note that results may still vary slightly even with the same seed.
`prompt`	`string`	✓	–	–	The text prompt used to generate the video.
`duration`	`string`		`4`	`4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Duration of the video in seconds (4-15).
`audio_urls`	`array`		–	–	Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required.
`image_urls`	`array`		–	–	Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12.
`resolution`	`string`		`720p`	`480p` `720p` `1080p`	Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality.
`video_urls`	`array`		–	–	Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution.
`end_user_id`	`string`		–	–	The unique user ID of the end user.
`aspect_ratio`	`string`		`auto`	`auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide.
`generate_audio`	`boolean`		`true`	–	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not.
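
The file-count rules for the three reference arrays can be verified before submitting. A sketch, with `check_seedance_refs` as a hypothetical helper: it covers only the counts and the audio dependency from the table, not durations, sizes, or resolutions, which would require inspecting the files themselves.

```python
def check_seedance_refs(image_urls=(), video_urls=(), audio_urls=()) -> int:
    """Validate reference counts and return the total number of files."""
    if len(image_urls) > 9:
        raise ValueError("at most 9 reference images")
    if len(video_urls) > 3:
        raise ValueError("at most 3 reference videos")
    if len(audio_urls) > 3:
        raise ValueError("at most 3 reference audio files")
    total = len(image_urls) + len(video_urls) + len(audio_urls)
    if total > 12:
        raise ValueError("at most 12 files across all modalities")
    if audio_urls and not (image_urls or video_urls):
        raise ValueError("audio requires at least one reference image or video")
    return total
```

For example, nine images plus three videos passes with a total of 12, while adding a single audio file to that set raises an error.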

seedance-2.0-fast-ref — Seedance 2.0 Fast Reference-to-Video (ByteDance)

ByteDance Seedance 2.0 fast tier: reference-to-video with lower latency and cost.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility. Note that results may still vary slightly even with the same seed.
`prompt`	`string`	✓	–	–	The text prompt used to generate the video.
`duration`	`string`		`4`	`4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`	Duration of the video in seconds (4-15).
`audio_urls`	`array`		–	–	Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required.
`image_urls`	`array`		–	–	Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12.
`resolution`	`string`		`720p`	`480p` `720p`	Video resolution - 480p for faster generation, 720p for balance.
`video_urls`	`array`		–	–	Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution.
`end_user_id`	`string`		–	–	The unique user ID of the end user.
`aspect_ratio`	`string`		`auto`	`auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide.
`generate_audio`	`boolean`		`true`	–	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not.

veo-3.1-first-last — Veo 3.1 First-Last Frame (Google)

Veo 3.1: generate a transition video between a start frame and an end frame.

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	The seed for the random number generator.
`prompt`	`string`	✓	–	–	The text prompt describing the video you want to generate
`auto_fix`	`boolean`		–	–	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
`duration`	`string`		`8s`	`4s` `6s` `8s`	The duration of the generated video.
`resolution`	`string`		`720p`	`720p` `1080p` `4k`	The resolution of the generated video.
`aspect_ratio`	`string`		`auto`	`auto` `16:9` `9:16`	The aspect ratio of the generated video.
`generate_audio`	`boolean`		`true`	–	Whether to generate audio for the video.
`last_frame_url`	`string`	✓	–	–	URL of the last frame of the video
`first_frame_url`	`string`	✓	–	–	URL of the first frame of the video
`negative_prompt`	`string`		–	–	A negative prompt to guide the video generation.
`safety_tolerance`	`string`		`4`	`1` `2` `3` `4` `5` `6`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter.
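
A first-last-frame request needs three required fields. The sketch below builds such a body in Python; `veo_31_first_last_body` is a hypothetical helper and the URLs and prompt are placeholders.

```python
import json


def veo_31_first_last_body(prompt: str, first_frame_url: str, last_frame_url: str, **options) -> dict:
    """Build a veo-3.1-first-last request body, checking the three required fields."""
    required = {
        "prompt": prompt,
        "first_frame_url": first_frame_url,
        "last_frame_url": last_frame_url,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return {"model": "veo-3.1-first-last", "input": {**required, **options}}


body = veo_31_first_last_body(
    "The seedling grows into a full tree",    # illustrative prompt
    "https://example.com/seedling.png",       # placeholder URL
    "https://example.com/tree.png",           # placeholder URL
    resolution="1080p",
)
print(json.dumps(body, indent=2))
```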

wan-2.7-edit — WAN 2.7 Edit Video (Alibaba)

WAN 2.7 video editing: instruction-based editing, reference-image-based editing and style transfer (input video 2-10s).

Parameter	Type	Req	Default	Values / Range	Description
`seed`	`integer`		–	–	Random seed for reproducibility (0-2147483647).
`prompt`	`string`	✓	–	–	Editing instruction or style transfer description. Describe what changes you want applied to the video.
`duration`	`string`		`0`	`0` `2` `3` `4` `5` `6` `7` `8` `9` `10`	Output duration in seconds. ‘0’ means match the input video’s duration. When set to 2-10, the output is truncated to that length from the start.
`video_url`	`string`	✓	–	–	URL of the input video to edit. Format: MP4, MOV. Duration: 2-10s. Max 100 MB.
`resolution`	`string`		`1080p`	`720p` `1080p`	Output video resolution tier.
`aspect_ratio`	`string`		–	`16:9` `9:16` `1:1` `4:3` `3:4`	Aspect ratio of the output video. If not provided, uses the input video’s aspect ratio.
`audio_setting`	`string`		`auto`	`auto` `origin`	Audio handling: ‘auto’ lets the model decide whether to regenerate audio; ‘origin’ preserves the original audio from the input video.
`reference_image_url`	`string`		–	–	Optional reference image URL for reference-based editing. When provided, the edit is guided by the visual style or content of this image.
`enable_safety_checker`	`boolean`		`true`	–	Enable content moderation for input and output.
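
For this model `duration` is a string, and `0` has the special meaning "match the input". One way to express that in client code is sketched below, with `wan_27_edit_input` as a hypothetical helper: passing `None` maps to `"0"`, and an integer from 2 to 10 is converted to its string form.

```python
def wan_27_edit_input(prompt: str, video_url: str, duration_s=None, **options) -> dict:
    """Build the wan-2.7-edit input object; duration_s=None keeps the input length."""
    if not prompt or not video_url:
        raise ValueError("prompt and video_url are both required")
    if duration_s is None:
        duration = "0"  # '0' means: match the input video's duration
    elif type(duration_s) is int and 2 <= duration_s <= 10:
        duration = str(duration_s)  # output truncated to this length from the start
    else:
        raise ValueError("duration_s must be None or an integer from 2 to 10")
    if options.get("audio_setting", "auto") not in {"auto", "origin"}:
        raise ValueError("audio_setting must be 'auto' or 'origin'")
    return {"prompt": prompt, "video_url": video_url, "duration": duration, **options}


body = {
    "model": "wan-2.7-edit",
    "input": wan_27_edit_input(
        "Restyle the clip as a watercolor painting",  # illustrative instruction
        "https://example.com/clip.mp4",               # placeholder URL
        audio_setting="origin",
    ),
}
```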
