Generates video using the selected model. The generation mode (text-to-video / image-to-video / etc.) is determined entirely by the model you choose — no separate mode field is needed.
Text-to-video — pass prompt only:
veo-3.1 · veo-3.1-fast · veo-3.1-lite · veo-3 · kling-v3-pro · kling-v3-standard · wan-2.7 · seedance-2.0
Image-to-video — pass prompt + image_urls (1 image):
veo-3.1-image · veo-3.1-fast-image · veo-3.1-lite-image · veo-3-image · kling-v3-pro-image · kling-v3-standard-image · wan-2.7-image · seedance-2.0-image
Multi-reference video — image_urls with 2–9 images:
veo-3.1-ref · veo-3.1-fast-ref
First-last frame — image_urls[0] = start frame, image_urls[1] = end frame:
veo-3.1-first-last
Always pass image(s) via image_urls: string[]; the service maps them to the correct fal field automatically.
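The image-count rules above can be checked client-side before a request is submitted. The following is a minimal Python sketch, not part of the API: the helper names (`image_limits`, `build_video_request`) are hypothetical, the mode is inferred from the model-name suffixes listed above, and placing `image_urls` inside `input` next to the prompt is an assumption.

```python
from __future__ import annotations

# Image-count limits per model-name suffix, taken from the mode list above.
# Order matters only for readability; the three suffixes never overlap.
IMAGE_LIMITS = {
    "-first-last": (2, 2),  # image_urls[0] = start frame, image_urls[1] = end frame
    "-ref": (2, 9),         # multi-reference video
    "-image": (1, 1),       # image-to-video
}


def image_limits(model: str) -> tuple[int, int]:
    """Return (min, max) image count for a model; (0, 0) means text-to-video."""
    for suffix, limits in IMAGE_LIMITS.items():
        if model.endswith(suffix):
            return limits
    return (0, 0)


def build_video_request(model: str, prompt: str, image_urls: list[str] | None = None) -> dict:
    """Build a request body, rejecting an image count the mode does not accept."""
    urls = list(image_urls or [])
    low, high = image_limits(model)
    if not low <= len(urls) <= high:
        raise ValueError(f"{model} expects {low}-{high} image(s), got {len(urls)}")
    body = {"model": model, "input": {"prompt": prompt}}
    if urls:
        # Assumption: image_urls is sent inside "input" next to the prompt.
        body["input"]["image_urls"] = urls
    return body
```

For example, `build_video_request("veo-3.1-first-last", "slow zoom", [start_url, end_url])` passes the check, while the same call with a single URL raises `ValueError`.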
Workflow
1. GET /hub/v1/models?capability=video — browse available models
2. GET /hub/v1/models/:model — copy the example as your input
3. POST /hub/v1/video ← you are here
4. GET /hub/v1/tasks/:task_id — poll until ready=true

Tip: Click Try it out → select a model from the dropdown below → the parameter schema auto-populates with an example.
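The final workflow step, polling the task endpoint until `ready=true`, can be sketched as below. This is an illustration under assumptions: the task response is assumed to be a JSON object with a boolean `ready` field, and `fetch_task` is a hypothetical callable you supply that performs `GET /hub/v1/tasks/:task_id` with your own HTTP client and credentials. The interval and attempt limit are arbitrary defaults, not documented values.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 120,
) -> dict:
    """Poll a task until its status reports ready, then return that status.

    fetch_task is any callable that performs GET /hub/v1/tasks/:task_id
    and returns the decoded JSON body as a dict.
    """
    for attempt in range(max_attempts):
        status = fetch_task(task_id)
        # Assumption: the body carries a boolean "ready" field.
        if status.get("ready") is True:
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} not ready after {max_attempts} polls")
```

Passing the HTTP call in as a function keeps the loop independent of any particular client library and makes it easy to exercise with a stub.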
veo-3.1 — Veo 3.1 (Google) · text-to-video
Google Veo 3.1 text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | true | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | – | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16 | Aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1",
"input": {
"prompt": "Two person street interview in New York City.\nSample Dialogue:\nHost: \"Did you hear the news?\"\nPerson: \"Yes! Veo 3.1 is now available online. If you want to see it, go check it out!\"",
"auto_fix": true,
"duration": "8s",
"resolution": "720p",
"aspect_ratio": "16:9",
"generate_audio": true,
"safety_tolerance": "4"
}
}
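The enumerated values in the veo-3.1 table lend themselves to a quick client-side check. The sketch below is illustrative only: `check_veo_31_input` is a hypothetical helper, and the server remains the authority on validation. One detail it makes visible is that `safety_tolerance` is a string ("4"), not a number.

```python
# Allowed values copied from the veo-3.1 parameter table.
VEO_31_ALLOWED = {
    "duration": {"4s", "6s", "8s"},
    "resolution": {"720p", "1080p", "4k"},
    "aspect_ratio": {"16:9", "9:16"},
    "safety_tolerance": {"1", "2", "3", "4", "5", "6"},
}


def check_veo_31_input(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    if not params.get("prompt"):
        errors.append("prompt is required")
    for name, allowed in VEO_31_ALLOWED.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name}={params[name]!r} not in {sorted(allowed)}")
    return errors
```

Calling it with `{"prompt": "x", "safety_tolerance": 4}` reports one problem, because the integer `4` is not among the allowed string values.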
veo-3.1-image — Veo 3.1 Image-to-Video (Google) · image-to-video
Veo 3.1: animate a single reference image.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| image_url | string | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| resolution | string | – | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | – | auto | auto, 16:9, 9:16 | The aspect ratio of the generated video. Only 16:9 and 9:16 are supported. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-image",
"input": {
"prompt": "A monkey and polar bear host a casual podcast about AI inference, bringing their unique perspectives from different environments (tropical vs. arctic) to discuss how AI systems make decisions and process information.\nSample Dialogue:\nMonkey (Banana): \"Welcome back to Bananas & Ice! I am Banana\"\nPolar Bear (Ice): \"And I'm Ice!\"",
"duration": "8s",
"image_url": "https://example.com/sample-image.jpg",
"resolution": "720p",
"aspect_ratio": "auto",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3.1-ref — Veo 3.1 Multi-Ref (Google) · reference-to-video
Veo 3.1: generate video from reference images for consistent subject appearance.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| image_urls | array | ✓ | – | – | URLs of the reference images to use for consistent subject appearance. |
| resolution | string | – | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-ref",
"input": {
"prompt": "A chimpanzee wearing overalls frolics in the grassy field, gently playing with the butterflies. In the background, a circus tent and carousel beckon.",
"duration": "8s",
"image_urls": [
"https://example.com/sample-image.jpg",
"https://example.com/sample-image-2.jpg",
"https://example.com/sample-image-3.jpg"
],
"resolution": "720p",
"aspect_ratio": "16:9",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3.1-first-last — Veo 3.1 First-Last Frame (Google) · first-last-frame-to-video
Veo 3.1: generate a transition video between a start frame and an end frame.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | – | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | – | auto | auto, 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| last_frame_url | string | ✓ | – | – | URL of the last frame of the video. |
| first_frame_url | string | ✓ | – | – | URL of the first frame of the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-first-last",
"input": {
"prompt": "A woman looks into the camera, breathes in, then exclaims energetically, \"have you guys checked out this AI video generation? It's incredible!\"",
"duration": "8s",
"resolution": "720p",
"aspect_ratio": "auto",
"generate_audio": true,
"last_frame_url": "https://example.com/sample-image-2.jpg",
"first_frame_url": "https://example.com/sample-image.jpg",
"safety_tolerance": "4"
}
}
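The schema above names the frames `first_frame_url` and `last_frame_url`, while the introduction says images are passed as `image_urls` with index 0 as the start frame and index 1 as the end frame. The sketch below only illustrates that correspondence; how the service performs the mapping internally is not described in this reference, and `to_first_last_fields` is a hypothetical name.

```python
def to_first_last_fields(image_urls: list) -> dict:
    """Map a two-element image_urls list onto the first/last frame fields."""
    if len(image_urls) != 2:
        raise ValueError("first-last frame mode needs exactly 2 images")
    first, last = image_urls
    # image_urls[0] = start frame, image_urls[1] = end frame
    return {"first_frame_url": first, "last_frame_url": last}
```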
veo-3.1-fast — Veo 3.1 Fast (Google) · text-to-video
Veo 3.1 Fast: lower-latency text-to-video at reduced cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | true | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | – | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16 | Aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-fast",
"input": {
"prompt": "Two person street interview in New York City.\nSample Dialogue:\nHost: \"Did you hear the news?\"\nPerson: \"Yes! Veo 3.1 is now available online. If you want to see it, go check it out!\"",
"auto_fix": true,
"duration": "8s",
"resolution": "720p",
"aspect_ratio": "16:9",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3.1-fast-image — Veo 3.1 Fast Image-to-Video (Google) · image-to-video
Veo 3.1 Fast: animate a reference image at lower cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| image_url | string | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| resolution | string | – | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | – | auto | auto, 16:9, 9:16 | The aspect ratio of the generated video. Only 16:9 and 9:16 are supported. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-fast-image",
"input": {
"prompt": "A monkey and polar bear host a casual podcast about AI inference, bringing their unique perspectives from different environments (tropical vs. arctic) to discuss how AI systems make decisions and process information.\nSample Dialogue:\nMonkey (Banana): \"Welcome back to Bananas & Ice! I am Banana\"\nPolar Bear (Ice): \"And I'm Ice!\"",
"duration": "8s",
"image_url": "https://example.com/sample-image.jpg",
"resolution": "720p",
"aspect_ratio": "auto",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3.1-fast-ref — Veo 3.1 Fast Multi-Ref (Google) · reference-to-video
Veo 3.1 Fast: multi-reference video at lower cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| image_urls | array | ✓ | – | – | URLs of the reference images to use for consistent subject appearance. |
| resolution | string | – | 720p | 720p, 1080p, 4k | The resolution of the generated video. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-fast-ref",
"input": {
"prompt": "A chimpanzee wearing overalls frolics in the grassy field, gently playing with the butterflies. In the background, a circus tent and carousel beckon.",
"duration": "8s",
"image_urls": [
"https://example.com/sample-image.jpg",
"https://example.com/sample-image-2.jpg",
"https://example.com/sample-image-3.jpg"
],
"resolution": "720p",
"aspect_ratio": "16:9",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3.1-lite — Veo 3.1 Lite (Google) · text-to-video
Veo 3.1 Lite: lowest-cost text-to-video (720p/1080p only).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | true | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | – | 720p | 720p, 1080p | The resolution of the generated video. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16 | Aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-lite",
"input": {
"prompt": "A massive blue whale glides through crystal-clear deep ocean water, sunlight rays piercing through the surface above, bioluminescent plankton scattered around, cinematic slow motion",
"auto_fix": true,
"duration": "8s",
"resolution": "720p",
"aspect_ratio": "16:9",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3.1-lite-image — Veo 3.1 Lite Image-to-Video (Google) · image-to-video
Veo 3.1 Lite: animate a reference image at lowest cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| image_url | string | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| resolution | string | – | 720p | 720p, 1080p | The resolution of the generated video. |
| aspect_ratio | string | – | auto | auto, 16:9, 9:16 | The aspect ratio of the generated video. Only 16:9 and 9:16 are supported. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3.1-lite-image",
"input": {
"prompt": "A massive blue whale glides through crystal-clear deep ocean water, sunlight rays piercing through the surface above, bioluminescent plankton scattered around, cinematic slow motion",
"duration": "8s",
"image_url": "https://example.com/sample-image.jpg",
"resolution": "720p",
"aspect_ratio": "auto",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3 — Veo 3 (Google) · text-to-video
Google Veo 3 text-to-video with native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing the video you want to generate. |
| auto_fix | boolean | – | true | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| resolution | string | – | 720p | 720p, 1080p | The resolution of the generated video. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3",
"input": {
"prompt": "A casual street interview on a busy New York City sidewalk in the afternoon. The interviewer holds a plain, unbranded microphone and asks: Have you seen Google's new Veo3 model It is a super good model. Person replies: Yeah I saw it, it's already available now. It's crazy good.",
"auto_fix": true,
"duration": "8s",
"resolution": "720p",
"aspect_ratio": "16:9",
"generate_audio": true,
"safety_tolerance": "4"
}
}
veo-3-image — Veo 3 Image-to-Video (Google) · image-to-video
Google Veo 3: animate a single reference image.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | The seed for the random number generator. |
| prompt | string | ✓ | – | – | The text prompt describing how the image should be animated. |
| auto_fix | boolean | – | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| duration | string | – | 8s | 4s, 6s, 8s | The duration of the generated video. |
| image_url | string | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| resolution | string | – | 720p | 720p, 1080p | The resolution of the generated video. |
| aspect_ratio | string | – | auto | auto, 16:9, 9:16 | The aspect ratio of the generated video. |
| generate_audio | boolean | – | true | – | Whether to generate audio for the video. |
| negative_prompt | string | – | – | – | A negative prompt to guide the video generation. |
| safety_tolerance | string | – | 4 | 1, 2, 3, 4, 5, 6 | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

{
"model": "veo-3-image",
"input": {
"prompt": "A woman looks into the camera, breathes in, then exclaims energetically, \"have you guys checked out this AI video generation? It's incredible!\"",
"duration": "8s",
"image_url": "https://example.com/sample-image.jpg",
"resolution": "720p",
"aspect_ratio": "auto",
"generate_audio": true,
"safety_tolerance": "4"
}
}
kling-v3-standard — Kling v3 Standard (Kuaishou) · text-to-video
Kling v3 Standard text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| prompt | string | – | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| duration | string | – | 5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | The duration of the generated video in seconds. |
| cfg_scale | number | – | 0.5 | – | The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. |
| shot_type | string | – | customize | customize, intelligent | The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16, 1:1 | The aspect ratio of the generated video frame. |
| multi_prompt | array | – | – | – | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations. |
| generate_audio | boolean | – | true | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| negative_prompt | string | – | blur, distort, and low quality | – | – |

{
"model": "kling-v3-standard",
"input": {
"prompt": "Cinematic drone shot flying through ancient stone ruins covered in moss and vines at golden hour. Camera starts low, rises through crumbling archways, revealing a vast misty valley beyond. Volumetric light rays pierce through gaps in the stone. Epic scale, photorealistic, 8K quality.",
"duration": "5",
"cfg_scale": 0.5,
"shot_type": "customize",
"aspect_ratio": "16:9",
"multi_prompt": null,
"generate_audio": true,
"negative_prompt": "blur, distort, and low quality"
}
}
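The Kling tables state that either `prompt` or `multi_prompt` must be provided, but not both. A minimal sketch of that rule follows; `check_kling_prompts` is a hypothetical helper, and treating `"multi_prompt": null` (as in the example above) the same as an omitted field is an assumption.

```python
def check_kling_prompts(params: dict) -> None:
    """Raise ValueError unless exactly one of prompt / multi_prompt is set."""
    # Assumption: None, "" and [] all count as "not provided".
    has_prompt = bool(params.get("prompt"))
    has_multi = bool(params.get("multi_prompt"))
    if has_prompt == has_multi:
        raise ValueError("provide exactly one of prompt or multi_prompt")
```

The element structure of `multi_prompt` is not spelled out in this reference, so the check deliberately looks only at presence, not at content.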
kling-v3-standard-image — Kling v3 Standard Image-to-Video (Kuaishou) · image-to-video
Kling v3 Standard image-to-video (3-15 seconds).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| prompt | string | – | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| duration | string | – | 5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | The duration of the generated video in seconds. |
| elements | array | – | – | – | Elements (characters/objects) to include in the video. Each element can either be an image set (frontal + reference images) or a video. Reference in prompt as @Element1, @Element2, etc. |
| cfg_scale | number | – | 0.5 | – | The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. |
| shot_type | string | – | customize | customize, intelligent | The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure. |
| multi_prompt | array | – | – | – | List of prompts for multi-shot video generation. If provided, divides the video into multiple shots. |
| end_image_url | string | – | – | – | URL of the image to be used for the end of the video. |
| generate_audio | boolean | – | true | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| negative_prompt | string | – | blur, distort, and low quality | – | – |
| start_image_url | string | ✓ | – | – | URL of the image to be used for the video. |

{
"model": "kling-v3-standard-image",
"input": {
"prompt": "Camera slowly orbits around the vase. Soft light shifts across the ceramic surface. The pampas grass sways gently. Shadows move elegantly. Smooth continuous motion, premium feel.",
"duration": "12",
"cfg_scale": 0.5,
"shot_type": "customize",
"multi_prompt": null,
"generate_audio": true,
"negative_prompt": "blur, distort, and low quality",
"start_image_url": "https://example.com/sample-image.jpg"
}
}
kling-v3-pro — Kling v3 Pro (Kuaishou) · text-to-video
Kling v3 Pro text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| prompt | string | – | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| duration | string | – | 5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | The duration of the generated video in seconds. |
| cfg_scale | number | – | 0.5 | – | The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. |
| shot_type | string | – | customize | customize, intelligent | The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16, 1:1 | The aspect ratio of the generated video frame. |
| multi_prompt | array | – | – | – | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations. |
| generate_audio | boolean | – | true | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| negative_prompt | string | – | blur, distort, and low quality | – | – |

{
"model": "kling-v3-pro",
"input": {
"prompt": "Close-up of glowing fireflies dancing in a dark forest at twilight. Soft bioluminescent particles float through the air. Shallow depth of field, bokeh lights in background. Magical atmosphere, gentle movement.",
"duration": "5",
"cfg_scale": 0.5,
"shot_type": "customize",
"aspect_ratio": "16:9",
"multi_prompt": null,
"generate_audio": true,
"negative_prompt": "blur, distort, and low quality"
}
}
kling-v3-pro-image — Kling v3 Pro Image-to-Video (Kuaishou) · image-to-video
Kling v3 Pro image-to-video (3-15 seconds).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| prompt | string | – | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| duration | string | – | 5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | The duration of the generated video in seconds. |
| elements | array | – | – | – | Elements (characters/objects) to include in the video. Each element can either be an image set (frontal + reference images) or a video. Reference in prompt as @Element1, @Element2, etc. |
| cfg_scale | number | – | 0.5 | – | The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. |
| shot_type | string | – | customize | customize, intelligent | The type of multi-shot video generation. ‘intelligent’ lets the model automatically determine shot structure. |
| multi_prompt | array | – | – | – | List of prompts for multi-shot video generation. If provided, divides the video into multiple shots. |
| end_image_url | string | – | – | – | URL of the image to be used for the end of the video. |
| generate_audio | boolean | – | true | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| negative_prompt | string | – | blur, distort, and low quality | – | – |
| start_image_url | string | ✓ | – | – | URL of the image to be used for the video. |

{
"model": "kling-v3-pro-image",
"input": {
"prompt": "The craftsman slowly examines the bowl, turning it gently in his weathered hands. His eyes reflect years of wisdom. Subtle smile forms on his face. Dust particles drift in warm light. Breathing motion, blinking eyes.",
"duration": "12",
"cfg_scale": 0.5,
"shot_type": "customize",
"multi_prompt": null,
"generate_audio": true,
"negative_prompt": "blur, distort, and low quality",
"start_image_url": "https://example.com/sample-image.jpg"
}
}
wan-2.7 — WAN 2.7 (Alibaba) · text-to-video
WAN 2.7 text-to-video - high quality generation. Default resolution is 1080p.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | Random seed for reproducibility (0-2147483647). |
| prompt | string | ✓ | – | – | Text prompt describing the desired video. Max 5000 characters. |
| duration | integer | – | 5 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | Output video duration in seconds (2-15). |
| audio_url | string | – | – | – | URL of driving audio. Supports WAV and MP3. Duration: 3-30s. Max 15 MB. If not provided, the model auto-generates matching background music. |
| resolution | string | – | 1080p | 720p, 1080p | Output video resolution tier. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 | Aspect ratio of the generated video. |
| negative_prompt | string | – | – | – | Content to avoid in the video. Max 500 characters. |
| enable_safety_checker | boolean | – | true | – | Enable content moderation for input and output. |
| enable_prompt_expansion | boolean | – | true | – | Enable intelligent prompt rewriting. |

{
"model": "wan-2.7",
"input": {
"prompt": "A kitten running in a meadow, cinematic lighting, smooth camera movement.",
"duration": 5,
"resolution": "1080p",
"aspect_ratio": "16:9",
"negative_prompt": "low resolution, errors, worst quality, low quality",
"enable_safety_checker": true,
"enable_prompt_expansion": true
}
}
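The `duration` parameter is typed differently across the model families in this reference: the Veo models take a string with an `s` suffix ("8s"), the Kling models and wan-2.7-edit take a string of bare seconds ("5"), and the other WAN 2.7 models take an integer (5). The sketch below, with the hypothetical helper `format_duration`, converts a number of seconds into the shape each family's table lists. It does not check the value against the allowed range, and models not documented here raise an error.

```python
from __future__ import annotations


def format_duration(model: str, seconds: int) -> str | int:
    """Return `seconds` in the type and format the model's table lists."""
    if model.startswith("veo-"):
        return f"{seconds}s"   # e.g. "8s"
    if model.startswith("kling-") or model == "wan-2.7-edit":
        return str(seconds)    # e.g. "5"
    if model.startswith("wan-"):
        return seconds         # integer, e.g. 5
    raise ValueError(f"duration format for {model!r} is not covered here")
```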
wan-2.7-image — WAN 2.7 Image-to-Video (Alibaba) · image-to-video
WAN 2.7 image-to-video (720p/1080p, duration 2-15s).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | Random seed for reproducibility (0-2147483647). |
| prompt | string | – | – | – | Text prompt describing the desired video. Max 5000 characters. |
| duration | integer | – | 5 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | Output video duration in seconds (2-15). |
| audio_url | string | – | – | – | URL of driving audio. Supports WAV and MP3. Duration: 2-30s. Max 15 MB. |
| image_url | string | – | – | – | URL of the first frame image. Formats: JPEG, JPG, PNG, BMP, WEBP. Max 20 MB. |
| video_url | string | – | – | – | URL of a video clip to continue from. Format: MP4, MOV. Duration: 2-10s. Max 100 MB. Cannot be combined with image_url. |
| resolution | string | – | 1080p | 720p, 1080p | Output video resolution tier. |
| end_image_url | string | – | – | – | URL of the last frame image for first-and-last-frame-to-video. Same constraints as image_url. |
| negative_prompt | string | – | – | – | Content to avoid in the video. Max 500 characters. |
| enable_safety_checker | boolean | – | true | – | Enable content moderation for input and output. |
| enable_prompt_expansion | boolean | – | true | – | Enable intelligent prompt rewriting. |

{
"model": "wan-2.7-image",
"input": {
"prompt": "The massive humpback whale glides slowly through the deep blue water. It turns gracefully, its huge pectoral fin sweeping through the water like a wing. Sunbeams penetrate from above, illuminating the whale's textured skin. Small fish scatter. Awe-inspiring scale and grace.",
"duration": 5,
"image_url": "https://example.com/sample-image.jpg",
"resolution": "1080p",
"negative_prompt": "low resolution, errors, worst quality, low quality, incomplete, extra fingers, bad proportions, blurry, distorted",
"enable_safety_checker": true,
"enable_prompt_expansion": true
}
}
wan-2.7-ref — WAN 2.7 Reference-to-Video (Alibaba) · reference-to-video
WAN 2.7 reference-to-video using character/object reference images and videos (duration 2-10s).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | – | – | – | Random seed for reproducibility (0-2147483647). |
| prompt | string | ✓ | – | – | Text prompt describing the desired video. Max 5000 characters. |
| duration | integer | – | 5 | 2, 3, 4, 5, 6, 7, 8, 9, 10 | Output video duration in seconds (2-10). |
| resolution | string | – | 1080p | 720p, 1080p | Output video resolution tier. |
| multi_shots | boolean | – | false | – | When true, enables intelligent multi-shot segmentation. When false (default), generates a single continuous shot. |
| aspect_ratio | string | – | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 | Aspect ratio of the generated video. |
| negative_prompt | string | – | – | – | Content to avoid in the video. Max 500 characters. |
| reference_image_urls | array | – | – | – | Reference image URLs for character/object appearance. Pass multiple images for multi-subject generation. Max 20 MB each. |
| reference_video_urls | array | – | – | – | Reference video URLs for character/object appearance and motion. Pass multiple videos for multi-subject generation. Max 100 MB each. Note: when video inputs are provided, billing includes the total input video duration plus the output duration. Your charged credits will be higher than the output duration alone. |
| enable_safety_checker | boolean | – | true | – | Enable content moderation for input and output. |

{
  "model": "wan-2.7-ref",
  "input": {
    "prompt": "A person walking through a beautiful garden, cinematic style.",
    "duration": 5,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "negative_prompt": "low resolution, errors, worst quality, low quality",
    "enable_safety_checker": true
  }
}
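The example request above carries no reference inputs. As a minimal sketch, the snippet below shows how a request body with reference images or videos might be assembled and how the billing note on reference_video_urls could be estimated. The helper names `build_wan_ref_request` and `billable_seconds` are hypothetical, and the arithmetic is only one reading of the table's note (input video seconds plus output seconds), not a documented pricing formula.

```python
ALLOWED_DURATIONS = range(2, 11)            # 2-10 seconds, per the table
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def build_wan_ref_request(prompt, reference_image_urls=(), reference_video_urls=(),
                          duration=5, resolution="1080p"):
    """Assemble a request body for wan-2.7-ref (hypothetical helper)."""
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be an integer from 2 to 10")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError("resolution must be 720p or 1080p")

    body = {"prompt": prompt, "duration": duration, "resolution": resolution}
    # Only include the reference arrays that were actually supplied.
    if reference_image_urls:
        body["reference_image_urls"] = list(reference_image_urls)
    if reference_video_urls:
        body["reference_video_urls"] = list(reference_video_urls)
    return {"model": "wan-2.7-ref", "input": body}


def billable_seconds(output_duration, input_video_durations=()):
    """Assumed reading of the billing note: input video time plus output time."""
    return output_duration + sum(input_video_durations)
```

Under this reading, a 5-second output generated from two reference videos of 3 and 4 seconds would be estimated as `billable_seconds(5, [3, 4])`, which is more than the output duration alone.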
wan-2.7-edit — WAN 2.7 Edit Video (Alibaba) · video-to-video

WAN 2.7 video editing: instruction-based editing, reference-image-based editing and style transfer (input video 2-10s).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | | – | – | Random seed for reproducibility (0-2147483647). |
| prompt | string | ✓ | – | – | Editing instruction or style transfer description. Describe what changes you want applied to the video. |
| duration | string | | 0 | 0 2 3 4 5 6 7 8 9 10 | Output duration in seconds. ‘0’ means match the input video’s duration. When set to 2-10, the output is truncated to that length from the start. |
| video_url | string | ✓ | – | – | URL of the input video to edit. Format: MP4, MOV. Duration: 2-10s. Max 100 MB. |
| resolution | string | | 1080p | 720p 1080p | Output video resolution tier. |
| aspect_ratio | string | | – | 16:9 9:16 1:1 4:3 3:4 | Aspect ratio of the output video. If not provided, uses the input video’s aspect ratio. |
| audio_setting | string | | auto | auto origin | Audio handling: ‘auto’ lets the model decide whether to regenerate audio; ‘origin’ preserves the original audio from the input video. |
| reference_image_url | string | | – | – | Optional reference image URL for reference-based editing. When provided, the edit is guided by the visual style or content of this image. |
| enable_safety_checker | boolean | | true | – | Enable content moderation for input and output. |
{
  "model": "wan-2.7-edit",
  "input": {
    "prompt": "Transform the entire scene into a beautiful watercolor painting style. Soft brushstrokes, flowing paint washes, visible paper texture. Colors should bleed and blend naturally like wet watercolor on paper.",
    "video_url": "https://example.com/sample-video.mp4",
    "resolution": "1080p",
    "audio_setting": "auto",
    "enable_safety_checker": true
  }
}
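The duration rule for wan-2.7-edit ('0' matches the input, 2-10 truncates from the start) can be sketched as a small function. The name `expected_output_seconds` is hypothetical. Capping the result at the input length when the requested duration is longer than the input is an assumption, since the table does not say what happens in that case. Note that duration is typed as a string for this model.

```python
# Allowed values from the table: "0" plus "2" through "10", all strings.
EDIT_DURATIONS = {"0"} | {str(s) for s in range(2, 11)}


def expected_output_seconds(duration, input_seconds):
    """Estimate the output length for wan-2.7-edit (hypothetical helper)."""
    if duration not in EDIT_DURATIONS:
        raise ValueError('duration must be the string "0" or "2" through "10"')
    if duration == "0":
        # "0" means: match the input video's duration.
        return input_seconds
    # Assumption: a requested length beyond the input cannot extend the video.
    return min(float(duration), input_seconds)
```

Passing the integer `4` instead of the string `"4"` raises a `ValueError` in this sketch, which mirrors the string type given in the table.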
seedance-2.0 — Seedance 2.0 (ByteDance) · text-to-video

ByteDance Seedance 2.0: cinematic text-to-video with native audio, physics, and camera control.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | ✓ | – | – | The text prompt used to generate the video. |
| duration | string | | 4 | 4 5 6 7 8 9 10 11 12 13 14 15 | Duration of the video in seconds (4-15). |
| resolution | string | | 720p | 480p 720p 1080p | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. |
| end_user_id | string | | – | – | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto 21:9 16:9 4:3 1:1 3:4 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| generate_audio | boolean | | true | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |
{
  "model": "seedance-2.0",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
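Unlike the WAN 2.7 text and reference models, Seedance types duration as a string. A minimal sketch of a client-side check against the seedance-2.0 table is shown below; the helper name `seedance_input` is hypothetical, and the service may apply additional validation that is not modelled here.

```python
SEEDANCE_DURATIONS = {str(s) for s in range(4, 16)}        # "4" .. "15"
SEEDANCE_RESOLUTIONS = {"480p", "720p", "1080p"}
SEEDANCE_ASPECT_RATIOS = {"auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}


def seedance_input(prompt, duration=4, resolution="720p",
                   aspect_ratio="auto", generate_audio=True):
    """Build the "input" object for seedance-2.0 (hypothetical helper)."""
    duration = str(duration)  # the table types duration as a string
    if duration not in SEEDANCE_DURATIONS:
        raise ValueError("duration must be between 4 and 15 seconds")
    if resolution not in SEEDANCE_RESOLUTIONS:
        raise ValueError("resolution must be 480p, 720p or 1080p")
    if aspect_ratio not in SEEDANCE_ASPECT_RATIOS:
        raise ValueError("unsupported aspect_ratio")
    return {
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
```

Called with only a prompt, the helper returns the same defaults as the example request above; an integer such as `duration=8` is converted to the string `"8"` before it is checked.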
seedance-2.0-image — Seedance 2.0 Image-to-Video (ByteDance) · image-to-video

ByteDance Seedance 2.0: animate images with cinematic quality and synchronized audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | ✓ | – | – | The text prompt describing the desired motion and action for the video. |
| duration | string | | 4 | 4 5 6 7 8 9 10 11 12 13 14 15 | Duration of the video in seconds (4-15). |
| image_url | string | ✓ | – | – | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| resolution | string | | 720p | 480p 720p 1080p | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. |
| end_user_id | string | | – | – | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto 21:9 16:9 4:3 1:1 3:4 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. |
| end_image_url | string | | – | – | The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| generate_audio | boolean | | true | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |
{
  "model": "seedance-2.0-image",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
seedance-2.0-ref — Seedance 2.0 Reference-to-Video (ByteDance) · reference-to-video

ByteDance Seedance 2.0: generate video from reference images, videos, and audio clips.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | ✓ | – | – | The text prompt used to generate the video. |
| duration | string | | 4 | 4 5 6 7 8 9 10 11 12 13 14 15 | Duration of the video in seconds (4-15). |
| audio_urls | array | | – | – | Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required. |
| image_urls | array | | – | – | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12. |
| resolution | string | | 720p | 480p 720p 1080p | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. |
| video_urls | array | | – | – | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution. |
| end_user_id | string | | – | – | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto 21:9 16:9 4:3 1:1 3:4 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| generate_audio | boolean | | true | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |
{
  "model": "seedance-2.0-ref",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_urls": [
      "https://example.com/sample-image.jpg"
    ],
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
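The count limits in the seedance-2.0-ref table interact: each modality has its own cap, the total is capped at 12, and audio cannot be sent alone. A rough pre-flight check might look like the sketch below. The function name `check_seedance_refs` is hypothetical, and it checks counts only; file size, format, resolution and combined duration cannot be verified from URLs and are left to the service.

```python
def check_seedance_refs(image_urls=(), video_urls=(), audio_urls=()):
    """Return a list of problems; an empty list means the counts look valid."""
    problems = []
    if len(image_urls) > 9:
        problems.append("at most 9 reference images are allowed")
    if len(video_urls) > 3:
        problems.append("at most 3 reference videos are allowed")
    if len(audio_urls) > 3:
        problems.append("at most 3 reference audio files are allowed")
    if len(image_urls) + len(video_urls) + len(audio_urls) > 12:
        problems.append("total files across all modalities must not exceed 12")
    if audio_urls and not (image_urls or video_urls):
        problems.append("audio requires at least one reference image or video")
    return problems
```

For instance, 9 images plus 3 videos plus 3 audio files passes every per-modality cap but still fails the total-of-12 rule, so the function reports one problem for that combination.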
seedance-2.0-fast — Seedance 2.0 Fast (ByteDance) · text-to-video

ByteDance Seedance 2.0 fast tier: lower-latency text-to-video with native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | ✓ | – | – | The text prompt used to generate the video. |
| duration | string | | 4 | 4 5 6 7 8 9 10 11 12 13 14 15 | Duration of the video in seconds (4-15). |
| resolution | string | | 720p | 480p 720p | Video resolution: 480p for faster generation, 720p for balance. |
| end_user_id | string | | – | – | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto 21:9 16:9 4:3 1:1 3:4 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| generate_audio | boolean | | true | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |
{
  "model": "seedance-2.0-fast",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
seedance-2.0-fast-image — Seedance 2.0 Fast Image-to-Video (ByteDance) · image-to-video

ByteDance Seedance 2.0 fast tier: lower-latency image-to-video with synchronized audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | ✓ | – | – | The text prompt describing the desired motion and action for the video. |
| duration | string | | 4 | 4 5 6 7 8 9 10 11 12 13 14 15 | Duration of the video in seconds (4-15). |
| image_url | string | ✓ | – | – | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| resolution | string | | 720p | 480p 720p | Video resolution: 480p for faster generation, 720p for balance. |
| end_user_id | string | | – | – | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto 21:9 16:9 4:3 1:1 3:4 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. |
| end_image_url | string | | – | – | The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| generate_audio | boolean | | true | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |
{
  "model": "seedance-2.0-fast-image",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
seedance-2.0-fast-ref — Seedance 2.0 Fast Reference-to-Video (ByteDance) · reference-to-video

ByteDance Seedance 2.0 fast tier: reference-to-video with lower latency and cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| seed | integer | | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| prompt | string | ✓ | – | – | The text prompt used to generate the video. |
| duration | string | | 4 | 4 5 6 7 8 9 10 11 12 13 14 15 | Duration of the video in seconds (4-15). |
| audio_urls | array | | – | – | Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required. |
| image_urls | array | | – | – | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12. |
| resolution | string | | 720p | 480p 720p | Video resolution: 480p for faster generation, 720p for balance. |
| video_urls | array | | – | – | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution. |
| end_user_id | string | | – | – | The unique user ID of the end user. |
| aspect_ratio | string | | auto | auto 21:9 16:9 4:3 1:1 3:4 9:16 | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| generate_audio | boolean | | true | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |
{
  "model": "seedance-2.0-fast-ref",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_urls": [
      "https://example.com/sample-image.jpg"
    ],
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
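The six Seedance model IDs documented above follow a regular pattern: an optional fast tier and a suffix for the generation mode, with the fast tier limited to 480p and 720p. A small sketch of picking the ID from those two choices is shown below. The function name `seedance_model_id` is hypothetical, and the pattern is inferred from the IDs listed on this page rather than from a stated naming rule.

```python
FAST_RESOLUTIONS = {"480p", "720p"}
STANDARD_RESOLUTIONS = FAST_RESOLUTIONS | {"1080p"}

# Suffix per generation mode, as seen in the model IDs on this page.
MODE_SUFFIX = {
    "text-to-video": "",
    "image-to-video": "-image",
    "reference-to-video": "-ref",
}


def seedance_model_id(mode, fast=False, resolution="720p"):
    """Pick a Seedance model ID for a mode and tier (hypothetical helper)."""
    if mode not in MODE_SUFFIX:
        raise ValueError(f"unknown mode: {mode}")
    allowed = FAST_RESOLUTIONS if fast else STANDARD_RESOLUTIONS
    if resolution not in allowed:
        raise ValueError(f"{resolution} is not listed for this tier")
    base = "seedance-2.0-fast" if fast else "seedance-2.0"
    return base + MODE_SUFFIX[mode]
```

Requesting 1080p together with `fast=True` raises a `ValueError` here, since the fast-tier tables list only 480p and 720p.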
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Unique task ID. Use it to poll GET /hub/v1/tasks/:task_id. Example: "hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Task status at creation time (usually pending). Example: "pending"
Capability: image | video | audio | transcribe. Example: "video"
Model ID. Example: "veo-3.1-fast"
Model vendor. Example: "Google"
Generation mode (e.g. text-to-video, image-to-image). Example: "text-to-video"
ISO 8601 creation timestamp. Example: "2026-05-18T09:00:00.000Z"
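The task ID returned at creation is meant to be polled at GET /hub/v1/tasks/:task_id until the task is ready. A minimal polling sketch using only the Python standard library follows. Several things here are assumptions rather than documented facts: `BASE_URL` is a placeholder because this section does not give the API host, the `ready` key is taken from the workflow note ("poll until ready=true") and assumed to be a boolean in the task response, and the function names, polling interval and poll limit are arbitrary.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; the real host is not given here


def http_get_json(url, token):
    """GET a URL with a Bearer token and decode the JSON response body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(task_id, token, fetch=http_get_json,
                  interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll the task endpoint until the response reports ready (assumed key)."""
    url = f"{BASE_URL}/hub/v1/tasks/{task_id}"
    for _ in range(max_polls):
        task = fetch(url, token)
        if task.get("ready"):
            return task
        sleep(interval)
    raise TimeoutError(f"task {task_id} not ready after {max_polls} polls")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access, for example by passing a stub that returns canned task objects.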