# Submit Video Task

> Generates video using the selected model. The generation mode (text-to-video / image-to-video / etc.) is determined entirely by the `model` you choose — no separate `mode` field is needed.

**Text-to-video** — pass `prompt` only:
`veo-3.1` · `veo-3.1-fast` · `veo-3.1-lite` · `veo-3` · `kling-v3-pro` · `kling-v3-standard` · `wan-2.7` · `seedance-2.0`

**Image-to-video** — pass `prompt` + `image_urls` (1 image):
`veo-3.1-image` · `veo-3.1-fast-image` · `veo-3.1-lite-image` · `veo-3-image` · `kling-v3-pro-image` · `kling-v3-standard-image` · `wan-2.7-image` · `seedance-2.0-image`

**Multi-reference video** — `image_urls` with 2–9 images:
`veo-3.1-ref` · `veo-3.1-fast-ref`

**First-last frame** — `image_urls[0]` = start frame, `image_urls[1]` = end frame:
`veo-3.1-first-last`

> Always pass image(s) via `image_urls: string[]`; the service automatically maps them to the correct fal field for the chosen model (for example `image_url`, or `first_frame_url` and `last_frame_url`).
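
The image-count rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `expected_image_count` and `check_image_urls` are hypothetical helper names, and matching on the model-name suffix is an assumption based on the naming pattern of the models listed above.

```python
# Image-count rules taken from the mode list above. Matching on the
# model-name suffix is an assumption based on the listed model names.
IMAGE_COUNT_BY_SUFFIX = [
    ("-first-last", (2, 2)),  # start frame + end frame
    ("-ref", (2, 9)),         # multi-reference
    ("-image", (1, 1)),       # image-to-video
]


def expected_image_count(model: str) -> tuple[int, int]:
    """Return the (min, max) number of image_urls the model expects."""
    for suffix, bounds in IMAGE_COUNT_BY_SUFFIX:
        if model.endswith(suffix):
            return bounds
    return (0, 0)  # text-to-video: prompt only


def check_image_urls(model: str, image_urls: list[str]) -> None:
    """Raise ValueError if the number of images does not fit the model."""
    low, high = expected_image_count(model)
    if not low <= len(image_urls) <= high:
        raise ValueError(
            f"{model} expects {low}-{high} image(s), got {len(image_urls)}"
        )
```

Calling `check_image_urls("veo-3.1-fast-ref", urls)` with a single URL raises `ValueError`, which surfaces the problem before a task is submitted.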

---

**Workflow**
1. `GET /hub/v1/models?capability=video` — browse available models
2. `GET /hub/v1/models/:model` — copy the `example` as your `input`
3. `POST /hub/v1/video` ← you are here
4. `GET /hub/v1/tasks/:task_id` — poll until `ready=true`
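
Steps 3 and 4 of the workflow can be sketched as a submit-then-poll loop. This is an illustration under stated assumptions: `post_json` and `get_json` are hypothetical callables that you supply (they would wrap your HTTP client, base URL, and authentication, none of which are specified on this page), the `task_id` response field name is assumed from the `:task_id` path parameter, and the `ready` flag comes from step 4.

```python
import time
from typing import Any, Callable

Json = dict[str, Any]


def submit_and_wait(
    post_json: Callable[[str, Json], Json],
    get_json: Callable[[str], Json],
    body: Json,
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Json:
    """Submit a video task (step 3) and poll it (step 4) until ready."""
    created = post_json("/hub/v1/video", body)
    task_id = created["task_id"]  # assumed response field name
    for _ in range(max_polls):
        task = get_json(f"/hub/v1/tasks/{task_id}")
        if task.get("ready") is True:
            return task
        sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"task {task_id} not ready after {max_polls} polls")
```

Passing the transport functions in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with stubs. The polling interval and the poll limit are arbitrary defaults, not values recommended by the service.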

---

## Model Reference

> **Tip:** Click **Try it out** → select a model from the dropdown below → the parameter schema auto-populates with an example.

### `veo-3.1` — Veo 3.1 (Google) · `text-to-video`

Google Veo 3.1 text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | `true` | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | Aspect ratio of the generated video |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1",
  "input": {
    "prompt": "Two person street interview in New York City.\nSample Dialogue:\nHost: \"Did you hear the news?\"\nPerson: \"Yes! Veo 3.1 is now available online. If you want to see it, go check it out!\"",
    "auto_fix": true,
    "duration": "8s",
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>
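
Several `veo-3.1` parameters are string enums, so a numeric `8` or `4` is not the same as `"8s"` or `"4"`. The sketch below checks an `input` object against the values in the table above before submission. `veo_31_input_errors` is a hypothetical helper, and how the service itself reports invalid values is not covered here.

```python
# Allowed values copied from the `veo-3.1` parameter table above.
VEO_31_ENUMS = {
    "duration": ("4s", "6s", "8s"),
    "resolution": ("720p", "1080p", "4k"),
    "aspect_ratio": ("16:9", "9:16"),
    "safety_tolerance": ("1", "2", "3", "4", "5", "6"),
}


def veo_31_input_errors(params: dict) -> list[str]:
    """Return a list of problems found in a `veo-3.1` input object."""
    errors = []
    if not params.get("prompt"):
        errors.append("prompt is required")
    for name, allowed in VEO_31_ENUMS.items():
        # Only validate parameters that were actually supplied.
        if name in params and params[name] not in allowed:
            errors.append(f"{name}={params[name]!r} not in {list(allowed)}")
    return errors
```

An empty list means the object passed these local checks. It does not guarantee that the service accepts the request.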

---

### `veo-3.1-image` — Veo 3.1 Image-to-Video (Google) · `image-to-video`

Veo 3.1: animate a single reference image.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `image_url` | `string` | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The aspect ratio of the generated video. Besides `auto`, only `16:9` and `9:16` are supported. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-image",
  "input": {
    "prompt": "A monkey and polar bear host a casual podcast about AI inference, bringing their unique perspectives from different environments (tropical vs. arctic) to discuss how AI systems make decisions and process information.\nSample Dialogue:\nMonkey (Banana): \"Welcome back to Bananas & Ice! I am Banana\"\nPolar Bear (Ice): \"And I'm Ice!\"",
    "duration": "8s",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `veo-3.1-ref` — Veo 3.1 Multi-Ref (Google) · `reference-to-video`

Veo 3.1: generate video from reference images for consistent subject appearance.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `image_urls` | `array` | ✓ | – | – | URLs of the reference images to use for consistent subject appearance |
| `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect ratio of the generated video. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-ref",
  "input": {
    "prompt": "A chimpanzee wearing overalls frolics in the grassy field, gently playing with the butterflies. In the background, a circus tent and carousel beckon.",
    "duration": "8s",
    "image_urls": [
      "https://example.com/sample-image.jpg",
      "https://example.com/sample-image-2.jpg",
      "https://example.com/sample-image-3.jpg"
    ],
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `veo-3.1-first-last` — Veo 3.1 First-Last Frame (Google) · `first-last-frame-to-video`

Veo 3.1: generate a transition video between a start frame and an end frame.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The aspect ratio of the generated video. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `last_frame_url` | `string` | ✓ | – | – | URL of the last frame of the video |
| `first_frame_url` | `string` | ✓ | – | – | URL of the first frame of the video |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-first-last",
  "input": {
    "prompt": "A woman looks into the camera, breathes in, then exclaims energetically, \"have you guys checked out this AI video generation? It's incredible!\"",
    "duration": "8s",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true,
    "last_frame_url": "https://example.com/sample-image-2.jpg",
    "first_frame_url": "https://example.com/sample-image.jpg",
    "safety_tolerance": "4"
  }
}
```
</details>
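
The correspondence between the `image_urls` ordering and the two frame fields in the table above can be written out explicitly. This sketch only illustrates that mapping; it is not the service's implementation, and `map_first_last_frames` is a hypothetical name.

```python
def map_first_last_frames(image_urls: list[str]) -> dict[str, str]:
    """Map [start, end] image URLs to the frame fields in the table above."""
    if len(image_urls) != 2:
        raise ValueError("veo-3.1-first-last needs exactly 2 images")
    first, last = image_urls  # index 0 = start frame, index 1 = end frame
    return {"first_frame_url": first, "last_frame_url": last}
```

Swapping the two URLs reverses the direction of the transition, so the order of the list matters.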

---

### `veo-3.1-fast` — Veo 3.1 Fast (Google) · `text-to-video`

Veo 3.1 Fast: lower-latency text-to-video at reduced cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | `true` | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | Aspect ratio of the generated video |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-fast",
  "input": {
    "prompt": "Two person street interview in New York City.\nSample Dialogue:\nHost: \"Did you hear the news?\"\nPerson: \"Yes! Veo 3.1 is now available online. If you want to see it, go check it out!\"",
    "auto_fix": true,
    "duration": "8s",
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `veo-3.1-fast-image` — Veo 3.1 Fast Image-to-Video (Google) · `image-to-video`

Veo 3.1 Fast: animate a reference image at lower cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `image_url` | `string` | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The aspect ratio of the generated video. Besides `auto`, only `16:9` and `9:16` are supported. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-fast-image",
  "input": {
    "prompt": "A monkey and polar bear host a casual podcast about AI inference, bringing their unique perspectives from different environments (tropical vs. arctic) to discuss how AI systems make decisions and process information.\nSample Dialogue:\nMonkey (Banana): \"Welcome back to Bananas & Ice! I am Banana\"\nPolar Bear (Ice): \"And I'm Ice!\"",
    "duration": "8s",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `veo-3.1-fast-ref` — Veo 3.1 Fast Multi-Ref (Google) · `reference-to-video`

Veo 3.1 Fast: multi-reference video at lower cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `image_urls` | `array` | ✓ | – | – | URLs of the reference images to use for consistent subject appearance |
| `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect ratio of the generated video. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-fast-ref",
  "input": {
    "prompt": "A chimpanzee wearing overalls frolics in the grassy field, gently playing with the butterflies. In the background, a circus tent and carousel beckon.",
    "duration": "8s",
    "image_urls": [
      "https://example.com/sample-image.jpg",
      "https://example.com/sample-image-2.jpg",
      "https://example.com/sample-image-3.jpg"
    ],
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `veo-3.1-lite` — Veo 3.1 Lite (Google) · `text-to-video`

Veo 3.1 Lite: lowest-cost text-to-video (720p/1080p only).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | `true` | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | Aspect ratio of the generated video |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-lite",
  "input": {
    "prompt": "A massive blue whale glides through crystal-clear deep ocean water, sunlight rays piercing through the surface above, bioluminescent plankton scattered around, cinematic slow motion",
    "auto_fix": true,
    "duration": "8s",
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>
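
Because `veo-3.1-lite` stops at `1080p` while `veo-3.1` and `veo-3.1-fast` also accept `4k`, a client that switches between these models may want to pick a supported resolution up front. The sketch below is one possible client-side policy (fall back to the next lower tier); `clamp_resolution` is a hypothetical helper, and what the service does with an unsupported value is not stated on this page.

```python
# Supported resolutions per model, from the parameter tables above.
RESOLUTIONS = {
    "veo-3.1": ["720p", "1080p", "4k"],
    "veo-3.1-fast": ["720p", "1080p", "4k"],
    "veo-3.1-lite": ["720p", "1080p"],
}
ORDER = ["720p", "1080p", "4k"]  # lowest to highest


def clamp_resolution(model: str, wanted: str) -> str:
    """Return `wanted` if the model supports it, else the next lower tier."""
    supported = RESOLUTIONS[model]
    # Walk downwards from the requested tier until a supported one is found.
    for tier in reversed(ORDER[: ORDER.index(wanted) + 1]):
        if tier in supported:
            return tier
    raise ValueError(f"{model} supports no resolution at or below {wanted}")
```

With this policy, requesting `4k` from `veo-3.1-lite` yields `1080p`, while the same request against `veo-3.1` is left unchanged.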

---

### `veo-3.1-lite-image` — Veo 3.1 Lite Image-to-Video (Google) · `image-to-video`

Veo 3.1 Lite: animate a reference image at lowest cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `image_url` | `string` | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The aspect ratio of the generated video. Besides `auto`, only `16:9` and `9:16` are supported. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3.1-lite-image",
  "input": {
    "prompt": "A massive blue whale glides through crystal-clear deep ocean water, sunlight rays piercing through the surface above, bioluminescent plankton scattered around, cinematic slow motion",
    "duration": "8s",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `veo-3` — Veo 3 (Google) · `text-to-video`

Google Veo 3 text-to-video with native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the video you want to generate |
| `auto_fix` | `boolean` |  | `true` | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect ratio of the generated video. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3",
  "input": {
    "prompt": "A casual street interview on a busy New York City sidewalk in the afternoon. The interviewer holds a plain, unbranded microphone and asks: Have you seen Google's new Veo3 model It is a super good model. Person replies: Yeah I saw it, it's already available now. It's crazy good.",
    "auto_fix": true,
    "duration": "8s",
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `veo-3-image` — Veo 3 Image-to-Video (Google) · `image-to-video`

Google Veo 3: animate a single reference image.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | The seed for the random number generator. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing how the image should be animated |
| `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them. |
| `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the generated video. |
| `image_url` | `string` | ✓ | – | – | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit. |
| `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution of the generated video. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The aspect ratio of the generated video. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate audio for the video. |
| `negative_prompt` | `string` |  | – | – | A negative prompt to guide the video generation. |
| `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Note: API-only parameter. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "veo-3-image",
  "input": {
    "prompt": "A woman looks into the camera, breathes in, then exclaims energetically, \"have you guys checked out this AI video generation? It's incredible!\"",
    "duration": "8s",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true,
    "safety_tolerance": "4"
  }
}
```
</details>

---

### `kling-v3-standard` — Kling v3 Standard (Kuaishou) · `text-to-video`

Kling v3 Standard text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `prompt` | `string` |  | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | The duration of the generated video in seconds |
| `cfg_scale` | `number` |  | `0.5` | – | The CFG (classifier-free guidance) scale controls how closely the model sticks to your prompt. |
| `shot_type` | `string` |  | `customize` | `customize` `intelligent` | The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` | The aspect ratio of the generated video frame |
| `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| `negative_prompt` | `string` |  | `blur, distort, and low quality` | – | Content to avoid in the generated video. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "kling-v3-standard",
  "input": {
    "prompt": "Cinematic drone shot flying through ancient stone ruins covered in moss and vines at golden hour. Camera starts low, rises through crumbling archways, revealing a vast misty valley beyond. Volumetric light rays pierce through gaps in the stone. Epic scale, photorealistic, 8K quality.",
    "duration": "5",
    "cfg_scale": 0.5,
    "shot_type": "customize",
    "aspect_ratio": "16:9",
    "multi_prompt": null,
    "generate_audio": true,
    "negative_prompt": "blur, distort, and low quality"
  }
}
```
</details>
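
Two rules in the table above are easy to get wrong: exactly one of `prompt` or `multi_prompt` must be provided, and `duration` is a string, not a number. The following sketch checks both. `kling_input_errors` is a hypothetical helper, and treating a `null` or empty `multi_prompt` as "not provided" is an assumption based on the example request, which sends `"multi_prompt": null` alongside `prompt`.

```python
# Durations are the strings "3" through "15", per the table above.
KLING_DURATIONS = tuple(str(n) for n in range(3, 16))


def kling_input_errors(params: dict) -> list[str]:
    """Return a list of problems found in a `kling-v3-standard` input."""
    errors = []
    has_prompt = bool(params.get("prompt"))
    # Assumption: null or empty multi_prompt counts as not provided.
    has_multi = bool(params.get("multi_prompt"))
    if has_prompt == has_multi:
        errors.append("provide exactly one of prompt or multi_prompt")
    if params.get("duration", "5") not in KLING_DURATIONS:
        errors.append('duration must be a string from "3" to "15"')
    return errors
```

The example request above passes both checks, whereas `"duration": 5` (an integer) is reported as an error.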

---

### `kling-v3-standard-image` — Kling v3 Standard Image-to-Video (Kuaishou) · `image-to-video`

Kling v3 Standard image-to-video (3-15 seconds).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `prompt` | `string` |  | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | The duration of the generated video in seconds |
| `elements` | `array` |  | – | – | Elements (characters/objects) to include in the video. Each element can either be an image set (frontal + reference images) or a video. Reference in prompt as @Element1, @Element2, etc. |
| `cfg_scale` | `number` |  | `0.5` | – | The CFG (classifier-free guidance) scale controls how closely the model sticks to your prompt. |
| `shot_type` | `string` |  | `customize` | `customize` `intelligent` | The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. |
| `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot video generation. If provided, divides the video into multiple shots. |
| `end_image_url` | `string` |  | – | – | URL of the image to be used for the end of the video |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| `negative_prompt` | `string` |  | `blur, distort, and low quality` | – | Content to avoid in the generated video. |
| `start_image_url` | `string` | ✓ | – | – | URL of the image to be used for the start of the video |

<details>
<summary>Example request body</summary>

```json
{
  "model": "kling-v3-standard-image",
  "input": {
    "prompt": "Camera slowly orbits around the vase. Soft light shifts across the ceramic surface. The pampas grass sways gently. Shadows move elegantly. Smooth continuous motion, premium feel.",
    "duration": "12",
    "cfg_scale": 0.5,
    "shot_type": "customize",
    "multi_prompt": null,
    "generate_audio": true,
    "negative_prompt": "blur, distort, and low quality",
    "start_image_url": "https://example.com/sample-image.jpg"
  }
}
```
</details>

---

### `kling-v3-pro` — Kling v3 Pro (Kuaishou) · `text-to-video`

Kling v3 Pro text-to-video with optional native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `prompt` | `string` |  | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | The duration of the generated video in seconds |
| `cfg_scale` | `number` |  | `0.5` | – | The CFG (classifier-free guidance) scale controls how closely the model sticks to your prompt. |
| `shot_type` | `string` |  | `customize` | `customize` `intelligent` | The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` | The aspect ratio of the generated video frame |
| `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| `negative_prompt` | `string` |  | `blur, distort, and low quality` | – | Content to avoid in the generated video. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "kling-v3-pro",
  "input": {
    "prompt": "Close-up of glowing fireflies dancing in a dark forest at twilight. Soft bioluminescent particles float through the air. Shallow depth of field, bokeh lights in background. Magical atmosphere, gentle movement.",
    "duration": "5",
    "cfg_scale": 0.5,
    "shot_type": "customize",
    "aspect_ratio": "16:9",
    "multi_prompt": null,
    "generate_audio": true,
    "negative_prompt": "blur, distort, and low quality"
  }
}
```
</details>

---

### `kling-v3-pro-image` — Kling v3 Pro Image-to-Video (Kuaishou) · `image-to-video`

Kling v3 Pro image-to-video (3-15 seconds).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `prompt` | `string` |  | – | – | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both. |
| `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | The duration of the generated video in seconds |
| `elements` | `array` |  | – | – | Elements (characters/objects) to include in the video. Each element can either be an image set (frontal + reference images) or a video. Reference in prompt as @Element1, @Element2, etc. |
| `cfg_scale` | `number` |  | `0.5` | – | The CFG (classifier-free guidance) scale controls how closely the model sticks to your prompt. |
| `shot_type` | `string` |  | `customize` | `customize` `intelligent` | The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. |
| `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot video generation. If provided, divides the video into multiple shots. |
| `end_image_url` | `string` |  | – | – | URL of the image to be used for the end of the video |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. |
| `negative_prompt` | `string` |  | `blur, distort, and low quality` | – | Content to avoid in the generated video. |
| `start_image_url` | `string` | ✓ | – | – | URL of the image to be used for the start of the video |

<details>
<summary>Example request body</summary>

```json
{
  "model": "kling-v3-pro-image",
  "input": {
    "prompt": "The craftsman slowly examines the bowl, turning it gently in his weathered hands. His eyes reflect years of wisdom. Subtle smile forms on his face. Dust particles drift in warm light. Breathing motion, blinking eyes.",
    "duration": "12",
    "cfg_scale": 0.5,
    "shot_type": "customize",
    "multi_prompt": null,
    "generate_audio": true,
    "negative_prompt": "blur, distort, and low quality",
    "start_image_url": "https://example.com/sample-image.jpg"
  }
}
```
</details>

---

### `wan-2.7` — WAN 2.7 (Alibaba) · `text-to-video`

WAN 2.7 text-to-video: high-quality generation. Default resolution is 1080p.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility (0-2147483647). |
| `prompt` | `string` | ✓ | – | – | Text prompt describing the desired video. Max 5000 characters. |
| `duration` | `integer` |  | `5` | `2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Output video duration in seconds (2-15). |
| `audio_url` | `string` |  | – | – | URL of driving audio. Supports WAV and MP3. Duration: 3-30s. Max 15 MB. If not provided, the model auto-generates matching background music. |
| `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video resolution tier. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` `4:3` `3:4` | Aspect ratio of the generated video. |
| `negative_prompt` | `string` |  | – | – | Content to avoid in the video. Max 500 characters. |
| `enable_safety_checker` | `boolean` |  | `true` | – | Enable content moderation for input and output. |
| `enable_prompt_expansion` | `boolean` |  | `true` | – | Enable intelligent prompt rewriting. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "wan-2.7",
  "input": {
    "prompt": "A kitten running in a meadow, cinematic lighting, smooth camera movement.",
    "duration": 5,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "negative_prompt": "low resolution, errors, worst quality, low quality",
    "enable_safety_checker": true,
    "enable_prompt_expansion": true
  }
}
```
</details>
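
The limits in the `wan-2.7` table can be checked on the client before a task is submitted. The sketch below is one way to do that; the function name `validate_wan27_input` and the error wording are illustrative and not part of the API, and it assumes "characters" can be approximated with Python's `len()`.

```python
from typing import Any

WAN27_RESOLUTIONS = {"720p", "1080p"}
WAN27_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def validate_wan27_input(inp: dict[str, Any]) -> list[str]:
    """Return the problems found in a `wan-2.7` input dict (empty if none).

    Hypothetical helper: it only mirrors the limits listed in the table.
    """
    errors: list[str] = []

    prompt = inp.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 5000:
        errors.append("prompt exceeds 5000 characters")

    duration = inp.get("duration", 5)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 2 <= duration <= 15:
        errors.append("duration must be an integer from 2 to 15")

    if inp.get("resolution", "1080p") not in WAN27_RESOLUTIONS:
        errors.append("resolution must be 720p or 1080p")

    if inp.get("aspect_ratio", "16:9") not in WAN27_ASPECT_RATIOS:
        errors.append("aspect_ratio is not one of the documented values")

    if len(inp.get("negative_prompt") or "") > 500:
        errors.append("negative_prompt exceeds 500 characters")

    seed = inp.get("seed")
    if seed is not None and not 0 <= seed <= 2147483647:
        errors.append("seed must be between 0 and 2147483647")

    return errors
```

An empty list means the input passed these local checks only; the service may still reject a request for reasons the table does not describe.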

---

### `wan-2.7-image` — WAN 2.7 Image-to-Video (Alibaba) · `image-to-video`

WAN 2.7 image-to-video (720p/1080p, duration 2-15s).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility (0-2147483647). |
| `prompt` | `string` |  | – | – | Text prompt describing the desired video. Max 5000 characters. |
| `duration` | `integer` |  | `5` | `2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Output video duration in seconds (2-15). |
| `audio_url` | `string` |  | – | – | URL of driving audio. Supports WAV and MP3. Duration: 2-30s. Max 15 MB. |
| `image_url` | `string` |  | – | – | URL of the first frame image. Formats: JPEG, JPG, PNG, BMP, WEBP. Max 20 MB. |
| `video_url` | `string` |  | – | – | URL of a video clip to continue from. Format: MP4, MOV. Duration: 2-10s. Max 100 MB. Cannot be combined with image_url. |
| `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video resolution tier. |
| `end_image_url` | `string` |  | – | – | URL of the last frame image for first-and-last-frame-to-video. Same constraints as image_url. |
| `negative_prompt` | `string` |  | – | – | Content to avoid in the video. Max 500 characters. |
| `enable_safety_checker` | `boolean` |  | `true` | – | Enable content moderation for input and output. |
| `enable_prompt_expansion` | `boolean` |  | `true` | – | Enable intelligent prompt rewriting. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "wan-2.7-image",
  "input": {
    "prompt": "The massive humpback whale glides slowly through the deep blue water. It turns gracefully, its huge pectoral fin sweeping through the water like a wing. Sunbeams penetrate from above, illuminating the whale's textured skin. Small fish scatter. Awe-inspiring scale and grace.",
    "duration": 5,
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "1080p",
    "negative_prompt": "low resolution, errors, worst quality, low quality, incomplete, extra fingers, bad proportions, blurry, distorted",
    "enable_safety_checker": true,
    "enable_prompt_expansion": true
  }
}
```
</details>
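
The table above allows either a first-frame image or a video clip to continue from, but not both. A small sketch of that rule follows; the function name and the mode labels it returns are made up for illustration and are not values the API uses.

```python
from typing import Any


def describe_wan27_image_mode(inp: dict[str, Any]) -> str:
    """Classify a `wan-2.7-image` input by which source fields are set.

    The returned labels are illustrative only. Raises ValueError for the one
    combination the table rules out: image_url together with video_url.
    """
    has_image = bool(inp.get("image_url"))
    has_video = bool(inp.get("video_url"))
    has_end = bool(inp.get("end_image_url"))

    if has_image and has_video:
        raise ValueError("video_url cannot be combined with image_url")
    if has_video:
        return "video-continuation"
    if has_image and has_end:
        return "first-and-last-frame"
    if has_image:
        return "first-frame"
    # The table marks every source field as optional, so this case is
    # reported rather than rejected.
    return "no-source"
```

How the service treats `end_image_url` combined with `video_url` is not stated in the table, so the sketch does not check for it.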

---

### `wan-2.7-ref` — WAN 2.7 Reference-to-Video (Alibaba) · `reference-to-video`

WAN 2.7 reference-to-video using character/object reference images and videos (duration 2-10s).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility (0-2147483647). |
| `prompt` | `string` | ✓ | – | – | Text prompt describing the desired video. Max 5000 characters. |
| `duration` | `integer` |  | `5` | `2` `3` `4` `5` `6` `7` `8` `9` `10` | Output video duration in seconds (2-10). |
| `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video resolution tier. |
| `multi_shots` | `boolean` |  | `false` | – | When true, enables intelligent multi-shot segmentation. When false (default), generates a single continuous shot. |
| `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` `4:3` `3:4` | Aspect ratio of the generated video. |
| `negative_prompt` | `string` |  | – | – | Content to avoid in the video. Max 500 characters. |
| `reference_image_urls` | `array` |  | – | – | Reference image URLs for character/object appearance. Pass multiple images for multi-subject generation. Max 20 MB each. |
| `reference_video_urls` | `array` |  | – | – | Reference video URLs for character/object appearance and motion. Pass multiple videos for multi-subject generation. Max 100 MB each. Note: when video inputs are provided, billing includes the total input video duration plus the output duration. Your charged credits will be higher than the output duration alone. |
| `enable_safety_checker` | `boolean` |  | `true` | – | Enable content moderation for input and output. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "wan-2.7-ref",
  "input": {
    "prompt": "A person walking through a beautiful garden, cinematic style.",
    "duration": 5,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "negative_prompt": "low resolution, errors, worst quality, low quality",
    "enable_safety_checker": true
  }
}
```
</details>
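
The billing note on `reference_video_urls` can be turned into a rough estimate of billed seconds. This is a sketch under stated assumptions: the helper name is hypothetical, you are assumed to already know the duration of each reference video, and neither the credit rate nor any rounding rule is documented here, so the result is seconds rather than credits.

```python
def billable_seconds(output_duration: int, reference_video_durations: list[float]) -> float:
    """Estimate billed seconds for `wan-2.7-ref`: input video time plus output time.

    Hypothetical helper based only on the note in the parameter table.
    """
    if not 2 <= output_duration <= 10:
        raise ValueError("wan-2.7-ref output duration must be 2-10 seconds")
    return float(output_duration) + sum(reference_video_durations)
```

For example, a 5-second output driven by two reference clips of 4 and 3 seconds would be estimated at 12 billed seconds, compared with 5 when only reference images are used.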

---

### `wan-2.7-edit` — WAN 2.7 Edit Video (Alibaba) · `video-to-video`

WAN 2.7 video editing: instruction-based editing, reference-image-based editing and style transfer (input video 2-10s).

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility (0-2147483647). |
| `prompt` | `string` | ✓ | – | – | Editing instruction or style transfer description. Describe what changes you want applied to the video. |
| `duration` | `string` |  | `0` | `0` `2` `3` `4` `5` `6` `7` `8` `9` `10` | Output duration in seconds. '0' means match the input video's duration. When set to 2-10, the output is truncated to that length from the start. |
| `video_url` | `string` | ✓ | – | – | URL of the input video to edit. Format: MP4, MOV. Duration: 2-10s. Max 100 MB. |
| `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video resolution tier. |
| `aspect_ratio` | `string` |  | – | `16:9` `9:16` `1:1` `4:3` `3:4` | Aspect ratio of the output video. If not provided, uses the input video's aspect ratio. |
| `audio_setting` | `string` |  | `auto` | `auto` `origin` | Audio handling: 'auto' lets the model decide whether to regenerate audio; 'origin' preserves the original audio from the input video. |
| `reference_image_url` | `string` |  | – | – | Optional reference image URL for reference-based editing. When provided, the edit is guided by the visual style or content of this image. |
| `enable_safety_checker` | `boolean` |  | `true` | – | Enable content moderation for input and output. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "wan-2.7-edit",
  "input": {
    "prompt": "Transform the entire scene into a beautiful watercolor painting style. Soft brushstrokes, flowing paint washes, visible paper texture. Colors should bleed and blend naturally like wet watercolor on paper.",
    "video_url": "https://example.com/sample-video.mp4",
    "resolution": "1080p",
    "audio_setting": "auto",
    "enable_safety_checker": true
  }
}
```
</details>
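
The `duration` rule for `wan-2.7-edit` ("0" matches the input, 2-10 truncates from the start) can be sketched as below. The helper name is hypothetical, and the behavior when the requested length exceeds the input length is an assumption: the sketch takes the shorter of the two.

```python
# Documented string values: "0" plus "2" through "10".
ALLOWED_EDIT_DURATIONS = {"0"} | {str(n) for n in range(2, 11)}


def effective_edit_duration(duration: str, input_seconds: float) -> float:
    """Estimate the output length of a `wan-2.7-edit` task in seconds."""
    if duration not in ALLOWED_EDIT_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_EDIT_DURATIONS, key=int)}")
    if duration == "0":
        # "0" means: keep the input video's duration.
        return input_seconds
    # Assumption: truncation cannot make the output longer than the input.
    return min(float(duration), input_seconds)
```

Note that `duration` is a string for this model, so passing the integer `5` is rejected by the check above.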

---

### `seedance-2.0` — Seedance 2.0 (ByteDance) · `text-to-video`

ByteDance Seedance 2.0: cinematic text-to-video with native audio, physics, and camera control.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| `prompt` | `string` | ✓ | – | – | The text prompt used to generate the video |
| `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Duration of the video in seconds (4-15). |
| `resolution` | `string` |  | `720p` | `480p` `720p` `1080p` | Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality. |
| `end_user_id` | `string` |  | – | – | The unique user ID of the end user. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "seedance-2.0",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
```
</details>
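
The `duration` field is typed differently across the models in this reference: Seedance 2.0 models take a string (`"4"` to `"15"`), `wan-2.7`, `wan-2.7-image`, and `wan-2.7-ref` take an integer, and `wan-2.7-edit` takes a string that also allows `"0"`. The sketch below converts a number of seconds into the form each model's table lists; `DURATION_RULES` and `format_duration` are illustrative names, not part of the API.

```python
from typing import Union

# model -> (min seconds, max seconds, send as string?), copied from the tables.
DURATION_RULES = {
    "wan-2.7": (2, 15, False),
    "wan-2.7-image": (2, 15, False),
    "wan-2.7-ref": (2, 10, False),
    "wan-2.7-edit": (2, 10, True),
    "seedance-2.0": (4, 15, True),
    "seedance-2.0-image": (4, 15, True),
    "seedance-2.0-ref": (4, 15, True),
    "seedance-2.0-fast": (4, 15, True),
    "seedance-2.0-fast-image": (4, 15, True),
    "seedance-2.0-fast-ref": (4, 15, True),
}


def format_duration(model: str, seconds: int) -> Union[int, str]:
    """Return `seconds` in the type the model's table lists for `duration`."""
    if model not in DURATION_RULES:
        raise ValueError(f"no duration rule recorded for model {model!r}")
    low, high, as_string = DURATION_RULES[model]
    # wan-2.7-edit additionally accepts "0", meaning "match the input video".
    allowed = low <= seconds <= high or (model == "wan-2.7-edit" and seconds == 0)
    if not allowed:
        raise ValueError(f"{model} duration must be {low}-{high} seconds, got {seconds}")
    return str(seconds) if as_string else seconds
```

Models outside this table, such as the Veo family with its `"8s"`-style values, are deliberately left out and raise `ValueError`.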

---

### `seedance-2.0-image` — Seedance 2.0 Image-to-Video (ByteDance) · `image-to-video`

ByteDance Seedance 2.0: animate images with cinematic quality and synchronized audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the desired motion and action for the video. |
| `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Duration of the video in seconds (4-15). |
| `image_url` | `string` | ✓ | – | – | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| `resolution` | `string` |  | `720p` | `480p` `720p` `1080p` | Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality. |
| `end_user_id` | `string` |  | – | – | The unique user ID of the end user. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. |
| `end_image_url` | `string` |  | – | – | The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "seedance-2.0-image",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
```
</details>
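
A request body for `seedance-2.0-image` can be assembled so that `end_image_url` appears only when an end frame is wanted. This is a minimal sketch; `build_seedance_image_request` is a hypothetical helper, and it leaves all other optional fields to their documented defaults.

```python
from typing import Any, Optional


def build_seedance_image_request(
    prompt: str,
    image_url: str,
    end_image_url: Optional[str] = None,
    duration: int = 4,
    resolution: str = "720p",
) -> dict[str, Any]:
    """Build a `seedance-2.0-image` request body as a plain dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if not image_url:
        raise ValueError("image_url is required")
    if not 4 <= duration <= 15:
        raise ValueError("duration must be 4-15 seconds")
    if resolution not in ("480p", "720p", "1080p"):
        raise ValueError("resolution must be 480p, 720p, or 1080p")

    inp: dict[str, Any] = {
        "prompt": prompt,
        "duration": str(duration),  # this model lists duration as a string
        "image_url": image_url,
        "resolution": resolution,
    }
    if end_image_url is not None:
        # With an end frame, the video transitions from image_url to it.
        inp["end_image_url"] = end_image_url
    return {"model": "seedance-2.0-image", "input": inp}
```

Serializing the result with `json.dumps` gives a body of the same shape as the example above.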

---

### `seedance-2.0-ref` — Seedance 2.0 Reference-to-Video (ByteDance) · `reference-to-video`

ByteDance Seedance 2.0: generate video from reference images, videos, and audio clips.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| `prompt` | `string` | ✓ | – | – | The text prompt used to generate the video. |
| `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Duration of the video in seconds (4-15). |
| `audio_urls` | `array` |  | – | – | Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required. |
| `image_urls` | `array` |  | – | – | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12. |
| `resolution` | `string` |  | `720p` | `480p` `720p` `1080p` | Video resolution - 480p for faster generation, 720p for balance, 1080p for highest quality. |
| `video_urls` | `array` |  | – | – | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution. |
| `end_user_id` | `string` |  | – | – | The unique user ID of the end user. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "seedance-2.0-ref",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_urls": [
      "https://example.com/sample-image.jpg"
    ],
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
```
</details>
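
The count limits on reference files (up to 9 images, 3 videos, 3 audio files, 12 files in total, and audio only alongside an image or video) can be checked before submission. The sketch below covers only those counts; the function name is hypothetical, and the duration, size, and resolution limits are not checked because they would require inspecting the files themselves.

```python
from typing import Any


def validate_seedance_refs(inp: dict[str, Any]) -> list[str]:
    """Return count-related problems in a `seedance-2.0-ref` input (empty if none)."""
    images = inp.get("image_urls") or []
    videos = inp.get("video_urls") or []
    audios = inp.get("audio_urls") or []
    errors: list[str] = []

    if len(images) > 9:
        errors.append("image_urls allows at most 9 images")
    if len(videos) > 3:
        errors.append("video_urls allows at most 3 videos")
    if len(audios) > 3:
        errors.append("audio_urls allows at most 3 files")
    if len(images) + len(videos) + len(audios) > 12:
        errors.append("total reference files must not exceed 12")
    if audios and not (images or videos):
        errors.append("audio_urls requires at least one reference image or video")

    return errors
```

The `seedance-2.0-fast-ref` table lists the same limits, so the same check should apply to that model as well.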

---

### `seedance-2.0-fast` — Seedance 2.0 Fast (ByteDance) · `text-to-video`

ByteDance Seedance 2.0 fast tier: lower-latency text-to-video with native audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| `prompt` | `string` | ✓ | – | – | The text prompt used to generate the video |
| `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Duration of the video in seconds (4-15). |
| `resolution` | `string` |  | `720p` | `480p` `720p` | Video resolution - 480p for faster generation, 720p for balance. |
| `end_user_id` | `string` |  | – | – | The unique user ID of the end user. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "seedance-2.0-fast",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
```
</details>
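
The fast tier lists only `480p` and `720p`, while `seedance-2.0` also lists `1080p`. One possible way to pick between the two text-to-video models from a target resolution is sketched below; `choose_seedance_text_model` is an illustrative helper, and preferring the fast tier by default is a design choice of this sketch, not a recommendation from the API.

```python
def choose_seedance_text_model(resolution: str, prefer_fast: bool = True) -> str:
    """Pick a Seedance 2.0 text-to-video model that lists `resolution`."""
    if resolution in ("480p", "720p"):
        # Both tiers list these resolutions; the caller decides which to use.
        return "seedance-2.0-fast" if prefer_fast else "seedance-2.0"
    if resolution == "1080p":
        # Only the standard tier lists 1080p.
        return "seedance-2.0"
    raise ValueError(f"unsupported resolution: {resolution!r}")
```

The remaining parameters (`prompt`, `duration`, `aspect_ratio`, `generate_audio`) are listed identically for both models, so the rest of the `input` can stay unchanged when switching.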

---

### `seedance-2.0-fast-image` — Seedance 2.0 Fast Image-to-Video (ByteDance) · `image-to-video`

ByteDance Seedance 2.0 fast tier: lower-latency image-to-video with synchronized audio.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| `prompt` | `string` | ✓ | – | – | The text prompt describing the desired motion and action for the video. |
| `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Duration of the video in seconds (4-15). |
| `image_url` | `string` | ✓ | – | – | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| `resolution` | `string` |  | `720p` | `480p` `720p` | Video resolution - 480p for faster generation, 720p for balance. |
| `end_user_id` | `string` |  | – | – | The unique user ID of the end user. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. |
| `end_image_url` | `string` |  | – | – | The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "seedance-2.0-fast-image",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_url": "https://example.com/sample-image.jpg",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
```
</details>

---

### `seedance-2.0-fast-ref` — Seedance 2.0 Fast Reference-to-Video (ByteDance) · `reference-to-video`

ByteDance Seedance 2.0 fast tier: reference-to-video with lower latency and cost.

| Parameter | Type | Req | Default | Values / Range | Description |
|---|---|---|---|---|---|
| `seed` | `integer` |  | – | – | Random seed for reproducibility. Note that results may still vary slightly even with the same seed. |
| `prompt` | `string` | ✓ | – | – | The text prompt used to generate the video. |
| `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` | Duration of the video in seconds (4-15). |
| `audio_urls` | `array` |  | – | – | Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required. |
| `image_urls` | `array` |  | – | – | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12. |
| `resolution` | `string` |  | `720p` | `480p` `720p` | Video resolution - 480p for faster generation, 720p for balance. |
| `video_urls` | `array` |  | – | – | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution. |
| `end_user_id` | `string` |  | – | – | The unique user ID of the end user. |
| `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3` `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. |
| `generate_audio` | `boolean` |  | `true` | – | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not. |

<details>
<summary>Example request body</summary>

```json
{
  "model": "seedance-2.0-fast-ref",
  "input": {
    "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
    "duration": "4",
    "image_urls": [
      "https://example.com/sample-image.jpg"
    ],
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": true
  }
}
```
</details>

---


## OpenAPI

````yaml POST /hub/v1/video
openapi: 3.0.0
info:
  title: Hub - Premium AI Gateway (Image / Video / Audio / Transcribe)
  description: >-
    Hub is a unified AI Gateway offering **flagship-quality, production-stable,
    and cheaper-than-official** access to the world's best AI models across
    image, video, audio (music) and transcription capabilities.


    **Why Hub?**

    - 🏆 **Flagship quality** — only official flagship model weights (Veo 3.1,
    Nano Banana Pro, GPT Image 2, Kling v3 Pro, WAN 2.7, Seedance 2.0,
    ElevenLabs Music). No knock-offs, no distillations — same outputs as going
    direct.

    - 🛡️ **Production stable** — multi-region routing, automatic failover,
    transparent retries on transient errors, queue-aware load balancing. Built
    for 24/7 production traffic.

    - 💰 **Cheaper than official** — pay only on `status=completed` (failed
    tasks are free), billed in unified credits at a meaningful discount versus
    going direct to the model provider. No per-provider minimums, no monthly
    subscriptions.


    Each endpoint accepts a `model` + `input` payload — switch models without
    changing the endpoint shape.
  version: 1.0.0
  contact: {}
servers:
  - url: https://api.mountsea.ai
    description: API Gateway
security: []
tags:
  - name: hub
    description: >-
      Model discovery — list and inspect schemas/examples for every available
      model across all capabilities.
  - name: Image
    description: >-
      Image generation & editing models (Nano Banana, GPT Image 2 and edit
      variants).
  - name: Video
    description: >-
      Video generation models — text-to-video, image-to-video, multi-reference,
      first-last frame and edit (Veo 3.1, Kling v3, WAN 2.7, Seedance 2.0).
  - name: Audio
    description: >-
      Audio capabilities — music generation (ElevenLabs Music) and audio/video
      transcription.
  - name: Tasks
    description: Poll the status / result of any submitted Hub task.
paths:
  /hub/v1/video:
    post:
      tags:
        - Video
      summary: Submit a video generation task
      description: >
        Generates video using the selected model. The generation mode
        (text-to-video / image-to-video / etc.) is determined entirely by the
        `model` you choose — no separate `mode` field is needed.


        **Text-to-video** — pass `prompt` only:

        `veo-3.1` · `veo-3.1-fast` · `veo-3.1-lite` · `veo-3` · `kling-v3-pro` ·
        `kling-v3-standard` · `wan-2.7` · `seedance-2.0`


        **Image-to-video** — pass `prompt` + `image_urls` (1 image):

        `veo-3.1-image` · `veo-3.1-fast-image` · `veo-3.1-lite-image` ·
        `veo-3-image` · `kling-v3-pro-image` · `kling-v3-standard-image` ·
        `wan-2.7-image` · `seedance-2.0-image`


        **Multi-reference video** — `image_urls` with 2–9 images:

        `veo-3.1-ref` · `veo-3.1-fast-ref`


        **First-last frame** — `image_urls[0]` = start frame, `image_urls[1]` =
        end frame:

        `veo-3.1-first-last`


        > Always pass image(s) via `image_urls: string[]`; the service maps them
        to the correct fal field automatically.


        ---


        **Workflow**

        1. `GET /hub/v1/models?capability=video` — browse available models

        2. `GET /hub/v1/models/:model` — copy the `example` as your `input`

        3. `POST /hub/v1/video` ← you are here

        4. `GET /hub/v1/tasks/:task_id` — poll until `ready=true`


        ---


        ## Model Reference


        > **Tip:** Click **Try it out** → select a model from the dropdown below
        → the parameter schema auto-populates with an example.


        ### `veo-3.1` — Veo 3.1 (Google) · `text-to-video`


        Google Veo 3.1 text-to-video with optional native audio.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | `true` | – | Whether to automatically
        attempt to fix prompts that fail content policy or other validation
        checks by rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The
        resolution of the generated video. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | Aspect ratio
        of the generated video |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1",
          "input": {
            "prompt": "Two person street interview in New York City.\nSample Dialogue:\nHost: \"Did you hear the news?\"\nPerson: \"Yes! Veo 3.1 is now available online. If you want to see it, go check it out!\"",
            "auto_fix": true,
            "duration": "8s",
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `veo-3.1-image` — Veo 3.1 Image-to-Video (Google) · `image-to-video`


        Veo 3.1: animate a single reference image.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt
        to fix prompts that fail content policy or other validation checks by
        rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `image_url` | `string` | ✓ | – | – | URL of the input image to
        animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect
        ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be
        cropped to fit. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The
        resolution of the generated video. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The
        aspect ratio of the generated video. Only 16:9 and 9:16 are supported. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-image",
          "input": {
            "prompt": "A monkey and polar bear host a casual podcast about AI inference, bringing their unique perspectives from different environments (tropical vs. arctic) to discuss how AI systems make decisions and process information.\nSample Dialogue:\nMonkey (Banana): \"Welcome back to Bananas & Ice! I am Banana\"\nPolar Bear (Ice): \"And I'm Ice!\"",
            "duration": "8s",
            "image_url": "https://example.com/sample-image.jpg",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `veo-3.1-ref` — Veo 3.1 Multi-Ref (Google) · `reference-to-video`


        Veo 3.1: generate video from reference images for consistent subject
        appearance.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt
        to fix prompts that fail content policy or other validation checks by
        rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `image_urls` | `array` | ✓ | – | – | URLs of the reference images to
        use for consistent subject appearance |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The
        resolution of the generated video. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect
        ratio of the generated video. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-ref",
          "input": {
            "prompt": "A chimpanzee wearing overalls frolics in the grassy field, gently playing with the butterflies. In the background, a circus tent and carousel beckon.",
            "duration": "8s",
            "image_urls": [
              "https://example.com/sample-image.jpg",
              "https://example.com/sample-image-2.jpg",
              "https://example.com/sample-image-3.jpg"
            ],
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `veo-3.1-first-last` — Veo 3.1 First-Last Frame (Google) ·
        `first-last-frame-to-video`


        Veo 3.1: generate a transition video between a start frame and an end
        frame.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt
        to fix prompts that fail content policy or other validation checks by
        rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The
        resolution of the generated video. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The
        aspect ratio of the generated video. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `last_frame_url` | `string` | ✓ | – | – | URL of the last frame of the
        video |

        | `first_frame_url` | `string` | ✓ | – | – | URL of the first frame of
        the video |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-first-last",
          "input": {
            "prompt": "A woman looks into the camera, breathes in, then exclaims energetically, \"have you guys checked out this AI video generation? It's incredible!\"",
            "duration": "8s",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true,
            "last_frame_url": "https://example.com/sample-image-2.jpg",
            "first_frame_url": "https://example.com/sample-image.jpg",
            "safety_tolerance": "4"
          }
        }

        ```

        </details>
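

        The overview describes passing the start and end frames as
        `image_urls[0]` and `image_urls[1]`, with the service mapping them to
        the model's own fields, which the table above lists as
        `first_frame_url` and `last_frame_url`. The Python sketch below only
        illustrates that correspondence. How the service actually performs the
        mapping is an assumption here, and `frames_to_fields` is a hypothetical
        name, not part of the API.


        ```python

        def frames_to_fields(image_urls):
            """Map [start, end] image URLs to first/last frame fields (illustrative only)."""
            if len(image_urls) != 2:
                raise ValueError("first-last frame generation needs exactly two image URLs")
            first, last = image_urls
            return {"first_frame_url": first, "last_frame_url": last}


        fields = frames_to_fields(
            ["https://example.com/sample-image.jpg", "https://example.com/sample-image-2.jpg"]
        )

        ```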


        ---


        ### `veo-3.1-fast` — Veo 3.1 Fast (Google) · `text-to-video`


        Veo 3.1 Fast: lower-latency text-to-video at reduced cost.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | `true` | – | Whether to automatically
        attempt to fix prompts that fail content policy or other validation
        checks by rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The
        resolution of the generated video. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect
        ratio of the generated video. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-fast",
          "input": {
            "prompt": "Two person street interview in New York City.\nSample Dialogue:\nHost: \"Did you hear the news?\"\nPerson: \"Yes! Veo 3.1 is now available online. If you want to see it, go check it out!\"",
            "auto_fix": true,
            "duration": "8s",
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>
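

        The enumerated parameters in the table above are typed as strings,
        including `duration` (`"8s"`) and `safety_tolerance` (`"4"`). A minimal
        Python sketch of checking them on the client before submitting. The
        names `VEO_31_FAST_ENUMS` and `check_enums` are hypothetical; the
        allowed values are copied from the table.


        ```python

        VEO_31_FAST_ENUMS = {
            "duration": {"4s", "6s", "8s"},
            "resolution": {"720p", "1080p", "4k"},
            "aspect_ratio": {"16:9", "9:16"},
            "safety_tolerance": {"1", "2", "3", "4", "5", "6"},
        }


        def check_enums(params, allowed=VEO_31_FAST_ENUMS):
            """Return a list of error messages for values outside the documented sets."""
            errors = []
            for name, choices in allowed.items():
                if name in params and params[name] not in choices:
                    errors.append(f"{name}={params[name]!r} is not one of {sorted(choices)}")
            return errors


        ok = check_enums({"duration": "8s", "resolution": "720p", "safety_tolerance": "4"})

        bad = check_enums({"duration": 8, "safety_tolerance": 4})

        ```


        Here `ok` is empty, while `bad` reports both values because they were
        passed as numbers rather than the documented strings.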


        ---


        ### `veo-3.1-fast-image` — Veo 3.1 Fast Image-to-Video (Google) ·
        `image-to-video`


        Veo 3.1 Fast: animate a reference image at lower cost.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt
        to fix prompts that fail content policy or other validation checks by
        rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `image_url` | `string` | ✓ | – | – | URL of the input image to
        animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect
        ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be
        cropped to fit. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The
        resolution of the generated video. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The
        aspect ratio of the generated video. Besides `auto`, only 16:9 and 9:16
        are supported. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-fast-image",
          "input": {
            "prompt": "A monkey and polar bear host a casual podcast about AI inference, bringing their unique perspectives from different environments (tropical vs. arctic) to discuss how AI systems make decisions and process information.\nSample Dialogue:\nMonkey (Banana): \"Welcome back to Bananas & Ice! I am Banana\"\nPolar Bear (Ice): \"And I'm Ice!\"",
            "duration": "8s",
            "image_url": "https://example.com/sample-image.jpg",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `veo-3.1-fast-ref` — Veo 3.1 Fast Multi-Ref (Google) ·
        `reference-to-video`


        Veo 3.1 Fast: multi-reference video at lower cost.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt
        to fix prompts that fail content policy or other validation checks by
        rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `image_urls` | `array` | ✓ | – | – | URLs of the reference images to
        use for consistent subject appearance |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` `4k` | The
        resolution of the generated video. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect
        ratio of the generated video. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-fast-ref",
          "input": {
            "prompt": "A chimpanzee wearing overalls frolics in the grassy field, gently playing with the butterflies. In the background, a circus tent and carousel beckon.",
            "duration": "8s",
            "image_urls": [
              "https://example.com/sample-image.jpg",
              "https://example.com/sample-image-2.jpg",
              "https://example.com/sample-image-3.jpg"
            ],
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `veo-3.1-lite` — Veo 3.1 Lite (Google) · `text-to-video`


        Veo 3.1 Lite: lowest-cost text-to-video (720p/1080p only).


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | `true` | – | Whether to automatically
        attempt to fix prompts that fail content policy or other validation
        checks by rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution
        of the generated video. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect
        ratio of the generated video. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-lite",
          "input": {
            "prompt": "A massive blue whale glides through crystal-clear deep ocean water, sunlight rays piercing through the surface above, bioluminescent plankton scattered around, cinematic slow motion",
            "auto_fix": true,
            "duration": "8s",
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>
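

        The Veo tables in this section differ in which `resolution` values they
        list: `4k` appears for `veo-3.1-fast` but not for `veo-3.1-lite` or
        `veo-3`. A minimal Python sketch of a per-model lookup. The names
        `RESOLUTIONS` and `pick_resolution` are hypothetical, and falling back
        to the highest listed value is a design choice of this sketch, not
        documented service behavior.


        ```python

        RESOLUTIONS = {
            # Values copied from the model tables in this section.
            "veo-3.1-fast": ("720p", "1080p", "4k"),
            "veo-3.1-lite": ("720p", "1080p"),
            "veo-3": ("720p", "1080p"),
        }


        def pick_resolution(model, wanted):
            """Return `wanted` if the model lists it, else the highest listed value."""
            supported = RESOLUTIONS[model]
            return wanted if wanted in supported else supported[-1]


        lite = pick_resolution("veo-3.1-lite", "4k")

        fast = pick_resolution("veo-3.1-fast", "4k")

        ```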


        ---


        ### `veo-3.1-lite-image` — Veo 3.1 Lite Image-to-Video (Google) ·
        `image-to-video`


        Veo 3.1 Lite: animate a reference image at lowest cost.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt
        to fix prompts that fail content policy or other validation checks by
        rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `image_url` | `string` | ✓ | – | – | URL of the input image to
        animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect
        ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be
        cropped to fit. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution
        of the generated video. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The
        aspect ratio of the generated video. Besides `auto`, only 16:9 and 9:16
        are supported. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3.1-lite-image",
          "input": {
            "prompt": "A massive blue whale glides through crystal-clear deep ocean water, sunlight rays piercing through the surface above, bioluminescent plankton scattered around, cinematic slow motion",
            "duration": "8s",
            "image_url": "https://example.com/sample-image.jpg",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `veo-3` — Veo 3 (Google) · `text-to-video`


        Google Veo 3 text-to-video with native audio.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the video
        you want to generate |

        | `auto_fix` | `boolean` |  | `true` | – | Whether to automatically
        attempt to fix prompts that fail content policy or other validation
        checks by rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution
        of the generated video. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` | The aspect
        ratio of the generated video. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3",
          "input": {
            "prompt": "A casual street interview on a busy New York City sidewalk in the afternoon. The interviewer holds a plain, unbranded microphone and asks: Have you seen Google's new Veo3 model It is a super good model. Person replies: Yeah I saw it, it's already available now. It's crazy good.",
            "auto_fix": true,
            "duration": "8s",
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `veo-3-image` — Veo 3 Image-to-Video (Google) · `image-to-video`


        Google Veo 3: animate a single reference image.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | The seed for the random number
        generator. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing how the
        image should be animated |

        | `auto_fix` | `boolean` |  | – | – | Whether to automatically attempt
        to fix prompts that fail content policy or other validation checks by
        rewriting them. |

        | `duration` | `string` |  | `8s` | `4s` `6s` `8s` | The duration of the
        generated video. |

        | `image_url` | `string` | ✓ | – | – | URL of the input image to
        animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect
        ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be
        cropped to fit. |

        | `resolution` | `string` |  | `720p` | `720p` `1080p` | The resolution
        of the generated video. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `16:9` `9:16` | The
        aspect ratio of the generated video. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        audio for the video. |

        | `negative_prompt` | `string` |  | – | – | A negative prompt to guide
        the video generation. |

        | `safety_tolerance` | `string` |  | `4` | `1` `2` `3` `4` `5` `6` | The
        safety tolerance level for content moderation. 1 is the most strict
        (blocks most content), 6 is the least strict. Note: API-only parameter.
        |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "veo-3-image",
          "input": {
            "prompt": "A woman looks into the camera, breathes in, then exclaims energetically, \"have you guys checked out this AI video generation? It's incredible!\"",
            "duration": "8s",
            "image_url": "https://example.com/sample-image.jpg",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true,
            "safety_tolerance": "4"
          }
        }

        ```

        </details>


        ---


        ### `kling-v3-standard` — Kling v3 Standard (Kuaishou) · `text-to-video`


        Kling v3 Standard text-to-video with optional native audio.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `prompt` | `string` |  | – | – | Text prompt for video generation.
        Either prompt or multi_prompt must be provided, but not both. |

        | `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | The duration of the generated video in seconds |

        | `cfg_scale` | `number` |  | `0.5` | – | The CFG (Classifier-Free
        Guidance) scale controls how closely the model sticks to your prompt. |

        | `shot_type` | `string` |  | `customize` | `customize` `intelligent` |
        The type of multi-shot video generation. 'intelligent' lets the model
        automatically determine shot structure. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` | The
        aspect ratio of the generated video frame |

        | `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot
        video generation. If provided, overrides the single prompt and divides
        the video into multiple shots with specified prompts and durations. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        native audio for the video. Supports Chinese and English voice output.
        Other languages are automatically translated to English. For English
        speech, use lowercase letters; for acronyms or proper nouns, use
        uppercase. |

        | `negative_prompt` | `string` |  | `blur, distort, and low quality` | –
        | Content to avoid in the generated video. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "kling-v3-standard",
          "input": {
            "prompt": "Cinematic drone shot flying through ancient stone ruins covered in moss and vines at golden hour. Camera starts low, rises through crumbling archways, revealing a vast misty valley beyond. Volumetric light rays pierce through gaps in the stone. Epic scale, photorealistic, 8K quality.",
            "duration": "5",
            "cfg_scale": 0.5,
            "shot_type": "customize",
            "aspect_ratio": "16:9",
            "multi_prompt": null,
            "generate_audio": true,
            "negative_prompt": "blur, distort, and low quality"
          }
        }

        ```

        </details>
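

        The table states that either `prompt` or `multi_prompt` must be
        provided, but not both. A minimal Python sketch of that rule as a
        client-side check. `check_prompt_fields` is a hypothetical name, and
        treating `null` or empty values as "not provided" is an assumption,
        based on the example above sending `"multi_prompt": null` alongside
        `prompt`.


        ```python

        def check_prompt_fields(params):
            """Raise ValueError unless exactly one of prompt / multi_prompt is set."""
            has_prompt = bool(params.get("prompt"))
            has_multi = bool(params.get("multi_prompt"))
            if has_prompt == has_multi:
                raise ValueError("provide exactly one of 'prompt' or 'multi_prompt'")
            return params


        single = check_prompt_fields(
            {"prompt": "Fireflies dancing in a dark forest.", "multi_prompt": None}
        )

        ```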


        ---


        ### `kling-v3-standard-image` — Kling v3 Standard Image-to-Video
        (Kuaishou) · `image-to-video`


        Kling v3 Standard image-to-video (3-15 seconds).


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `prompt` | `string` |  | – | – | Text prompt for video generation.
        Either prompt or multi_prompt must be provided, but not both. |

        | `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | The duration of the generated video in seconds |

        | `elements` | `array` |  | – | – | Elements (characters/objects) to
        include in the video. Each element can either be an image set (frontal +
        reference images) or a video. Reference in prompt as @Element1,
        @Element2, etc. |

        | `cfg_scale` | `number` |  | `0.5` | – | The CFG (Classifier-Free
        Guidance) scale controls how closely the model sticks to your prompt. |

        | `shot_type` | `string` |  | `customize` | `customize` `intelligent` |
        The type of multi-shot video generation. 'intelligent' lets the model
        automatically determine shot structure. |

        | `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot
        video generation. If provided, divides the video into multiple shots. |

        | `end_image_url` | `string` |  | – | – | URL of the image to be used
        for the end of the video |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        native audio for the video. Supports Chinese and English voice output.
        Other languages are automatically translated to English. For English
        speech, use lowercase letters; for acronyms or proper nouns, use
        uppercase. |

        | `negative_prompt` | `string` |  | `blur, distort, and low quality` | –
        | Content to avoid in the generated video. |

        | `start_image_url` | `string` | ✓ | – | – | URL of the image to be used
        for the start of the video |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "kling-v3-standard-image",
          "input": {
            "prompt": "Camera slowly orbits around the vase. Soft light shifts across the ceramic surface. The pampas grass sways gently. Shadows move elegantly. Smooth continuous motion, premium feel.",
            "duration": "12",
            "cfg_scale": 0.5,
            "shot_type": "customize",
            "multi_prompt": null,
            "generate_audio": true,
            "negative_prompt": "blur, distort, and low quality",
            "start_image_url": "https://example.com/sample-image.jpg"
          }
        }

        ```

        </details>


        ---


        ### `kling-v3-pro` — Kling v3 Pro (Kuaishou) · `text-to-video`


        Kling v3 Pro text-to-video with optional native audio.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `prompt` | `string` |  | – | – | Text prompt for video generation.
        Either prompt or multi_prompt must be provided, but not both. |

        | `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | The duration of the generated video in seconds |

        | `cfg_scale` | `number` |  | `0.5` | – | The CFG (Classifier-Free
        Guidance) scale controls how closely the model sticks to your prompt. |

        | `shot_type` | `string` |  | `customize` | `customize` `intelligent` |
        The type of multi-shot video generation. 'intelligent' lets the model
        automatically determine shot structure. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` | The
        aspect ratio of the generated video frame |

        | `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot
        video generation. If provided, overrides the single prompt and divides
        the video into multiple shots with specified prompts and durations. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        native audio for the video. Supports Chinese and English voice output.
        Other languages are automatically translated to English. For English
        speech, use lowercase letters; for acronyms or proper nouns, use
        uppercase. |

        | `negative_prompt` | `string` |  | `blur, distort, and low quality` | –
        | Content to avoid in the generated video. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "kling-v3-pro",
          "input": {
            "prompt": "Close-up of glowing fireflies dancing in a dark forest at twilight. Soft bioluminescent particles float through the air. Shallow depth of field, bokeh lights in background. Magical atmosphere, gentle movement.",
            "duration": "5",
            "cfg_scale": 0.5,
            "shot_type": "customize",
            "aspect_ratio": "16:9",
            "multi_prompt": null,
            "generate_audio": true,
            "negative_prompt": "blur, distort, and low quality"
          }
        }

        ```

        </details>


        ---


        ### `kling-v3-pro-image` — Kling v3 Pro Image-to-Video (Kuaishou) ·
        `image-to-video`


        Kling v3 Pro image-to-video (3-15 seconds).


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `prompt` | `string` |  | – | – | Text prompt for video generation.
        Either prompt or multi_prompt must be provided, but not both. |

        | `duration` | `string` |  | `5` | `3` `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | The duration of the generated video in seconds |

        | `elements` | `array` |  | – | – | Elements (characters/objects) to
        include in the video. Each element can either be an image set (frontal +
        reference images) or a video. Reference in prompt as @Element1,
        @Element2, etc. |

        | `cfg_scale` | `number` |  | `0.5` | – | The CFG (Classifier-Free
        Guidance) scale controls how closely the model sticks to your prompt. |

        | `shot_type` | `string` |  | `customize` | `customize` `intelligent` |
        The type of multi-shot video generation. 'intelligent' lets the model
        automatically determine shot structure. |

        | `multi_prompt` | `array` |  | – | – | List of prompts for multi-shot
        video generation. If provided, divides the video into multiple shots. |

        | `end_image_url` | `string` |  | – | – | URL of the image to be used
        for the end of the video |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        native audio for the video. Supports Chinese and English voice output.
        Other languages are automatically translated to English. For English
        speech, use lowercase letters; for acronyms or proper nouns, use
        uppercase. |

        | `negative_prompt` | `string` |  | `blur, distort, and low quality` | –
        | Content to avoid in the generated video. |

        | `start_image_url` | `string` | ✓ | – | – | URL of the image to be used
        for the start of the video |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "kling-v3-pro-image",
          "input": {
            "prompt": "The craftsman slowly examines the bowl, turning it gently in his weathered hands. His eyes reflect years of wisdom. Subtle smile forms on his face. Dust particles drift in warm light. Breathing motion, blinking eyes.",
            "duration": "12",
            "cfg_scale": 0.5,
            "shot_type": "customize",
            "multi_prompt": null,
            "generate_audio": true,
            "negative_prompt": "blur, distort, and low quality",
            "start_image_url": "https://example.com/sample-image.jpg"
          }
        }

        ```

        </details>


        ---


        ### `wan-2.7` — WAN 2.7 (Alibaba) · `text-to-video`


        WAN 2.7 text-to-video: high-quality generation. Default resolution is
        1080p.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility
        (0-2147483647). |

        | `prompt` | `string` | ✓ | – | – | Text prompt describing the desired
        video. Max 5000 characters. |

        | `duration` | `integer` |  | `5` | `2` `3` `4` `5` `6` `7` `8` `9` `10`
        `11` `12` `13` `14` `15` | Output video duration in seconds (2-15). |

        | `audio_url` | `string` |  | – | – | URL of driving audio. Supports WAV
        and MP3. Duration: 3-30s. Max 15 MB. If not provided, the model
        auto-generates matching background music. |

        | `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video
        resolution tier. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` `4:3`
        `3:4` | Aspect ratio of the generated video. |

        | `negative_prompt` | `string` |  | – | – | Content to avoid in the
        video. Max 500 characters. |

        | `enable_safety_checker` | `boolean` |  | `true` | – | Enable content
        moderation for input and output. |

        | `enable_prompt_expansion` | `boolean` |  | `true` | – | Enable
        intelligent prompt rewriting. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "wan-2.7",
          "input": {
            "prompt": "A kitten running in a meadow, cinematic lighting, smooth camera movement.",
            "duration": 5,
            "resolution": "1080p",
            "aspect_ratio": "16:9",
            "negative_prompt": "low resolution, errors, worst quality, low quality",
            "enable_safety_checker": true,
            "enable_prompt_expansion": true
          }
        }

        ```

        </details>
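

        The table above documents several numeric limits, and unlike the Veo
        and Kling tables it types `duration` as an integer rather than a
        string. A minimal Python sketch that collects violations of those
        limits before submitting. `check_wan_input` is a hypothetical helper;
        the limits themselves are copied from the table.


        ```python

        def check_wan_input(params):
            """Collect violations of the limits documented for wan-2.7 (hypothetical helper)."""
            errors = []
            prompt = params.get("prompt") or ""
            if not prompt:
                errors.append("prompt is required")
            elif len(prompt) > 5000:
                errors.append("prompt exceeds 5000 characters")
            if len(params.get("negative_prompt") or "") > 500:
                errors.append("negative_prompt exceeds 500 characters")
            duration = params.get("duration", 5)  # documented default is 5
            if not (isinstance(duration, int) and 2 <= duration <= 15):
                errors.append("duration must be an integer from 2 to 15")
            seed = params.get("seed")
            if seed is not None and not 0 <= seed <= 2147483647:
                errors.append("seed must be between 0 and 2147483647")
            return errors


        problems = check_wan_input(
            {"prompt": "A kitten running in a meadow.", "duration": "5"}
        )

        ```


        Passing `"5"` as a string, as the Kling models expect, is reported here
        as a `duration` problem.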


        ---


        ### `wan-2.7-image` — WAN 2.7 Image-to-Video (Alibaba) ·
        `image-to-video`


        WAN 2.7 image-to-video (720p/1080p, duration 2-15s).


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility
        (0-2147483647). |

        | `prompt` | `string` |  | – | – | Text prompt describing the desired
        video. Max 5000 characters. |

        | `duration` | `integer` |  | `5` | `2` `3` `4` `5` `6` `7` `8` `9` `10`
        `11` `12` `13` `14` `15` | Output video duration in seconds (2-15). |

        | `audio_url` | `string` |  | – | – | URL of driving audio. Supports WAV
        and MP3. Duration: 2-30s. Max 15 MB. |

        | `image_url` | `string` |  | – | – | URL of the first frame image.
        Formats: JPEG, JPG, PNG, BMP, WEBP. Max 20 MB. |

        | `video_url` | `string` |  | – | – | URL of a video clip to continue
        from. Format: MP4, MOV. Duration: 2-10s. Max 100 MB. Cannot be combined
        with image_url. |

        | `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video
        resolution tier. |

        | `end_image_url` | `string` |  | – | – | URL of the last frame image
        for first-and-last-frame-to-video. Same constraints as image_url. |

        | `negative_prompt` | `string` |  | – | – | Content to avoid in the
        video. Max 500 characters. |

        | `enable_safety_checker` | `boolean` |  | `true` | – | Enable content
        moderation for input and output. |

        | `enable_prompt_expansion` | `boolean` |  | `true` | – | Enable
        intelligent prompt rewriting. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "wan-2.7-image",
          "input": {
            "prompt": "The massive humpback whale glides slowly through the deep blue water. It turns gracefully, its huge pectoral fin sweeping through the water like a wing. Sunbeams penetrate from above, illuminating the whale's textured skin. Small fish scatter. Awe-inspiring scale and grace.",
            "duration": 5,
            "image_url": "https://example.com/sample-image.jpg",
            "resolution": "1080p",
            "negative_prompt": "low resolution, errors, worst quality, low quality, incomplete, extra fingers, bad proportions, blurry, distorted",
            "enable_safety_checker": true,
            "enable_prompt_expansion": true
          }
        }

        ```

        </details>
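

        The `end_image_url` row above implies a first-and-last-frame variant of
        this model. A minimal sketch of such a request, assuming the fields are
        sent exactly as named in the parameter table; both image URLs are
        placeholders and the prompt is illustrative:


        ```json

        {
          "model": "wan-2.7-image",
          "input": {
            "prompt": "The camera slowly pushes in as morning fog lifts from the valley, revealing the village below.",
            "duration": 6,
            "image_url": "https://example.com/first-frame.jpg",
            "end_image_url": "https://example.com/last-frame.jpg",
            "resolution": "720p"
          }
        }

        ```


        Per the table, `video_url` cannot be combined with `image_url`, so a
        clip-continuation request would replace `image_url` with `video_url`
        rather than add to it.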


        ---


        ### `wan-2.7-ref` — WAN 2.7 Reference-to-Video (Alibaba) ·
        `reference-to-video`


        WAN 2.7 reference-to-video using character/object reference images and
        videos (duration 2-10s).


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility
        (0-2147483647). |

        | `prompt` | `string` | ✓ | – | – | Text prompt describing the desired
        video. Max 5000 characters. |

        | `duration` | `integer` |  | `5` | `2` `3` `4` `5` `6` `7` `8` `9` `10`
        | Output video duration in seconds (2-10). |

        | `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video
        resolution tier. |

        | `multi_shots` | `boolean` |  | `false` | – | When true, enables
        intelligent multi-shot segmentation. When false (default), generates a
        single continuous shot. |

        | `aspect_ratio` | `string` |  | `16:9` | `16:9` `9:16` `1:1` `4:3`
        `3:4` | Aspect ratio of the generated video. |

        | `negative_prompt` | `string` |  | – | – | Content to avoid in the
        video. Max 500 characters. |

        | `reference_image_urls` | `array` |  | – | – | Reference image URLs for
        character/object appearance. Pass multiple images for multi-subject
        generation. Max 20 MB each. |

        | `reference_video_urls` | `array` |  | – | – | Reference video URLs for
        character/object appearance and motion. Pass multiple videos for
        multi-subject generation. Max 100 MB each. Note: when video inputs are
        provided, billing covers the total input video duration plus the output
        duration, so the credits charged exceed what the output duration alone
        would cost. |

        | `enable_safety_checker` | `boolean` |  | `true` | – | Enable content
        moderation for input and output. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "wan-2.7-ref",
          "input": {
            "prompt": "A person walking through a beautiful garden, cinematic style.",
            "duration": 5,
            "resolution": "1080p",
            "aspect_ratio": "16:9",
            "negative_prompt": "low resolution, errors, worst quality, low quality",
            "enable_safety_checker": true
          }
        }

        ```

        </details>
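

        The example above omits the reference inputs that this model is built
        around. A sketch of a multi-subject request, assuming
        `reference_image_urls` takes a plain array of URL strings as the table
        describes; the URLs and prompt are placeholders:


        ```json

        {
          "model": "wan-2.7-ref",
          "input": {
            "prompt": "A woman and her dog walk along a beach at sunset, cinematic style.",
            "duration": 5,
            "resolution": "1080p",
            "multi_shots": false,
            "aspect_ratio": "16:9",
            "reference_image_urls": [
              "https://example.com/reference-person.jpg",
              "https://example.com/reference-dog.jpg"
            ],
            "enable_safety_checker": true
          }
        }

        ```


        Adding `reference_video_urls` would follow the same shape, but per the
        table the input video duration is then billed on top of the output
        duration.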


        ---


        ### `wan-2.7-edit` — WAN 2.7 Edit Video (Alibaba) · `video-to-video`


        WAN 2.7 video editing: instruction-based editing, reference-image-based
        editing and style transfer (input video 2-10s).


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility
        (0-2147483647). |

        | `prompt` | `string` | ✓ | – | – | Editing instruction or style
        transfer description. Describe what changes you want applied to the
        video. |

        | `duration` | `string` |  | `0` | `0` `2` `3` `4` `5` `6` `7` `8` `9`
        `10` | Output duration in seconds. '0' means match the input video's
        duration. When set to 2-10, the output is truncated to that length from
        the start. |

        | `video_url` | `string` | ✓ | – | – | URL of the input video to edit.
        Format: MP4, MOV. Duration: 2-10s. Max 100 MB. |

        | `resolution` | `string` |  | `1080p` | `720p` `1080p` | Output video
        resolution tier. |

        | `aspect_ratio` | `string` |  | – | `16:9` `9:16` `1:1` `4:3` `3:4` |
        Aspect ratio of the output video. If not provided, uses the input
        video's aspect ratio. |

        | `audio_setting` | `string` |  | `auto` | `auto` `origin` | Audio
        handling: 'auto' lets the model decide whether to regenerate audio;
        'origin' preserves the original audio from the input video. |

        | `reference_image_url` | `string` |  | – | – | Optional reference image
        URL for reference-based editing. When provided, the edit is guided by
        the visual style or content of this image. |

        | `enable_safety_checker` | `boolean` |  | `true` | – | Enable content
        moderation for input and output. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "wan-2.7-edit",
          "input": {
            "prompt": "Transform the entire scene into a beautiful watercolor painting style. Soft brushstrokes, flowing paint washes, visible paper texture. Colors should bleed and blend naturally like wet watercolor on paper.",
            "video_url": "https://example.com/sample-video.mp4",
            "resolution": "1080p",
            "audio_setting": "auto",
            "enable_safety_checker": true
          }
        }

        ```

        </details>
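

        For reference-image-based editing, a request might look like the
        following sketch. It assumes the fields are passed as named in the
        table. Note that `duration` is typed as a string for this model, so the
        value is quoted. The URLs are placeholders.


        ```json

        {
          "model": "wan-2.7-edit",
          "input": {
            "prompt": "Replace the jacket with the one shown in the reference image. Keep the background and motion unchanged.",
            "duration": "5",
            "video_url": "https://example.com/sample-video.mp4",
            "resolution": "720p",
            "audio_setting": "origin",
            "reference_image_url": "https://example.com/reference-jacket.jpg"
          }
        }

        ```


        With `duration` set to `"5"`, the table indicates the output is
        truncated to the first 5 seconds of the input, and `"origin"` keeps the
        input video's audio.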


        ---


        ### `seedance-2.0` — Seedance 2.0 (ByteDance) · `text-to-video`


        ByteDance Seedance 2.0: cinematic text-to-video with native audio,
        physics, and camera control.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility. Note
        that results may still vary slightly even with the same seed. |

        | `prompt` | `string` | ✓ | – | – | The text prompt used to generate the
        video. |

        | `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | Duration of the video in seconds (4-15). |

        | `resolution` | `string` |  | `720p` | `480p` `720p` `1080p` | Video
        resolution: 480p for faster generation, 720p for balance, 1080p for
        highest quality. |

        | `end_user_id` | `string` |  | – | – | The unique user ID of the end
        user. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3`
        `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9
        for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for
        ultrawide cinematic, or auto to let the model decide. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        synchronized audio for the video, including sound effects, ambient
        sounds, and lip-synced speech. The cost of video generation is the same
        regardless of whether audio is generated or not. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "seedance-2.0",
          "input": {
            "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
            "duration": "4",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true
          }
        }

        ```

        </details>


        ---


        ### `seedance-2.0-image` — Seedance 2.0 Image-to-Video (ByteDance) ·
        `image-to-video`


        ByteDance Seedance 2.0: animate images with cinematic quality and
        synchronized audio.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility. Note
        that results may still vary slightly even with the same seed. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the
        desired motion and action for the video. |

        | `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | Duration of the video in seconds (4-15). |

        | `image_url` | `string` | ✓ | – | – | The URL of the starting frame
        image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB. |

        | `resolution` | `string` |  | `720p` | `480p` `720p` `1080p` | Video
        resolution: 480p for faster generation, 720p for balance, 1080p for
        highest quality. |

        | `end_user_id` | `string` |  | – | – | The unique user ID of the end
        user. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3`
        `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9
        for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for
        ultrawide cinematic, or auto to infer from the input image. |

        | `end_image_url` | `string` |  | – | – | The URL of the image to use as
        the last frame of the video. When provided, the generated video will
        transition from the starting image to this ending image. Supported
        formats: JPEG, PNG, WebP. Max 30 MB. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        synchronized audio for the video, including sound effects, ambient
        sounds, and lip-synced speech. The cost of video generation is the same
        regardless of whether audio is generated or not. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "seedance-2.0-image",
          "input": {
            "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
            "duration": "4",
            "image_url": "https://example.com/sample-image.jpg",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true
          }
        }

        ```

        </details>
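

        A sketch of a start-to-end-frame transition using `end_image_url`,
        assuming both images are passed as plain URL strings per the table. The
        URLs and prompt are placeholders, and `duration` is quoted because the
        table types it as a string.


        ```json

        {
          "model": "seedance-2.0-image",
          "input": {
            "prompt": "The closed flower bud slowly opens into full bloom as the light shifts from dawn to midday.",
            "duration": "6",
            "image_url": "https://example.com/first-frame.jpg",
            "end_image_url": "https://example.com/last-frame.jpg",
            "resolution": "1080p",
            "aspect_ratio": "auto",
            "generate_audio": false
          }
        }

        ```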


        ---


        ### `seedance-2.0-ref` — Seedance 2.0 Reference-to-Video (ByteDance) ·
        `reference-to-video`


        ByteDance Seedance 2.0: generate video from reference images, videos,
        and audio clips.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility. Note
        that results may still vary slightly even with the same seed. |

        | `prompt` | `string` | ✓ | – | – | The text prompt used to generate the
        video. |

        | `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | Duration of the video in seconds (4-15). |

        | `audio_urls` | `array` |  | – | – | Reference audio to guide video
        generation. Refer to them in the prompt as @Audio1, @Audio2, etc.
        Supported formats: MP3, WAV. Up to 3 files, combined duration must not
        exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one
        reference image or video is required. |

        | `image_urls` | `array` |  | – | – | Reference images to guide video
        generation. Refer to them in the prompt as @Image1, @Image2, etc.
        Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images.
        Total files across all modalities must not exceed 12. |

        | `resolution` | `string` |  | `720p` | `480p` `720p` `1080p` | Video
        resolution: 480p for faster generation, 720p for balance, 1080p for
        highest quality. |

        | `video_urls` | `array` |  | – | – | Reference videos to guide video
        generation. Refer to them in the prompt as @Video1, @Video2, etc.
        Supported formats: MP4, MOV. Up to 3 videos, combined duration must be
        between 2 and 15 seconds, total size under 50 MB. Each video must be
        between ~480p (640x640) and ~720p (834x1112) in resolution. |

        | `end_user_id` | `string` |  | – | – | The unique user ID of the end
        user. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3`
        `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9
        for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for
        ultrawide cinematic, or auto to let the model decide. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        synchronized audio for the video, including sound effects, ambient
        sounds, and lip-synced speech. The cost of video generation is the same
        regardless of whether audio is generated or not. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "seedance-2.0-ref",
          "input": {
            "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
            "duration": "4",
            "image_urls": [
              "https://example.com/sample-image.jpg"
            ],
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true
          }
        }

        ```

        </details>
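

        The `@Image1` / `@Audio1` naming described in the table can be
        illustrated with a request that mixes modalities. This is a sketch
        only: it assumes the index follows array order (the table does not
        spell this out), and the URLs and prompt are placeholders.


        ```json

        {
          "model": "seedance-2.0-ref",
          "input": {
            "prompt": "@Image1 walks through a rainy street at night and speaks the line from @Audio1. Slow tracking shot.",
            "duration": "8",
            "audio_urls": [
              "https://example.com/voice-line.mp3"
            ],
            "image_urls": [
              "https://example.com/character.jpg"
            ],
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "generate_audio": true
          }
        }

        ```


        This request stays within the documented limits: the audio file is
        accompanied by at least one reference image, and the two files together
        are well under the cap of 12.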


        ---


        ### `seedance-2.0-fast` — Seedance 2.0 Fast (ByteDance) ·
        `text-to-video`


        ByteDance Seedance 2.0 fast tier: lower-latency text-to-video with
        native audio.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility. Note
        that results may still vary slightly even with the same seed. |

        | `prompt` | `string` | ✓ | – | – | The text prompt used to generate the
        video. |

        | `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | Duration of the video in seconds (4-15). |

        | `resolution` | `string` |  | `720p` | `480p` `720p` | Video
        resolution: 480p for faster generation, 720p for balance. |

        | `end_user_id` | `string` |  | – | – | The unique user ID of the end
        user. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3`
        `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9
        for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for
        ultrawide cinematic, or auto to let the model decide. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        synchronized audio for the video, including sound effects, ambient
        sounds, and lip-synced speech. The cost of video generation is the same
        regardless of whether audio is generated or not. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "seedance-2.0-fast",
          "input": {
            "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
            "duration": "4",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true
          }
        }

        ```

        </details>


        ---


        ### `seedance-2.0-fast-image` — Seedance 2.0 Fast Image-to-Video
        (ByteDance) · `image-to-video`


        ByteDance Seedance 2.0 fast tier: lower-latency image-to-video with
        synchronized audio.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility. Note
        that results may still vary slightly even with the same seed. |

        | `prompt` | `string` | ✓ | – | – | The text prompt describing the
        desired motion and action for the video. |

        | `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | Duration of the video in seconds (4-15). |

        | `image_url` | `string` | ✓ | – | – | The URL of the starting frame
        image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB. |

        | `resolution` | `string` |  | `720p` | `480p` `720p` | Video
        resolution: 480p for faster generation, 720p for balance. |

        | `end_user_id` | `string` |  | – | – | The unique user ID of the end
        user. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3`
        `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9
        for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for
        ultrawide cinematic, or auto to infer from the input image. |

        | `end_image_url` | `string` |  | – | – | The URL of the image to use as
        the last frame of the video. When provided, the generated video will
        transition from the starting image to this ending image. Supported
        formats: JPEG, PNG, WebP. Max 30 MB. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        synchronized audio for the video, including sound effects, ambient
        sounds, and lip-synced speech. The cost of video generation is the same
        regardless of whether audio is generated or not. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "seedance-2.0-fast-image",
          "input": {
            "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
            "duration": "4",
            "image_url": "https://example.com/sample-image.jpg",
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true
          }
        }

        ```

        </details>


        ---


        ### `seedance-2.0-fast-ref` — Seedance 2.0 Fast Reference-to-Video
        (ByteDance) · `reference-to-video`


        ByteDance Seedance 2.0 fast tier: reference-to-video with lower latency
        and cost.


        | Parameter | Type | Req | Default | Values / Range | Description |

        |---|---|---|---|---|---|

        | `seed` | `integer` |  | – | – | Random seed for reproducibility. Note
        that results may still vary slightly even with the same seed. |

        | `prompt` | `string` | ✓ | – | – | The text prompt used to generate the
        video. |

        | `duration` | `string` |  | `4` | `4` `5` `6` `7` `8` `9` `10` `11`
        `12` `13` `14` `15` | Duration of the video in seconds (4-15). |

        | `audio_urls` | `array` |  | – | – | Reference audio to guide video
        generation. Refer to them in the prompt as @Audio1, @Audio2, etc.
        Supported formats: MP3, WAV. Up to 3 files, combined duration must not
        exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one
        reference image or video is required. |

        | `image_urls` | `array` |  | – | – | Reference images to guide video
        generation. Refer to them in the prompt as @Image1, @Image2, etc.
        Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images.
        Total files across all modalities must not exceed 12. |

        | `resolution` | `string` |  | `720p` | `480p` `720p` | Video
        resolution: 480p for faster generation, 720p for balance. |

        | `video_urls` | `array` |  | – | – | Reference videos to guide video
        generation. Refer to them in the prompt as @Video1, @Video2, etc.
        Supported formats: MP4, MOV. Up to 3 videos, combined duration must be
        between 2 and 15 seconds, total size under 50 MB. Each video must be
        between ~480p (640x640) and ~720p (834x1112) in resolution. |

        | `end_user_id` | `string` |  | – | – | The unique user ID of the end
        user. |

        | `aspect_ratio` | `string` |  | `auto` | `auto` `21:9` `16:9` `4:3`
        `1:1` `3:4` `9:16` | The aspect ratio of the generated video. Use 16:9
        for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for
        ultrawide cinematic, or auto to let the model decide. |

        | `generate_audio` | `boolean` |  | `true` | – | Whether to generate
        synchronized audio for the video, including sound effects, ambient
        sounds, and lip-synced speech. The cost of video generation is the same
        regardless of whether audio is generated or not. |


        <details>

        <summary>Example request body</summary>


        ```json

        {
          "model": "seedance-2.0-fast-ref",
          "input": {
            "prompt": "An octopus finds a football in the ocean and excitedly calls its octopus friends to come and play. Cut scene to an octopus football game under the sea.",
            "duration": "4",
            "image_urls": [
              "https://example.com/sample-image.jpg"
            ],
            "resolution": "720p",
            "aspect_ratio": "auto",
            "generate_audio": true
          }
        }

        ```

        </details>
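

        A sketch of a fast-tier request driven by a reference video. It assumes
        `@Video1` refers to the first entry of `video_urls` (an assumption; the
        table does not define the ordering). The URLs and prompt are
        placeholders.


        ```json

        {
          "model": "seedance-2.0-fast-ref",
          "input": {
            "prompt": "@Image1 performs the dance from @Video1 on a rooftop at dusk.",
            "duration": "5",
            "image_urls": [
              "https://example.com/dancer.jpg"
            ],
            "video_urls": [
              "https://example.com/dance-reference.mp4"
            ],
            "resolution": "480p",
            "aspect_ratio": "9:16",
            "generate_audio": true
          }
        }

        ```


        The fast tier lists only `480p` and `720p` as output resolutions. Per
        the table, the reference clip itself must be 2 to 15 seconds long,
        under 50 MB, and roughly 480p to 720p in resolution.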


        ---
      operationId: HubVideoPublicController_submitVideo
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              oneOf:
                - $ref: '#/components/schemas/Veo31Request'
                - $ref: '#/components/schemas/Veo31ImageRequest'
                - $ref: '#/components/schemas/Veo31RefRequest'
                - $ref: '#/components/schemas/Veo31FirstLastRequest'
                - $ref: '#/components/schemas/Veo31FastRequest'
                - $ref: '#/components/schemas/Veo31FastImageRequest'
                - $ref: '#/components/schemas/Veo31FastRefRequest'
                - $ref: '#/components/schemas/Veo31LiteRequest'
                - $ref: '#/components/schemas/Veo31LiteImageRequest'
                - $ref: '#/components/schemas/Veo3Request'
                - $ref: '#/components/schemas/Veo3ImageRequest'
                - $ref: '#/components/schemas/KlingV3StandardRequest'
                - $ref: '#/components/schemas/KlingV3StandardImageRequest'
                - $ref: '#/components/schemas/KlingV3ProRequest'
                - $ref: '#/components/schemas/KlingV3ProImageRequest'
                - $ref: '#/components/schemas/Wan27Request'
                - $ref: '#/components/schemas/Wan27ImageRequest'
                - $ref: '#/components/schemas/Wan27RefRequest'
                - $ref: '#/components/schemas/Wan27EditRequest'
                - $ref: '#/components/schemas/Seedance20Request'
                - $ref: '#/components/schemas/Seedance20ImageRequest'
                - $ref: '#/components/schemas/Seedance20RefRequest'
                - $ref: '#/components/schemas/Seedance20FastRequest'
                - $ref: '#/components/schemas/Seedance20FastImageRequest'
                - $ref: '#/components/schemas/Seedance20FastRefRequest'
              discriminator:
                propertyName: model
                mapping:
                  veo-3.1:
                    $ref: '#/components/schemas/Veo31Request'
                  veo-3.1-image:
                    $ref: '#/components/schemas/Veo31ImageRequest'
                  veo-3.1-ref:
                    $ref: '#/components/schemas/Veo31RefRequest'
                  veo-3.1-first-last:
                    $ref: '#/components/schemas/Veo31FirstLastRequest'
                  veo-3.1-fast:
                    $ref: '#/components/schemas/Veo31FastRequest'
                  veo-3.1-fast-image:
                    $ref: '#/components/schemas/Veo31FastImageRequest'
                  veo-3.1-fast-ref:
                    $ref: '#/components/schemas/Veo31FastRefRequest'
                  veo-3.1-lite:
                    $ref: '#/components/schemas/Veo31LiteRequest'
                  veo-3.1-lite-image:
                    $ref: '#/components/schemas/Veo31LiteImageRequest'
                  veo-3:
                    $ref: '#/components/schemas/Veo3Request'
                  veo-3-image:
                    $ref: '#/components/schemas/Veo3ImageRequest'
                  kling-v3-standard:
                    $ref: '#/components/schemas/KlingV3StandardRequest'
                  kling-v3-standard-image:
                    $ref: '#/components/schemas/KlingV3StandardImageRequest'
                  kling-v3-pro:
                    $ref: '#/components/schemas/KlingV3ProRequest'
                  kling-v3-pro-image:
                    $ref: '#/components/schemas/KlingV3ProImageRequest'
                  wan-2.7:
                    $ref: '#/components/schemas/Wan27Request'
                  wan-2.7-image:
                    $ref: '#/components/schemas/Wan27ImageRequest'
                  wan-2.7-ref:
                    $ref: '#/components/schemas/Wan27RefRequest'
                  wan-2.7-edit:
                    $ref: '#/components/schemas/Wan27EditRequest'
                  seedance-2.0:
                    $ref: '#/components/schemas/Seedance20Request'
                  seedance-2.0-image:
                    $ref: '#/components/schemas/Seedance20ImageRequest'
                  seedance-2.0-ref:
                    $ref: '#/components/schemas/Seedance20RefRequest'
                  seedance-2.0-fast:
                    $ref: '#/components/schemas/Seedance20FastRequest'
                  seedance-2.0-fast-image:
                    $ref: '#/components/schemas/Seedance20FastImageRequest'
                  seedance-2.0-fast-ref:
                    $ref: '#/components/schemas/Seedance20FastRefRequest'
      responses:
        '200':
          description: 'Task accepted. Poll `GET /hub/v1/tasks/:task_id` until `ready=true`.'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SubmitResponseDto'
      security:
        - bearerAuth: []
components:
  schemas:
    Veo31Request:
      type: object
      title: Veo 3.1 (Google) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1
          description: 'Fixed value: `"veo-3.1"`'
        input:
          $ref: '#/components/schemas/Veo31Input'
      example:
        model: veo-3.1
        input:
          prompt: >-
            Two person street interview in New York City.

            Sample Dialogue:

            Host: "Did you hear the news?"

            Person: "Yes! Veo 3.1 is now available online. If you want to see
            it, go check it out!"
          auto_fix: true
          duration: 8s
          resolution: 720p
          aspect_ratio: '16:9'
          generate_audio: true
          safety_tolerance: '4'
    Veo31ImageRequest:
      type: object
      title: Veo 3.1 Image-to-Video (Google) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-image
          description: 'Fixed value: `"veo-3.1-image"`'
        input:
          $ref: '#/components/schemas/Veo31ImageInput'
      example:
        model: veo-3.1-image
        input:
          prompt: >-
            A monkey and polar bear host a casual podcast about AI inference,
            bringing their unique perspectives from different environments
            (tropical vs. arctic) to discuss how AI systems make decisions and
            process information.

            Sample Dialogue:

            Monkey (Banana): "Welcome back to Bananas & Ice! I am Banana"

            Polar Bear (Ice): "And I'm Ice!"
          duration: 8s
          image_url: https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
          safety_tolerance: '4'
    Veo31RefRequest:
      type: object
      title: Veo 3.1 Multi-Ref (Google) [reference-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-ref
          description: 'Fixed value: `"veo-3.1-ref"`'
        input:
          $ref: '#/components/schemas/Veo31RefInput'
      example:
        model: veo-3.1-ref
        input:
          prompt: >-
            A chimpanzee wearing overalls frolics in the grassy field, gently
            playing with the butterflies. In the background, a circus tent and
            carousel beckon.
          duration: 8s
          image_urls:
            - https://example.com/sample-image.jpg
            - https://example.com/sample-image-2.jpg
            - https://example.com/sample-image-3.jpg
          resolution: 720p
          aspect_ratio: '16:9'
          generate_audio: true
          safety_tolerance: '4'
    Veo31FirstLastRequest:
      type: object
      title: Veo 3.1 First-Last Frame (Google) [first-last-frame-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-first-last
          description: 'Fixed value: `"veo-3.1-first-last"`'
        input:
          $ref: '#/components/schemas/Veo31FirstLastInput'
      example:
        model: veo-3.1-first-last
        input:
          prompt: >-
            A woman looks into the camera, breathes in, then exclaims
            energetically, "have you guys checked out this AI video generation?
            It's incredible!"
          duration: 8s
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
          last_frame_url: https://example.com/sample-image-2.jpg
          first_frame_url: https://example.com/sample-image.jpg
          safety_tolerance: '4'
    Veo31FastRequest:
      type: object
      title: Veo 3.1 Fast (Google) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-fast
          description: 'Fixed value: `"veo-3.1-fast"`'
        input:
          $ref: '#/components/schemas/Veo31FastInput'
      example:
        model: veo-3.1-fast
        input:
          prompt: >-
            Two person street interview in New York City.

            Sample Dialogue:

            Host: "Did you hear the news?"

            Person: "Yes! Veo 3.1 is now available online. If you want to see
            it, go check it out!"
          auto_fix: true
          duration: 8s
          resolution: 720p
          aspect_ratio: '16:9'
          generate_audio: true
          safety_tolerance: '4'
    Veo31FastImageRequest:
      type: object
      title: Veo 3.1 Fast Image-to-Video (Google) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-fast-image
          description: 'Fixed value: `"veo-3.1-fast-image"`'
        input:
          $ref: '#/components/schemas/Veo31FastImageInput'
      example:
        model: veo-3.1-fast-image
        input:
          prompt: >-
            A monkey and polar bear host a casual podcast about AI inference,
            bringing their unique perspectives from different environments
            (tropical vs. arctic) to discuss how AI systems make decisions and
            process information.

            Sample Dialogue:

            Monkey (Banana): "Welcome back to Bananas & Ice! I am Banana."

            Polar Bear (Ice): "And I'm Ice!"
          duration: 8s
          image_url: https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
          safety_tolerance: '4'
    Veo31FastRefRequest:
      type: object
      title: Veo 3.1 Fast Multi-Ref (Google) [reference-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-fast-ref
          description: 'Fixed value: `"veo-3.1-fast-ref"`'
        input:
          $ref: '#/components/schemas/Veo31FastRefInput'
      example:
        model: veo-3.1-fast-ref
        input:
          prompt: >-
            A chimpanzee wearing overalls frolics in the grassy field, gently
            playing with the butterflies. In the background, a circus tent and
            carousel beckon.
          duration: 8s
          image_urls:
            - https://example.com/sample-image.jpg
            - https://example.com/sample-image-2.jpg
            - https://example.com/sample-image-3.jpg
          resolution: 720p
          aspect_ratio: '16:9'
          generate_audio: true
          safety_tolerance: '4'
    Veo31LiteRequest:
      type: object
      title: Veo 3.1 Lite (Google) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-lite
          description: 'Fixed value: `"veo-3.1-lite"`'
        input:
          $ref: '#/components/schemas/Veo31LiteInput'
      example:
        model: veo-3.1-lite
        input:
          prompt: >-
            A massive blue whale glides through crystal-clear deep ocean water,
            sunlight rays piercing through the surface above, bioluminescent
            plankton scattered around, cinematic slow motion
          auto_fix: true
          duration: 8s
          resolution: 720p
          aspect_ratio: '16:9'
          generate_audio: true
          safety_tolerance: '4'
    Veo31LiteImageRequest:
      type: object
      title: Veo 3.1 Lite Image-to-Video (Google) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3.1-lite-image
          description: 'Fixed value: `"veo-3.1-lite-image"`'
        input:
          $ref: '#/components/schemas/Veo31LiteImageInput'
      example:
        model: veo-3.1-lite-image
        input:
          prompt: >-
            A massive blue whale glides through crystal-clear deep ocean water,
            sunlight rays piercing through the surface above, bioluminescent
            plankton scattered around, cinematic slow motion
          duration: 8s
          image_url: https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
          safety_tolerance: '4'
    Veo3Request:
      type: object
      title: Veo 3 (Google) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3
          description: 'Fixed value: `"veo-3"`'
        input:
          $ref: '#/components/schemas/Veo3Input'
      example:
        model: veo-3
        input:
          prompt: >-
            A casual street interview on a busy New York City sidewalk in the
            afternoon. The interviewer holds a plain, unbranded microphone and
            asks: Have you seen Google's new Veo3 model? It is a super good
            model. Person replies: Yeah I saw it, it's already available now.
            It's crazy good.
          auto_fix: true
          duration: 8s
          resolution: 720p
          aspect_ratio: '16:9'
          generate_audio: true
          safety_tolerance: '4'
    Veo3ImageRequest:
      type: object
      title: Veo 3 Image-to-Video (Google) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - veo-3-image
          description: 'Fixed value: `"veo-3-image"`'
        input:
          $ref: '#/components/schemas/Veo3ImageInput'
      example:
        model: veo-3-image
        input:
          prompt: >-
            A woman looks into the camera, breathes in, then exclaims
            energetically, "have you guys checked out this AI video generation?
            It's incredible!"
          duration: 8s
          image_url: https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
          safety_tolerance: '4'
    KlingV3StandardRequest:
      type: object
      title: Kling v3 Standard (Kuaishou) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - kling-v3-standard
          description: 'Fixed value: `"kling-v3-standard"`'
        input:
          $ref: '#/components/schemas/KlingV3StandardInput'
      example:
        model: kling-v3-standard
        input:
          prompt: >-
            Cinematic drone shot flying through ancient stone ruins covered in
            moss and vines at golden hour. Camera starts low, rises through
            crumbling archways, revealing a vast misty valley beyond. Volumetric
            light rays pierce through gaps in the stone. Epic scale,
            photorealistic, 8K quality.
          duration: '5'
          cfg_scale: 0.5
          shot_type: customize
          aspect_ratio: '16:9'
          multi_prompt: null
          generate_audio: true
          negative_prompt: blur, distort, and low quality
    KlingV3StandardImageRequest:
      type: object
      title: Kling v3 Standard Image-to-Video (Kuaishou) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - kling-v3-standard-image
          description: 'Fixed value: `"kling-v3-standard-image"`'
        input:
          $ref: '#/components/schemas/KlingV3StandardImageInput'
      example:
        model: kling-v3-standard-image
        input:
          prompt: >-
            Camera slowly orbits around the vase. Soft light shifts across the
            ceramic surface. The pampas grass sways gently. Shadows move
            elegantly. Smooth continuous motion, premium feel.
          duration: '12'
          cfg_scale: 0.5
          shot_type: customize
          multi_prompt: null
          generate_audio: true
          negative_prompt: blur, distort, and low quality
          start_image_url: https://example.com/sample-image.jpg
    KlingV3ProRequest:
      type: object
      title: Kling v3 Pro (Kuaishou) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - kling-v3-pro
          description: 'Fixed value: `"kling-v3-pro"`'
        input:
          $ref: '#/components/schemas/KlingV3ProInput'
      example:
        model: kling-v3-pro
        input:
          prompt: >-
            Close-up of glowing fireflies dancing in a dark forest at twilight.
            Soft bioluminescent particles float through the air. Shallow depth
            of field, bokeh lights in background. Magical atmosphere, gentle
            movement.
          duration: '5'
          cfg_scale: 0.5
          shot_type: customize
          aspect_ratio: '16:9'
          multi_prompt: null
          generate_audio: true
          negative_prompt: blur, distort, and low quality
    KlingV3ProImageRequest:
      type: object
      title: Kling v3 Pro Image-to-Video (Kuaishou) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - kling-v3-pro-image
          description: 'Fixed value: `"kling-v3-pro-image"`'
        input:
          $ref: '#/components/schemas/KlingV3ProImageInput'
      example:
        model: kling-v3-pro-image
        input:
          prompt: >-
            The craftsman slowly examines the bowl, turning it gently in his
            weathered hands. His eyes reflect years of wisdom. Subtle smile
            forms on his face. Dust particles drift in warm light. Breathing
            motion, blinking eyes.
          duration: '12'
          cfg_scale: 0.5
          shot_type: customize
          multi_prompt: null
          generate_audio: true
          negative_prompt: blur, distort, and low quality
          start_image_url: https://example.com/sample-image.jpg
    Wan27Request:
      type: object
      title: WAN 2.7 (Alibaba) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - wan-2.7
          description: 'Fixed value: `"wan-2.7"`'
        input:
          $ref: '#/components/schemas/Wan27Input'
      example:
        model: wan-2.7
        input:
          prompt: >-
            A kitten running in a meadow, cinematic lighting, smooth camera
            movement.
          duration: 5
          resolution: 1080p
          aspect_ratio: '16:9'
          negative_prompt: low resolution, errors, worst quality, low quality
          enable_safety_checker: true
          enable_prompt_expansion: true
    Wan27ImageRequest:
      type: object
      title: WAN 2.7 Image-to-Video (Alibaba) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - wan-2.7-image
          description: 'Fixed value: `"wan-2.7-image"`'
        input:
          $ref: '#/components/schemas/Wan27ImageInput'
      example:
        model: wan-2.7-image
        input:
          prompt: >-
            The massive humpback whale glides slowly through the deep blue
            water. It turns gracefully, its huge pectoral fin sweeping through
            the water like a wing. Sunbeams penetrate from above, illuminating
            the whale's textured skin. Small fish scatter. Awe-inspiring scale
            and grace.
          duration: 5
          image_url: https://example.com/sample-image.jpg
          resolution: 1080p
          negative_prompt: >-
            low resolution, errors, worst quality, low quality, incomplete,
            extra fingers, bad proportions, blurry, distorted
          enable_safety_checker: true
          enable_prompt_expansion: true
    Wan27RefRequest:
      type: object
      title: WAN 2.7 Reference-to-Video (Alibaba) [reference-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - wan-2.7-ref
          description: 'Fixed value: `"wan-2.7-ref"`'
        input:
          $ref: '#/components/schemas/Wan27RefInput'
      example:
        model: wan-2.7-ref
        input:
          prompt: A person walking through a beautiful garden, cinematic style.
          duration: 5
          resolution: 1080p
          aspect_ratio: '16:9'
          negative_prompt: low resolution, errors, worst quality, low quality
          enable_safety_checker: true
    Wan27EditRequest:
      type: object
      title: WAN 2.7 Edit Video (Alibaba) [video-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - wan-2.7-edit
          description: 'Fixed value: `"wan-2.7-edit"`'
        input:
          $ref: '#/components/schemas/Wan27EditInput'
      example:
        model: wan-2.7-edit
        input:
          prompt: >-
            Transform the entire scene into a beautiful watercolor painting
            style. Soft brushstrokes, flowing paint washes, visible paper
            texture. Colors should bleed and blend naturally like wet watercolor
            on paper.
          video_url: https://example.com/sample-video.mp4
          resolution: 1080p
          audio_setting: auto
          enable_safety_checker: true
    Seedance20Request:
      type: object
      title: Seedance 2.0 (ByteDance) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - seedance-2.0
          description: 'Fixed value: `"seedance-2.0"`'
        input:
          $ref: '#/components/schemas/Seedance20Input'
      example:
        model: seedance-2.0
        input:
          prompt: >-
            An octopus finds a football in the ocean and excitedly calls its
            octopus friends to come and play. Cut scene to an octopus football
            game under the sea.
          duration: '4'
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
    Seedance20ImageRequest:
      type: object
      title: Seedance 2.0 Image-to-Video (ByteDance) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - seedance-2.0-image
          description: 'Fixed value: `"seedance-2.0-image"`'
        input:
          $ref: '#/components/schemas/Seedance20ImageInput'
      example:
        model: seedance-2.0-image
        input:
          prompt: >-
            An octopus finds a football in the ocean and excitedly calls its
            octopus friends to come and play. Cut scene to an octopus football
            game under the sea.
          duration: '4'
          image_url: https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
    Seedance20RefRequest:
      type: object
      title: Seedance 2.0 Reference-to-Video (ByteDance) [reference-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - seedance-2.0-ref
          description: 'Fixed value: `"seedance-2.0-ref"`'
        input:
          $ref: '#/components/schemas/Seedance20RefInput'
      example:
        model: seedance-2.0-ref
        input:
          prompt: >-
            An octopus finds a football in the ocean and excitedly calls its
            octopus friends to come and play. Cut scene to an octopus football
            game under the sea.
          duration: '4'
          image_urls:
            - https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
    Seedance20FastRequest:
      type: object
      title: Seedance 2.0 Fast (ByteDance) [text-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - seedance-2.0-fast
          description: 'Fixed value: `"seedance-2.0-fast"`'
        input:
          $ref: '#/components/schemas/Seedance20FastInput'
      example:
        model: seedance-2.0-fast
        input:
          prompt: >-
            An octopus finds a football in the ocean and excitedly calls its
            octopus friends to come and play. Cut scene to an octopus football
            game under the sea.
          duration: '4'
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
    Seedance20FastImageRequest:
      type: object
      title: Seedance 2.0 Fast Image-to-Video (ByteDance) [image-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - seedance-2.0-fast-image
          description: 'Fixed value: `"seedance-2.0-fast-image"`'
        input:
          $ref: '#/components/schemas/Seedance20FastImageInput'
      example:
        model: seedance-2.0-fast-image
        input:
          prompt: >-
            An octopus finds a football in the ocean and excitedly calls its
            octopus friends to come and play. Cut scene to an octopus football
            game under the sea.
          duration: '4'
          image_url: https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
    Seedance20FastRefRequest:
      type: object
      title: Seedance 2.0 Fast Reference-to-Video (ByteDance) [reference-to-video]
      required:
        - model
        - input
      properties:
        model:
          type: string
          enum:
            - seedance-2.0-fast-ref
          description: 'Fixed value: `"seedance-2.0-fast-ref"`'
        input:
          $ref: '#/components/schemas/Seedance20FastRefInput'
      example:
        model: seedance-2.0-fast-ref
        input:
          prompt: >-
            An octopus finds a football in the ocean and excitedly calls its
            octopus friends to come and play. Cut scene to an octopus football
            game under the sea.
          duration: '4'
          image_urls:
            - https://example.com/sample-image.jpg
          resolution: 720p
          aspect_ratio: auto
          generate_audio: true
    SubmitResponseDto:
      type: object
      properties:
        task_id:
          type: string
          description: Unique task ID — use this to poll GET /hub/v1/tasks/:task_id
          example: hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        status:
          type: string
          description: Task status at creation time (usually `pending`)
          example: pending
        capability:
          type: string
          description: 'Capability: `image` | `video` | `audio` | `transcribe`'
          example: video
        model:
          type: string
          description: Model ID
          example: veo-3.1-fast
        vendor:
          type: string
          description: Model vendor
          example: Google
        mode:
          type: string
          description: Generation mode (e.g. `text-to-video`, `image-to-image`)
          example: text-to-video
        created_at:
          type: string
          description: ISO 8601 creation timestamp
          example: '2026-05-18T09:00:00.000Z'
      required:
        - task_id
        - status
        - capability
        - model
        - vendor
        - mode
        - created_at
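    # Illustrative sketch, not a captured response: a body that satisfies
    # SubmitResponseDto, assembled from the per-field `example` values above.
    #
    #   task_id: hub-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    #   status: pending
    #   capability: video
    #   model: veo-3.1-fast
    #   vendor: Google
    #   mode: text-to-video
    #   created_at: '2026-05-18T09:00:00.000Z'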
    Veo31Input:
      type: object
      title: Veo 3.1 — input
      description: Google Veo 3.1 text-to-video with optional native audio.
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
          default: true
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
            - 4k
          default: 720p
        aspect_ratio:
          type: string
          description: Aspect ratio of the generated video
          enum:
            - '16:9'
            - '9:16'
          default: '16:9'
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
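    # Illustrative sketch, not part of the schema: only `prompt` is required
    # for Veo31Input, so a minimal request can omit everything else. The
    # prompt text here is a placeholder.
    #
    #   model: veo-3.1
    #   input:
    #     prompt: A kitten running in a meadow.
    #
    # Per the defaults declared above, the omitted fields would resolve to
    # auto_fix: true, duration: 8s, resolution: 720p, aspect_ratio: '16:9',
    # generate_audio: true, safety_tolerance: '4'.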
    Veo31ImageInput:
      type: object
      title: Veo 3.1 Image-to-Video — input
      description: 'Veo 3.1: animate a single reference image.'
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        image_url:
          type: string
          description: >-
            URL of the input image to animate. Should be 720p or higher
            resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9
            or 9:16 aspect ratio, it will be cropped to fit.
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
            - 4k
          default: 720p
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Besides `auto`, only 16:9
            and 9:16 are supported.
          enum:
            - auto
            - '16:9'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
        - image_url
    Veo31RefInput:
      type: object
      title: Veo 3.1 Multi-Ref — input
      description: >-
        Veo 3.1: generate video from reference images for consistent subject
        appearance.
      properties:
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        image_urls:
          type: array
          description: >-
            URLs of the reference images to use for consistent subject
            appearance
          items:
            type: string
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
            - 4k
          default: 720p
        aspect_ratio:
          type: string
          description: The aspect ratio of the generated video.
          enum:
            - '16:9'
            - '9:16'
          default: '16:9'
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
        - image_urls
    Veo31FirstLastInput:
      type: object
      title: Veo 3.1 First-Last Frame — input
      description: >-
        Veo 3.1: generate a transition video between a start frame and an end
        frame.
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
            - 4k
          default: 720p
        aspect_ratio:
          type: string
          description: The aspect ratio of the generated video.
          enum:
            - auto
            - '16:9'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        last_frame_url:
          type: string
          description: URL of the last frame of the video
        first_frame_url:
          type: string
          description: URL of the first frame of the video
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
        - last_frame_url
        - first_frame_url
    Veo31FastInput:
      type: object
      title: Veo 3.1 Fast — input
      description: 'Veo 3.1 Fast: lower-latency text-to-video at reduced cost.'
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
          default: true
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
            - 4k
          default: 720p
        aspect_ratio:
          type: string
          description: Aspect ratio of the generated video
          enum:
            - '16:9'
            - '9:16'
          default: '16:9'
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
    Veo31FastImageInput:
      type: object
      title: Veo 3.1 Fast Image-to-Video — input
      description: 'Veo 3.1 Fast: animate a reference image at lower cost.'
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        image_url:
          type: string
          description: >-
            URL of the input image to animate. Should be 720p or higher
            resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9
            or 9:16 aspect ratio, it will be cropped to fit.
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
            - 4k
          default: 720p
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Besides `auto`, only 16:9
            and 9:16 are supported.
          enum:
            - auto
            - '16:9'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
        - image_url
    Veo31FastRefInput:
      type: object
      title: Veo 3.1 Fast Multi-Ref — input
      description: 'Veo 3.1 Fast: multi-reference video at lower cost.'
      properties:
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        image_urls:
          type: array
          description: >-
            URLs of the reference images to use for consistent subject
            appearance
          items:
            type: string
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
            - 4k
          default: 720p
        aspect_ratio:
          type: string
          description: The aspect ratio of the generated video.
          enum:
            - '16:9'
            - '9:16'
          default: '16:9'
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
        - image_urls
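    # Illustrative sketch only (not part of the schema): an input object that
    # would satisfy Veo31FastRefInput. The prompt text and the example.com URLs
    # are placeholders; every other value comes from the enums above.
    #
    #   prompt: "The same character walks through a rainy neon street"
    #   image_urls:
    #     - https://example.com/ref-front.png
    #     - https://example.com/ref-side.png
    #   duration: 8s              # string enum, includes the "s" suffix
    #   resolution: 720p
    #   aspect_ratio: '16:9'
    #   safety_tolerance: '4'     # string enum, so keep the quotes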
    Veo31LiteInput:
      type: object
      title: Veo 3.1 Lite — input
      description: 'Veo 3.1 Lite: lowest cost text-to-video (720p/1080p only).'
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
          default: true
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
          default: 720p
        aspect_ratio:
          type: string
          description: Aspect ratio of the generated video
          enum:
            - '16:9'
            - '9:16'
          default: '16:9'
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
    Veo31LiteImageInput:
      type: object
      title: Veo 3.1 Lite Image-to-Video — input
      description: 'Veo 3.1 Lite: animate a reference image at lowest cost.'
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        image_url:
          type: string
          description: >-
            URL of the input image to animate. Should be 720p or higher
            resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9
            or 9:16 aspect ratio, it will be cropped to fit.
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
          default: 720p
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video: 16:9, 9:16, or auto (the
            default).
          enum:
            - auto
            - '16:9'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
        - image_url
    Veo3Input:
      type: object
      title: Veo 3 — input
      description: Google Veo 3 text-to-video with native audio.
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing the video you want to generate
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
          default: true
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
          default: 720p
        aspect_ratio:
          type: string
          description: The aspect ratio of the generated video.
          enum:
            - '16:9'
            - '9:16'
          default: '16:9'
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
    Veo3ImageInput:
      type: object
      title: Veo 3 Image-to-Video — input
      description: 'Google Veo 3: animate a single reference image.'
      properties:
        seed:
          type: integer
          description: The seed for the random number generator.
        prompt:
          type: string
          description: The text prompt describing how the image should be animated
        auto_fix:
          type: boolean
          description: >-
            Whether to automatically attempt to fix prompts that fail content
            policy or other validation checks by rewriting them.
        duration:
          type: string
          description: The duration of the generated video.
          enum:
            - 4s
            - 6s
            - 8s
          default: 8s
        image_url:
          type: string
          description: >-
            URL of the input image to animate. Should be 720p or higher
            resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9
            or 9:16 aspect ratio, it will be cropped to fit.
        resolution:
          type: string
          description: The resolution of the generated video.
          enum:
            - 720p
            - 1080p
          default: 720p
        aspect_ratio:
          type: string
          description: The aspect ratio of the generated video.
          enum:
            - auto
            - '16:9'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: Whether to generate audio for the video.
          default: true
        negative_prompt:
          type: string
          description: A negative prompt to guide the video generation.
        safety_tolerance:
          type: string
          description: >-
            The safety tolerance level for content moderation. 1 is the most
            strict (blocks most content), 6 is the least strict. Note: API-only
            parameter.
          enum:
            - '1'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
          default: '4'
      required:
        - prompt
        - image_url
    KlingV3StandardInput:
      type: object
      title: Kling v3 Standard — input
      description: Kling v3 Standard text-to-video with optional native audio.
      properties:
        prompt:
          type: string
          description: >-
            Text prompt for video generation. Either prompt or multi_prompt must
            be provided, but not both.
        duration:
          type: string
          description: The duration of the generated video in seconds
          enum:
            - '3'
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '5'
        cfg_scale:
          type: number
          description: >-
            The CFG (Classifier Free Guidance) scale is a measure of how close
            you want the model to stick to your prompt.
          default: 0.5
        shot_type:
          type: string
          description: >-
            The type of multi-shot video generation. 'intelligent' lets the
            model automatically determine shot structure.
          enum:
            - customize
            - intelligent
          default: customize
        aspect_ratio:
          type: string
          description: The aspect ratio of the generated video frame
          enum:
            - '16:9'
            - '9:16'
            - '1:1'
          default: '16:9'
        multi_prompt:
          type: array
          description: >-
            List of prompts for multi-shot video generation. If provided,
            overrides the single prompt and divides the video into multiple
            shots with specified prompts and durations.
          items:
            type: object
        generate_audio:
          type: boolean
          description: >-
            Whether to generate native audio for the video. Supports Chinese and
            English voice output. Other languages are automatically translated
            to English. For English speech, use lowercase letters; for acronyms
            or proper nouns, use uppercase.
          default: true
        negative_prompt:
          type: string
          default: blur, distort, and low quality
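    # Illustrative sketch only (not part of the schema): a single-prompt input
    # for KlingV3StandardInput. The prompt text is a placeholder. Per the
    # descriptions above, send either `prompt` or `multi_prompt`, not both.
    #
    #   prompt: "A paper boat drifts down a rain-filled gutter"
    #   duration: '10'            # string enum of whole seconds, '3' to '15'
    #   aspect_ratio: '1:1'
    #   cfg_scale: 0.5
    #   generate_audio: false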
    KlingV3StandardImageInput:
      type: object
      title: Kling v3 Standard Image-to-Video — input
      description: Kling v3 Standard image-to-video (3-15 seconds).
      properties:
        prompt:
          type: string
          description: >-
            Text prompt for video generation. Either prompt or multi_prompt must
            be provided, but not both.
        duration:
          type: string
          description: The duration of the generated video in seconds
          enum:
            - '3'
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '5'
        elements:
          type: array
          description: >-
            Elements (characters/objects) to include in the video. Each element
            can either be an image set (frontal + reference images) or a video.
            Reference in prompt as @Element1, @Element2, etc.
          items:
            type: object
        cfg_scale:
          type: number
          description: >-
            The CFG (Classifier Free Guidance) scale is a measure of how close
            you want the model to stick to your prompt.
          default: 0.5
        shot_type:
          type: string
          description: >-
            The type of multi-shot video generation. 'intelligent' lets the
            model automatically determine shot structure.
          enum:
            - customize
            - intelligent
          default: customize
        multi_prompt:
          type: array
          description: >-
            List of prompts for multi-shot video generation. If provided,
            divides the video into multiple shots.
          items:
            type: object
        end_image_url:
          type: string
          description: URL of the image to be used for the end of the video
        generate_audio:
          type: boolean
          description: >-
            Whether to generate native audio for the video. Supports Chinese and
            English voice output. Other languages are automatically translated
            to English. For English speech, use lowercase letters; for acronyms
            or proper nouns, use uppercase.
          default: true
        negative_prompt:
          type: string
          default: blur, distort, and low quality
        start_image_url:
          type: string
          description: URL of the image to be used for the start of the video
      required:
        - start_image_url
    KlingV3ProInput:
      type: object
      title: Kling v3 Pro — input
      description: Kling v3 Pro text-to-video with optional native audio.
      properties:
        prompt:
          type: string
          description: >-
            Text prompt for video generation. Either prompt or multi_prompt must
            be provided, but not both.
        duration:
          type: string
          description: The duration of the generated video in seconds
          enum:
            - '3'
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '5'
        cfg_scale:
          type: number
          description: >-
            The CFG (Classifier Free Guidance) scale is a measure of how close
            you want the model to stick to your prompt.
          default: 0.5
        shot_type:
          type: string
          description: >-
            The type of multi-shot video generation. 'intelligent' lets the
            model automatically determine shot structure.
          enum:
            - customize
            - intelligent
          default: customize
        aspect_ratio:
          type: string
          description: The aspect ratio of the generated video frame
          enum:
            - '16:9'
            - '9:16'
            - '1:1'
          default: '16:9'
        multi_prompt:
          type: array
          description: >-
            List of prompts for multi-shot video generation. If provided,
            overrides the single prompt and divides the video into multiple
            shots with specified prompts and durations.
          items:
            type: object
        generate_audio:
          type: boolean
          description: >-
            Whether to generate native audio for the video. Supports Chinese and
            English voice output. Other languages are automatically translated
            to English. For English speech, use lowercase letters; for acronyms
            or proper nouns, use uppercase.
          default: true
        negative_prompt:
          type: string
          default: blur, distort, and low quality
    KlingV3ProImageInput:
      type: object
      title: Kling v3 Pro Image-to-Video — input
      description: Kling v3 Pro image-to-video (3-15 seconds).
      properties:
        prompt:
          type: string
          description: >-
            Text prompt for video generation. Either prompt or multi_prompt must
            be provided, but not both.
        duration:
          type: string
          description: The duration of the generated video in seconds
          enum:
            - '3'
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '5'
        elements:
          type: array
          description: >-
            Elements (characters/objects) to include in the video. Each element
            can either be an image set (frontal + reference images) or a video.
            Reference in prompt as @Element1, @Element2, etc.
          items:
            type: object
        cfg_scale:
          type: number
          description: >-
            The CFG (Classifier Free Guidance) scale is a measure of how close
            you want the model to stick to your prompt.
          default: 0.5
        shot_type:
          type: string
          description: >-
            The type of multi-shot video generation. 'intelligent' lets the
            model automatically determine shot structure.
          enum:
            - customize
            - intelligent
          default: customize
        multi_prompt:
          type: array
          description: >-
            List of prompts for multi-shot video generation. If provided,
            divides the video into multiple shots.
          items:
            type: object
        end_image_url:
          type: string
          description: URL of the image to be used for the end of the video
        generate_audio:
          type: boolean
          description: >-
            Whether to generate native audio for the video. Supports Chinese and
            English voice output. Other languages are automatically translated
            to English. For English speech, use lowercase letters; for acronyms
            or proper nouns, use uppercase.
          default: true
        negative_prompt:
          type: string
          default: blur, distort, and low quality
        start_image_url:
          type: string
          description: URL of the image to be used for the start of the video
      required:
        - start_image_url
    Wan27Input:
      type: object
      title: WAN 2.7 — input
      description: >-
        WAN 2.7 text-to-video - high quality generation. Default resolution is
        1080p.
      properties:
        seed:
          type: integer
          description: Random seed for reproducibility (0-2147483647).
        prompt:
          type: string
          description: Text prompt describing the desired video. Max 5000 characters.
        duration:
          type: integer
          description: Output video duration in seconds (2-15).
          enum:
            - 2
            - 3
            - 4
            - 5
            - 6
            - 7
            - 8
            - 9
            - 10
            - 11
            - 12
            - 13
            - 14
            - 15
          default: 5
        audio_url:
          type: string
          description: >-
            URL of driving audio. Supports WAV and MP3. Duration: 3-30s. Max 15
            MB. If not provided, the model auto-generates matching background
            music.
        resolution:
          type: string
          description: Output video resolution tier.
          enum:
            - 720p
            - 1080p
          default: 1080p
        aspect_ratio:
          type: string
          description: Aspect ratio of the generated video.
          enum:
            - '16:9'
            - '9:16'
            - '1:1'
            - '4:3'
            - '3:4'
          default: '16:9'
        negative_prompt:
          type: string
          description: Content to avoid in the video. Max 500 characters.
        enable_safety_checker:
          type: boolean
          description: Enable content moderation for input and output.
          default: true
        enable_prompt_expansion:
          type: boolean
          description: Enable intelligent prompt rewriting.
          default: true
      required:
        - prompt
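    # Illustrative sketch only (not part of the schema): an input object for
    # Wan27Input. The prompt text is a placeholder. Unlike the Veo and Kling
    # schemas above, `duration` here is an integer, not a string.
    #
    #   prompt: "Slow aerial shot over a misty pine forest at dawn"
    #   duration: 8               # integer seconds, 2 to 15
    #   resolution: 1080p
    #   aspect_ratio: '4:3'
    #   negative_prompt: "text overlays, watermarks"
    #   enable_prompt_expansion: true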
    Wan27ImageInput:
      type: object
      title: WAN 2.7 Image-to-Video — input
      description: WAN 2.7 image-to-video (720p/1080p, duration 2-15s).
      properties:
        seed:
          type: integer
          description: Random seed for reproducibility (0-2147483647).
        prompt:
          type: string
          description: Text prompt describing the desired video. Max 5000 characters.
        duration:
          type: integer
          description: Output video duration in seconds (2-15).
          enum:
            - 2
            - 3
            - 4
            - 5
            - 6
            - 7
            - 8
            - 9
            - 10
            - 11
            - 12
            - 13
            - 14
            - 15
          default: 5
        audio_url:
          type: string
          description: >-
            URL of driving audio. Supports WAV and MP3. Duration: 2-30s. Max 15
            MB.
        image_url:
          type: string
          description: >-
            URL of the first frame image. Formats: JPEG, JPG, PNG, BMP, WEBP.
            Max 20 MB.
        video_url:
          type: string
          description: >-
            URL of a video clip to continue from. Format: MP4, MOV. Duration:
            2-10s. Max 100 MB. Cannot be combined with image_url.
        resolution:
          type: string
          description: Output video resolution tier.
          enum:
            - 720p
            - 1080p
          default: 1080p
        end_image_url:
          type: string
          description: >-
            URL of the last frame image for first-and-last-frame-to-video. Same
            constraints as image_url.
        negative_prompt:
          type: string
          description: Content to avoid in the video. Max 500 characters.
        enable_safety_checker:
          type: boolean
          description: Enable content moderation for input and output.
          default: true
        enable_prompt_expansion:
          type: boolean
          description: Enable intelligent prompt rewriting.
          default: true
    Wan27RefInput:
      type: object
      title: WAN 2.7 Reference-to-Video — input
      description: >-
        WAN 2.7 reference-to-video using character/object reference images and
        videos (duration 2-10s).
      properties:
        seed:
          type: integer
          description: Random seed for reproducibility (0-2147483647).
        prompt:
          type: string
          description: Text prompt describing the desired video. Max 5000 characters.
        duration:
          type: integer
          description: Output video duration in seconds (2-10).
          enum:
            - 2
            - 3
            - 4
            - 5
            - 6
            - 7
            - 8
            - 9
            - 10
          default: 5
        resolution:
          type: string
          description: Output video resolution tier.
          enum:
            - 720p
            - 1080p
          default: 1080p
        multi_shots:
          type: boolean
          description: >-
            When true, enables intelligent multi-shot segmentation. When false
            (default), generates a single continuous shot.
          default: false
        aspect_ratio:
          type: string
          description: Aspect ratio of the generated video.
          enum:
            - '16:9'
            - '9:16'
            - '1:1'
            - '4:3'
            - '3:4'
          default: '16:9'
        negative_prompt:
          type: string
          description: Content to avoid in the video. Max 500 characters.
        reference_image_urls:
          type: array
          description: >-
            Reference image URLs for character/object appearance. Pass multiple
            images for multi-subject generation. Max 20 MB each.
          items:
            type: string
        reference_video_urls:
          type: array
          description: >-
            Reference video URLs for character/object appearance and motion.
            Pass multiple videos for multi-subject generation. Max 100 MB each.
            Note: when video inputs are provided, billing includes the total
            input video duration plus the output duration. Your charged credits
            will be higher than the output duration alone.
          items:
            type: string
        enable_safety_checker:
          type: boolean
          description: Enable content moderation for input and output.
          default: true
      required:
        - prompt
    Wan27EditInput:
      type: object
      title: WAN 2.7 Edit Video — input
      description: >-
        WAN 2.7 video editing: instruction-based editing, reference-image-based
        editing and style transfer (input video 2-10s).
      properties:
        seed:
          type: integer
          description: Random seed for reproducibility (0-2147483647).
        prompt:
          type: string
          description: >-
            Editing instruction or style transfer description. Describe what
            changes you want applied to the video.
        duration:
          type: string
          description: >-
            Output duration in seconds. '0' means match the input video's
            duration. When set to 2-10, the output is truncated to that length
            from the start.
          enum:
            - '0'
            - '2'
            - '3'
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
          default: '0'
        video_url:
          type: string
          description: >-
            URL of the input video to edit. Format: MP4, MOV. Duration: 2-10s.
            Max 100 MB.
        resolution:
          type: string
          description: Output video resolution tier.
          enum:
            - 720p
            - 1080p
          default: 1080p
        aspect_ratio:
          type: string
          description: >-
            Aspect ratio of the output video. If not provided, uses the input
            video's aspect ratio.
          enum:
            - '16:9'
            - '9:16'
            - '1:1'
            - '4:3'
            - '3:4'
        audio_setting:
          type: string
          description: >-
            Audio handling: 'auto' lets the model decide whether to regenerate
            audio; 'origin' preserves the original audio from the input video.
          enum:
            - auto
            - origin
          default: auto
        reference_image_url:
          type: string
          description: >-
            Optional reference image URL for reference-based editing. When
            provided, the edit is guided by the visual style or content of this
            image.
        enable_safety_checker:
          type: boolean
          description: Enable content moderation for input and output.
          default: true
      required:
        - prompt
        - video_url
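    # Illustrative sketch only (not part of the schema): an input object for
    # Wan27EditInput. The prompt text and example.com URL are placeholders.
    # `duration` is a string enum here; '0' keeps the input video's length.
    #
    #   prompt: "Turn the daytime street scene into a snowy night"
    #   video_url: https://example.com/clip.mp4
    #   duration: '0'
    #   resolution: 720p
    #   audio_setting: origin     # keep the input video's original audio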
    Seedance20Input:
      type: object
      title: Seedance 2.0 — input
      description: >-
        ByteDance Seedance 2.0: cinematic text-to-video with native audio,
        physics, and camera control.
      properties:
        seed:
          type: integer
          description: >-
            Random seed for reproducibility. Note that results may still vary
            slightly even with the same seed.
        prompt:
          type: string
          description: The text prompt used to generate the video
        duration:
          type: string
          description: Duration of the video in seconds (4-15).
          enum:
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '4'
        resolution:
          type: string
          description: >-
            Video resolution - 480p for faster generation, 720p for balance,
            1080p for highest quality.
          enum:
            - 480p
            - 720p
            - 1080p
          default: 720p
        end_user_id:
          type: string
          description: The unique user ID of the end user.
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Use 16:9 for landscape,
            9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide
            cinematic, or auto to let the model decide.
          enum:
            - auto
            - '21:9'
            - '16:9'
            - '4:3'
            - '1:1'
            - '3:4'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: >-
            Whether to generate synchronized audio for the video, including
            sound effects, ambient sounds, and lip-synced speech. The cost of
            video generation is the same regardless of whether audio is
            generated or not.
          default: true
      required:
        - prompt
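    # Illustrative sketch only (not part of the schema): an input object for
    # Seedance20Input. The prompt text is a placeholder; the remaining values
    # come from the enums above.
    #
    #   prompt: "A chef flips a pancake in slow motion, camera tracking the arc"
    #   duration: '10'            # string enum of whole seconds, '4' to '15'
    #   resolution: 1080p
    #   aspect_ratio: '21:9'
    #   generate_audio: true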
    Seedance20ImageInput:
      type: object
      title: Seedance 2.0 Image-to-Video — input
      description: >-
        ByteDance Seedance 2.0: animate images with cinematic quality and
        synchronized audio.
      properties:
        seed:
          type: integer
          description: >-
            Random seed for reproducibility. Note that results may still vary
            slightly even with the same seed.
        prompt:
          type: string
          description: >-
            The text prompt describing the desired motion and action for the
            video.
        duration:
          type: string
          description: Duration of the video in seconds (4-15).
          enum:
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '4'
        image_url:
          type: string
          description: >-
            The URL of the starting frame image to animate. Supported formats:
            JPEG, PNG, WebP. Max 30 MB.
        resolution:
          type: string
          description: >-
            Video resolution - 480p for faster generation, 720p for balance,
            1080p for highest quality.
          enum:
            - 480p
            - 720p
            - 1080p
          default: 720p
        end_user_id:
          type: string
          description: The unique user ID of the end user.
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Use 16:9 for landscape,
            9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide
            cinematic, or auto to infer from the input image.
          enum:
            - auto
            - '21:9'
            - '16:9'
            - '4:3'
            - '1:1'
            - '3:4'
            - '9:16'
          default: auto
        end_image_url:
          type: string
          description: >-
            The URL of the image to use as the last frame of the video. When
            provided, the generated video will transition from the starting
            image to this ending image. Supported formats: JPEG, PNG, WebP. Max
            30 MB.
        generate_audio:
          type: boolean
          description: >-
            Whether to generate synchronized audio for the video, including
            sound effects, ambient sounds, and lip-synced speech. The cost of
            video generation is the same regardless of whether audio is
            generated or not.
          default: true
      required:
        - prompt
        - image_url
    Seedance20RefInput:
      type: object
      title: Seedance 2.0 Reference-to-Video — input
      description: >-
        ByteDance Seedance 2.0: generate video from reference images, videos,
        and audio clips.
      properties:
        seed:
          type: integer
          description: >-
            Random seed for reproducibility. Note that results may still vary
            slightly even with the same seed.
        prompt:
          type: string
          description: The text prompt used to generate the video.
        duration:
          type: string
          description: Duration of the video in seconds (4-15).
          enum:
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '4'
        audio_urls:
          type: array
          description: >-
            Reference audio to guide video generation. Refer to them in the
            prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to
            3 files, combined duration must not exceed 15 seconds. Max 15 MB per
            file. If audio is provided, at least one reference image or video is
            required.
          items:
            type: string
        image_urls:
          type: array
          description: >-
            Reference images to guide video generation. Refer to them in the
            prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP.
            Max 30 MB per image. Up to 9 images. Total files across all
            modalities must not exceed 12.
          items:
            type: string
        resolution:
          type: string
          description: >-
            Video resolution - 480p for faster generation, 720p for balance,
            1080p for highest quality.
          enum:
            - 480p
            - 720p
            - 1080p
          default: 720p
        video_urls:
          type: array
          description: >-
            Reference videos to guide video generation. Refer to them in the
            prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to
            3 videos, combined duration must be between 2 and 15 seconds, total
            size under 50 MB. Each video must be between ~480p (640x640) and
            ~720p (834x1112) in resolution.
          items:
            type: string
        end_user_id:
          type: string
          description: The unique user ID of the end user.
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Use 16:9 for landscape,
            9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide
            cinematic, or auto to let the model decide.
          enum:
            - auto
            - '21:9'
            - '16:9'
            - '4:3'
            - '1:1'
            - '3:4'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: >-
            Whether to generate synchronized audio for the video, including
            sound effects, ambient sounds, and lip-synced speech. The cost of
            video generation is the same regardless of whether audio is
            generated or not.
          default: true
      required:
        - prompt
    Seedance20FastInput:
      type: object
      title: Seedance 2.0 Fast — input
      description: >-
        ByteDance Seedance 2.0 fast tier: lower-latency text-to-video with
        native audio.
      properties:
        seed:
          type: integer
          description: >-
            Random seed for reproducibility. Note that results may still vary
            slightly even with the same seed.
        prompt:
          type: string
          description: The text prompt used to generate the video.
        duration:
          type: string
          description: Duration of the video in seconds (4-15).
          enum:
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '4'
        resolution:
          type: string
          description: Video resolution - 480p for faster generation, 720p for balance.
          enum:
            - 480p
            - 720p
          default: 720p
        end_user_id:
          type: string
          description: The unique user ID of the end user.
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Use 16:9 for landscape,
            9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide
            cinematic, or auto to let the model decide.
          enum:
            - auto
            - '21:9'
            - '16:9'
            - '4:3'
            - '1:1'
            - '3:4'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: >-
            Whether to generate synchronized audio for the video, including
            sound effects, ambient sounds, and lip-synced speech. The cost of
            video generation is the same regardless of whether audio is
            generated or not.
          default: true
      required:
        - prompt
    Seedance20FastImageInput:
      type: object
      title: Seedance 2.0 Fast Image-to-Video — input
      description: >-
        ByteDance Seedance 2.0 fast tier: lower-latency image-to-video with
        synchronized audio.
      properties:
        seed:
          type: integer
          description: >-
            Random seed for reproducibility. Note that results may still vary
            slightly even with the same seed.
        prompt:
          type: string
          description: >-
            The text prompt describing the desired motion and action for the
            video.
        duration:
          type: string
          description: Duration of the video in seconds (4-15).
          enum:
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '4'
        image_url:
          type: string
          description: >-
            The URL of the starting frame image to animate. Supported formats:
            JPEG, PNG, WebP. Max 30 MB.
        resolution:
          type: string
          description: Video resolution - 480p for faster generation, 720p for balance.
          enum:
            - 480p
            - 720p
          default: 720p
        end_user_id:
          type: string
          description: The unique user ID of the end user.
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Use 16:9 for landscape,
            9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide
            cinematic, or auto to infer from the input image.
          enum:
            - auto
            - '21:9'
            - '16:9'
            - '4:3'
            - '1:1'
            - '3:4'
            - '9:16'
          default: auto
        end_image_url:
          type: string
          description: >-
            The URL of the image to use as the last frame of the video. When
            provided, the generated video will transition from the starting
            image to this ending image. Supported formats: JPEG, PNG, WebP. Max
            30 MB.
        generate_audio:
          type: boolean
          description: >-
            Whether to generate synchronized audio for the video, including
            sound effects, ambient sounds, and lip-synced speech. The cost of
            video generation is the same regardless of whether audio is
            generated or not.
          default: true
      required:
        - prompt
        - image_url
    Seedance20FastRefInput:
      type: object
      title: Seedance 2.0 Fast Reference-to-Video — input
      description: >-
        ByteDance Seedance 2.0 fast tier: reference-to-video with lower latency
        and cost.
      properties:
        seed:
          type: integer
          description: >-
            Random seed for reproducibility. Note that results may still vary
            slightly even with the same seed.
        prompt:
          type: string
          description: The text prompt used to generate the video.
        duration:
          type: string
          description: Duration of the video in seconds (4-15).
          enum:
            - '4'
            - '5'
            - '6'
            - '7'
            - '8'
            - '9'
            - '10'
            - '11'
            - '12'
            - '13'
            - '14'
            - '15'
          default: '4'
        audio_urls:
          type: array
          description: >-
            Reference audio to guide video generation. Refer to them in the
            prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to
            3 files, combined duration must not exceed 15 seconds. Max 15 MB per
            file. If audio is provided, at least one reference image or video is
            required.
          items:
            type: string
        image_urls:
          type: array
          description: >-
            Reference images to guide video generation. Refer to them in the
            prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP.
            Max 30 MB per image. Up to 9 images. Total files across all
            modalities must not exceed 12.
          items:
            type: string
        resolution:
          type: string
          description: Video resolution - 480p for faster generation, 720p for balance.
          enum:
            - 480p
            - 720p
          default: 720p
        video_urls:
          type: array
          description: >-
            Reference videos to guide video generation. Refer to them in the
            prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to
            3 videos, combined duration must be between 2 and 15 seconds, total
            size under 50 MB. Each video must be between ~480p (640x640) and
            ~720p (834x1112) in resolution.
          items:
            type: string
        end_user_id:
          type: string
          description: The unique user ID of the end user.
        aspect_ratio:
          type: string
          description: >-
            The aspect ratio of the generated video. Use 16:9 for landscape,
            9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide
            cinematic, or auto to let the model decide.
          enum:
            - auto
            - '21:9'
            - '16:9'
            - '4:3'
            - '1:1'
            - '3:4'
            - '9:16'
          default: auto
        generate_audio:
          type: boolean
          description: >-
            Whether to generate synchronized audio for the video, including
            sound effects, ambient sounds, and lip-synced speech. The cost of
            video generation is the same regardless of whether audio is
            generated or not.
          default: true
      required:
        - prompt
  securitySchemes:
    bearerAuth:
      scheme: bearer
      bearerFormat: JWT
      type: http

````
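
The Seedance 2.0 reference-to-video schemas above state several limits only in their field descriptions: up to 9 images, 3 videos, and 3 audio files, at most 12 files in total, and audio only when at least one image or video is supplied. The schema itself does not enforce these, so a client may want to check them before submitting. The sketch below is one way to do that; `build_ref_input` and the constant names are hypothetical helpers, not part of the API, and the per-file size, format, and duration limits are not checked because they require inspecting the files themselves. Note that `duration` is a string enum (`'4'` to `'15'`), so the helper converts integers to strings.

```python
from typing import Any, Dict, List, Optional, Union

# Limits copied from the Seedance 2.0 reference-to-video field descriptions.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_TOTAL_FILES = 12
DURATIONS = {str(s) for s in range(4, 16)}  # '4' .. '15', as strings


def build_ref_input(
    prompt: str,
    image_urls: Optional[List[str]] = None,
    video_urls: Optional[List[str]] = None,
    audio_urls: Optional[List[str]] = None,
    duration: Union[int, str] = 4,
) -> Dict[str, Any]:
    """Hypothetical helper: build a reference-to-video input dict or raise ValueError."""
    images = list(image_urls or [])
    videos = list(video_urls or [])
    audio = list(audio_urls or [])

    if not prompt:
        raise ValueError("prompt is required")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} reference images are allowed")
    if len(videos) > MAX_VIDEOS:
        raise ValueError(f"at most {MAX_VIDEOS} reference videos are allowed")
    if len(audio) > MAX_AUDIO:
        raise ValueError(f"at most {MAX_AUDIO} reference audio files are allowed")
    if len(images) + len(videos) + len(audio) > MAX_TOTAL_FILES:
        raise ValueError(f"at most {MAX_TOTAL_FILES} reference files in total")
    if audio and not (images or videos):
        raise ValueError("audio requires at least one reference image or video")

    # The schema declares duration as a string enum, so 5 must be sent as '5'.
    duration_str = str(duration)
    if duration_str not in DURATIONS:
        raise ValueError("duration must be between 4 and 15 seconds")

    payload: Dict[str, Any] = {"prompt": prompt, "duration": duration_str}
    # Only include the reference arrays that were actually provided.
    if images:
        payload["image_urls"] = images
    if videos:
        payload["video_urls"] = videos
    if audio:
        payload["audio_urls"] = audio
    return payload
```

The returned dictionary holds only the model input fields; how it is wrapped in the `POST /hub/v1/video` request body, and which `model` value selects the reference variant, should be taken from the model's `example` returned by `GET /hub/v1/models/:model`.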