Introduction to Hub
Hub is Mountsea’s premium AI Gateway: a single unified API for the world’s best image, video, audio, and transcription models.
Why Hub?
Flagship Quality
Official flagship models only. No knock-offs, no lossy distillations. Same model weights, same outputs — Veo 3.1, Nano Banana Pro, GPT Image 2, Kling v3 Pro, WAN 2.7, Seedance 2.0, ElevenLabs Music.
Production Stable
Built for 24/7 production traffic. Multi-region routing, automatic failover, transparent retries, and queue-aware load balancing. No surprise rate-limit walls, no flaky upstreams.
Cheaper Than Official
Pay only for successful generations, billed in unified credits at a meaningful discount versus going direct to the model provider. No per-provider minimums, no monthly subscriptions.
A Closer Look
| | Hub | Going direct to each provider |
|---|---|---|
| Models | Official flagship models across 4 capabilities — image, video, audio, transcribe | Sign up & maintain 6+ separate accounts (Google, OpenAI, Kuaishou, Alibaba, ByteDance, ElevenLabs, …) |
| Pricing | Lower than the official list price, paid only on status=completed — failed tasks are free | Pay full retail; failures still consume quota on most providers |
| Stability | Smart routing across redundant upstream channels, automatic retry on transient errors | Single point of failure; manual retry logic & rate-limit handling on your side |
| Onboarding | One Bearer token, one Base URL, one credit balance | Per-provider keys, per-provider billing, per-provider SDKs |
| Maintenance | New models added & old ones routed for you | Track every provider’s deprecations & migration notices yourself |
Capabilities
Image
Nano Banana (Fast / 2 / Pro) + GPT Image 2 — text-to-image and image editing
Video
Veo 3.1 · Kling v3 · WAN 2.7 · Seedance 2.0 — t2v, i2v, multi-ref, first-last, video edit
Audio (Music)
ElevenLabs Music — text-to-music with length & instrumental control
Transcribe
Whisper / Wizper — audio & video speech-to-text and translation
The Hub Pattern
Every Hub task follows the same simple pattern: submit a task → get a task_id → poll until ready=true.
Discover available models (optional)
Call GET /hub/v1/models?capability=image|video|audio|transcribe to see all models for a capability.
For a specific model, GET /hub/v1/models/:model returns its full input_schema plus a ready-to-copy example payload.
Submit a task
Send POST /hub/v1/{image|video|audio|transcribe} with { model, input }. You get back { task_id }.
Poll for the result
Call GET /hub/v1/tasks/{task_id} until ready=true.
Featured Models
🖼️ Image
| Model | Provider | Capability | Highlights |
|---|---|---|---|
| nano-banana | Google | text-to-image | Gemini 2.5 Flash Image: fast & cheap |
| nano-banana-2 | Google | text-to-image | Gemini 3.1: extreme aspect ratios (1:8 / 8:1) |
| nano-banana-pro | Google | text-to-image | Gemini 3 Pro: studio quality, up to 4K |
| gpt-image-2 | OpenAI | text-to-image | Detailed images with fine typography |
| *-edit variants | — | image-to-image | Edit existing images with reference URLs |
🎬 Video
| Model | Provider | Capability | Highlights |
|---|---|---|---|
| veo-3.1 / -fast / -lite | Google | text-to-video | Native audio, 4s / 6s / 8s, up to 4K |
| veo-3.1-image | Google | image-to-video | Animate a single reference image |
| veo-3.1-ref | Google | reference-to-video | Multi-image consistent character |
| veo-3.1-first-last | Google | first-last frame | Transition between two frames |
| kling-v3-pro / -standard | Kuaishou | text/image-to-video | 3–15s, native audio, multi-shot |
| wan-2.7 | Alibaba | text-to-video | High quality, default 1080p |
| wan-2.7-image | Alibaba | image-to-video | First-and-last frame, audio driving |
| wan-2.7-ref | Alibaba | reference-to-video | Multi-subject reference |
| wan-2.7-edit | Alibaba | video-to-video | Instruction-based video editing |
| seedance-2.0 / -fast | ByteDance | text/image-to-video | Cinematic, native audio, physics |
🎵 Audio (Music)
| Model | Provider | Capability | Highlights |
|---|---|---|---|
| elevenlabs-music | ElevenLabs | music-generate | Text-to-music, 3s–10min, instrumental switch |
🎙️ Transcribe
| Model | Provider | Capability | Highlights |
|---|---|---|---|
| Whisper / Wizper | — | transcribe / translate | BCP-47 language codes, word / segment timestamps |
Use GET /hub/v1/models?capability=transcribe for the up-to-date list.
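The model lookup can be sketched as below. This is a minimal illustration, not an official client: `BASE_URL` and `API_KEY` are placeholders, and `build_list_models_request` is a hypothetical helper name. Only the path, the `capability` query parameter, and the four capability values come from this page.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # placeholder, not the real Hub base URL
API_KEY = "YOUR_API_KEY"              # placeholder Bearer token
CAPABILITIES = {"image", "video", "audio", "transcribe"}


def build_list_models_request(capability: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a GET /hub/v1/models request."""
    if capability is not None and capability not in CAPABILITIES:
        raise ValueError(f"unknown capability: {capability!r}")
    url = f"{BASE_URL}/hub/v1/models"
    if capability is not None:
        # Filter the listing to one capability, e.g. ?capability=transcribe
        url += "?" + urllib.parse.urlencode({"capability": capability})
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="GET",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it; the shape of the JSON response is not documented on this page, so it is not parsed here.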
Quick Example — Image
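A minimal sketch of an image submission, under stated assumptions: `BASE_URL` and `API_KEY` are placeholders, the helper names are hypothetical, and the `prompt` input field is assumed rather than taken from the model schema. Check GET /hub/v1/models/nano-banana-pro for the actual `input_schema`.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real Hub base URL
API_KEY = "YOUR_API_KEY"              # placeholder Bearer token


def build_image_request(prompt: str, model: str = "nano-banana-pro") -> urllib.request.Request:
    """Build (without sending) a POST /hub/v1/image request."""
    # "prompt" is an assumed field name; the real fields live in the model's input_schema.
    body = {"model": model, "input": {"prompt": prompt}}
    return urllib.request.Request(
        f"{BASE_URL}/hub/v1/image",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the task_id from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["task_id"]
```

Calling `submit(build_image_request("a lighthouse at dusk"))` would return a task_id to poll.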
Quick Example — Video
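A video submission might look like the sketch below, paired with the request used to check on the task afterwards. `BASE_URL`, `API_KEY`, the helper names, and the `prompt` input field are all assumptions; the endpoint paths and the `veo-3.1` model id come from this page.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real Hub base URL
API_KEY = "YOUR_API_KEY"              # placeholder Bearer token
AUTH = {"Authorization": f"Bearer {API_KEY}"}


def build_video_request(prompt: str, model: str = "veo-3.1") -> urllib.request.Request:
    """Build (without sending) a POST /hub/v1/video request."""
    # "prompt" is an assumed field name; inspect the model's input_schema for the real one.
    body = {"model": model, "input": {"prompt": prompt}}
    return urllib.request.Request(
        f"{BASE_URL}/hub/v1/video",
        data=json.dumps(body).encode("utf-8"),
        headers={**AUTH, "Content-Type": "application/json"},
        method="POST",
    )


def build_task_request(task_id: str) -> urllib.request.Request:
    """Build (without sending) a GET /hub/v1/tasks/{task_id} request."""
    safe_id = urllib.parse.quote(task_id, safe="")  # keep the id a single path segment
    return urllib.request.Request(f"{BASE_URL}/hub/v1/tasks/{safe_id}", headers=AUTH, method="GET")
```

Video tasks generally take longer than image tasks, so the task request is typically sent repeatedly until the task reports ready=true.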
Quick Example — Music
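For music, a request could be built roughly as follows. The `elevenlabs-music` model id and the /hub/v1/audio path come from this page; `BASE_URL`, `API_KEY`, the helper name, and the `prompt` field are assumptions. The field names for the length and instrumental controls are not given here, so the sketch leaves them out.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real Hub base URL
API_KEY = "YOUR_API_KEY"              # placeholder Bearer token


def build_music_request(prompt: str, model: str = "elevenlabs-music") -> urllib.request.Request:
    """Build (without sending) a POST /hub/v1/audio request."""
    # "prompt" is an assumed field name; length and instrumental options are
    # described by the model's input_schema and are deliberately omitted here.
    body = {"model": model, "input": {"prompt": prompt}}
    return urllib.request.Request(
        f"{BASE_URL}/hub/v1/audio",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```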
Quick Example — Transcribe
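A transcription request can be sketched the same way. This page does not state the exact model ids for Whisper / Wizper, so the `whisper` default below is hypothetical, as are the `audio_url` field, the helper name, `BASE_URL`, and `API_KEY`. Look up the real id and schema with GET /hub/v1/models?capability=transcribe first.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real Hub base URL
API_KEY = "YOUR_API_KEY"              # placeholder Bearer token


def build_transcribe_request(audio_url: str, model: str = "whisper") -> urllib.request.Request:
    """Build (without sending) a POST /hub/v1/transcribe request."""
    # Both the "whisper" model id and the "audio_url" field are hypothetical;
    # replace them with values from the model listing and its input_schema.
    body = {"model": model, "input": {"audio_url": audio_url}}
    return urllib.request.Request(
        f"{BASE_URL}/hub/v1/transcribe",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```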
Endpoints at a Glance
| Endpoint | Method | Description |
|---|---|---|
| /hub/v1/image | POST | Submit an image generation / edit task |
| /hub/v1/video | POST | Submit a video generation / edit task |
| /hub/v1/audio | POST | Submit a music generation task |
| /hub/v1/transcribe | POST | Submit a transcription / translation task |
| /hub/v1/tasks/{task_id} | GET | Poll task status & result |
| /hub/v1/models | GET | List every model (optionally filtered by capability) |
| /hub/v1/models/{model} | GET | Get a single model’s full input schema + example |
Task Status
| Status | Meaning |
|---|---|
| pending | Queued, waiting for a worker |
| processing | Actively running |
| completed | ✅ Done; data contains the result |
| failed | ❌ Failed; see error_code / error_message |
| timeout | Exceeded the processing time limit |
| cancelled | Cancelled by user or system |
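The statuses above suggest a polling loop along these lines. It is a sketch under assumptions: the task object is taken to be a dict with `ready`, `status`, `data`, `error_code`, and `error_message` keys (names mentioned on this page, exact response shape unverified), and `fetch_task` is a hypothetical callable that performs GET /hub/v1/tasks/{task_id} and returns the parsed JSON. Treating every status other than pending and processing as terminal is also an assumption.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the task is assumed not to change any more.
TERMINAL_STATUSES = {"completed", "failed", "timeout", "cancelled"}


def poll_task(
    fetch_task: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_task until the task is ready or in a terminal status."""
    for _ in range(max_attempts):
        task = fetch_task(task_id)
        if task.get("ready") or task.get("status") in TERMINAL_STATUSES:
            return task
        sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")


def result_or_raise(task: Dict[str, Any]) -> Any:
    """Return task data on success, otherwise raise with the reported error."""
    if task.get("status") == "completed":
        return task.get("data")
    raise RuntimeError(
        f"{task.get('status')}: {task.get('error_code')} {task.get('error_message')}"
    )
```

Injecting `fetch_task` and `sleep` keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.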
Authentication
All endpoints require Bearer token authentication: send your token in the Authorization header as Bearer YOUR_API_KEY.
Base URL
Explore the API Documentation
- Submit Image Task — Nano Banana, GPT Image 2 (+ edit variants)
- Submit Video Task — Veo 3.1, Kling v3, WAN 2.7, Seedance 2.0
- Submit Audio Task — ElevenLabs Music
- Submit Transcribe Task — Whisper / Wizper
- Poll Task Result — Get task status & result
- List Models — Browse all available models
- Get Model Details — Inspect input schema + example