POST /suno/v2/getVoxStem
Get Vox Stem
curl --request POST \
  --url https://api.mountsea.ai/suno/v2/getVoxStem \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "clip_id": "<string>"
}
'
{
  "id": "<string>",
  "status": "<string>",
  "source_clip_id": "<string>",
  "vocal_start_s": 123,
  "vocal_end_s": 123,
  "vocal_audio_url": "<string>"
}
Extract the vocal stem from an audio clip. The returned id is the vox_audio_id that Create Persona expects, so this call is the first step in building a Vox persona for use with task=inspiration and persona_style: "vox".
This endpoint is synchronous: it returns the extraction result directly instead of a task ID, so there is nothing to poll.
By default, the service extracts vocals from roughly the 45 s to 74 s window of the clip. If the song is shorter than 74 s, or you need a different range, confirm with your integration provider whether custom vocal_start_s / vocal_end_s handling is supported.
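If you already know a clip's duration, a client-side pre-check can flag clips that end before the default window does. The sketch below is hypothetical: defaultVoxWindowFor and DEFAULT_VOX_WINDOW are illustrative names, the 45 s / 74 s bounds are the approximate defaults described above, and the service's actual behavior for short clips is not specified on this page.

```javascript
// Hypothetical client-side pre-check; the service itself may pick a slightly different window.
// Approximate default vocal window used by getVoxStem, in seconds.
const DEFAULT_VOX_WINDOW = { startS: 45, endS: 74 };

/**
 * Reports how much of the default window falls inside a clip of the given length.
 * @param {number} clipDurationS - clip length in seconds, known to the caller
 * @returns {{ startS: number, endS: number, coversFullWindow: boolean } | null}
 *   null when the clip ends before the default window starts
 */
function defaultVoxWindowFor(clipDurationS) {
  if (!Number.isFinite(clipDurationS) || clipDurationS <= 0) {
    throw new RangeError('clipDurationS must be a positive number');
  }
  if (clipDurationS <= DEFAULT_VOX_WINDOW.startS) return null;
  return {
    startS: DEFAULT_VOX_WINDOW.startS,
    endS: Math.min(DEFAULT_VOX_WINDOW.endS, clipDurationS),
    coversFullWindow: clipDurationS >= DEFAULT_VOX_WINDOW.endS,
  };
}
```

For example, a 60 s clip yields a partial window ending at 60 s, while a clip of 45 s or less yields null; either result is a cue to ask about a custom range before calling the endpoint.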

Request Body

clip_id (string, required): The audio clip ID to extract vocals from.
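Because the call is synchronous and the body has a single field, it wraps naturally in a small async helper. This is a minimal sketch: the URL, headers, and body shape come from this page, while the function name getVoxStem and the injectable fetchImpl parameter (used so the call can be stubbed in tests) are illustrative choices, not part of the API.

```javascript
// Endpoint documented on this page.
const GET_VOX_STEM_URL = 'https://api.mountsea.ai/suno/v2/getVoxStem';

// Hypothetical helper: posts a clip_id and returns the parsed extraction result.
async function getVoxStem(clipId, apiKey, fetchImpl = globalThis.fetch) {
  if (typeof clipId !== 'string' || clipId.length === 0) {
    throw new TypeError('clipId must be a non-empty string');
  }
  const res = await fetchImpl(GET_VOX_STEM_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ clip_id: clipId }),
  });
  if (!res.ok) {
    throw new Error(`getVoxStem failed with HTTP ${res.status}`);
  }
  // Synchronous endpoint: the body is already the final result, so no polling loop.
  return res.json();
}
```

Calling await getVoxStem('78d99ca1-f751-4188-8b8e-0784754f0d8e', 'your-api-key') should resolve to an object shaped like the response described below.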

Response

id (string): The vox audio ID. Use this as vox_audio_id when creating a persona.
status (string): The extraction status (e.g., "complete").
source_clip_id (string): The ID of the original clip that was processed.
vocal_start_s (number): Detected vocal start time, in seconds.
vocal_end_s (number): Detected vocal end time, in seconds.
vocal_audio_url (string): URL of the extracted vocal audio file.
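A thin parser can check these fields before they are passed on to Create Persona. The field names come from this page; parseVoxStemResponse and its returned shape are hypothetical. Status values other than "complete" are not documented here, so the sketch passes status through rather than rejecting unknown values.

```javascript
// Hypothetical helper: validates a getVoxStem response and extracts what later steps need.
function parseVoxStemResponse(body) {
  const { id, status, source_clip_id, vocal_start_s, vocal_end_s, vocal_audio_url } = body ?? {};
  if (typeof id !== 'string' || id === '') {
    throw new TypeError('response is missing id');
  }
  if (typeof vocal_start_s !== 'number' || typeof vocal_end_s !== 'number') {
    throw new TypeError('vocal_start_s and vocal_end_s must be numbers');
  }
  return {
    voxAudioId: id,                               // pass as vox_audio_id to Create Persona
    status,                                       // e.g. "complete"
    sourceClipId: source_clip_id,
    sliceDurationS: vocal_end_s - vocal_start_s,  // length of the extracted slice, in seconds
    vocalAudioUrl: vocal_audio_url,
  };
}
```

Applied to the response example on this page, it reports a 29 s slice (74.0 minus 45.0).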

Example

curl -X POST https://api.mountsea.ai/suno/v2/getVoxStem \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "clip_id": "78d99ca1-f751-4188-8b8e-0784754f0d8e"
  }'

Response Example

{
  "id": "373efa9c-a366-42bb-806c-afdfc9b306a7",
  "status": "complete",
  "source_clip_id": "78d99ca1-f751-4188-8b8e-0784754f0d8e",
  "vocal_start_s": 45.0,
  "vocal_end_s": 74.0,
  "vocal_audio_url": "https://cdn1.suno.ai/processed_373efa9c-a366-42bb-806c-afdfc9b306a7_vocals.m4a"
}

Workflow: Creating a Vox Persona

  1. Get Vox Stem: extract the vocals and save the returned id as vox_audio_id.
  2. Create Persona: pass the root clip's ID (root_clip_id) as clip_id, together with the saved vox_audio_id and the vocal range on the root clip.
// Uses top-level await, so run this as an ES module (e.g. a .mjs file).
const API_BASE = 'https://api.mountsea.ai/suno/v2';
const headers = { 'Content-Type': 'application/json', 'Authorization': 'Bearer your-api-key' };
const rootClipId = '4fa20262-2126-4ec2-9846-e668211a8c7b';

// Step 1: Get the vox stem (synchronous, so the response body is the final result)
const voxRes = await fetch(`${API_BASE}/getVoxStem`, {
  method: 'POST',
  headers,
  body: JSON.stringify({ clip_id: rootClipId })
});
if (!voxRes.ok) throw new Error(`getVoxStem failed: HTTP ${voxRes.status}`);
const voxData = await voxRes.json();

// Step 2: Create the persona, passing the returned id as vox_audio_id
const personaRes = await fetch(`${API_BASE}/persona`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    clip_id: rootClipId,        // root_clip_id (Studio export)
    vox_audio_id: voxData.id,
    vocal_start_s: 0,
    vocal_end_s: 120,           // root clip duration, NOT the getVoxStem slice bounds
    user_input_styles: 'rap, catchy',
    name: 'My Voice',
    is_public: true
  })
});
if (!personaRes.ok) throw new Error(`Create Persona failed: HTTP ${personaRes.status}`);
For persona creation, vocal_start_s / vocal_end_s define the valid vocal range on the root clip (often 0 through the clip’s full length). They are not the same as the internal 45–74s slice used by getVoxStem.
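To keep the two ranges from being mixed up, the Create Persona body can be built from the root clip's duration rather than from the getVoxStem response. This is a hypothetical builder: the field names mirror the workflow example on this page, while buildPersonaBody and its parameter names are illustrative, and the duration must be supplied by the caller.

```javascript
// Hypothetical builder for the Create Persona request body.
function buildPersonaBody({
  rootClipId,
  rootClipDurationS,               // length of the root clip, from the caller
  voxAudioId,
  name,
  styles,
  isPublic,                        // omitted from the JSON body if left undefined
  vocalStartS = 0,                 // defaults cover the whole root clip
  vocalEndS = rootClipDurationS,
}) {
  if (!(rootClipDurationS > 0)) {
    throw new RangeError('rootClipDurationS must be a positive number of seconds');
  }
  if (!(vocalStartS >= 0 && vocalStartS < vocalEndS && vocalEndS <= rootClipDurationS)) {
    throw new RangeError('vocal range must lie within the root clip');
  }
  return {
    clip_id: rootClipId,           // root_clip_id
    vox_audio_id: voxAudioId,      // id returned by getVoxStem
    vocal_start_s: vocalStartS,    // range on the root clip, not the 45 s to 74 s slice
    vocal_end_s: vocalEndS,
    user_input_styles: styles,
    name,
    is_public: isPublic,
  };
}
```

Validating the range against the root clip's own duration means a value copied from the getVoxStem response still passes only if it genuinely lies within the root clip, which is the case the note above warns about.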
Use the persona in Inspiration generation with persona_style: "vox" and artist_clip_id set to the persona’s root_clip_id.
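The generation endpoint and its remaining fields are not covered on this page, so the sketch below only merges the Vox-specific fields into a request body the caller already has. It assumes task is sent as a body field (this page only writes task=inspiration); withVoxPersona is a hypothetical name.

```javascript
// Hypothetical helper: adds the Vox persona fields to an Inspiration generation body.
// baseBody holds whatever other fields the generation endpoint requires.
function withVoxPersona(baseBody, personaRootClipId) {
  if (typeof personaRootClipId !== 'string' || personaRootClipId === '') {
    throw new TypeError('personaRootClipId must be a non-empty string');
  }
  return {
    ...baseBody,                        // copy, so the caller's object is not mutated
    task: 'inspiration',                // assumption: task is a body field
    persona_style: 'vox',
    artist_clip_id: personaRootClipId,  // the persona's root_clip_id
  };
}
```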