Extract the vocal stem from an audio clip. The returned id can be used as the vox_audio_id when creating a persona with the Create Persona endpoint.
This endpoint processes the extraction synchronously and returns the result directly, not a task ID.
Request Body
- clip_id - The audio clip ID to extract vocals from.
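The request can be assembled as in the minimal sketch below, which assumes the endpoint URL and headers shown in the curl example on this page. `buildVoxStemRequest` is a hypothetical helper, not part of the API. It only builds the arguments for `fetch` and rejects an empty clip ID before any network call is made.

```javascript
// Hypothetical helper: builds the fetch arguments for getVoxStem.
// The URL and header layout are copied from the curl example on this page.
const VOX_STEM_URL = 'https://api.mountsea.ai/suno/v2/getVoxStem';

function buildVoxStemRequest(clipId, apiKey) {
  // Fail early on a missing or blank clip ID instead of sending a bad request.
  if (typeof clipId !== 'string' || clipId.trim() === '') {
    throw new TypeError('clip_id must be a non-empty string');
  }
  return {
    url: VOX_STEM_URL,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`
      },
      body: JSON.stringify({ clip_id: clipId })
    }
  };
}
```

The returned `url` and `options` can be passed straight to `fetch(url, options)`.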
Response
- id - The vox audio ID. Use this as vox_audio_id when creating a persona.
- status - The extraction status (e.g., "complete").
- source_clip_id - The original clip ID that was processed.
- vocal_start_s - Detected vocal start time in seconds.
- vocal_end_s - Detected vocal end time in seconds.
- vocal_audio_url - URL to the extracted vocal audio file.
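Before passing the result on, it can help to confirm the response has the expected shape. The sketch below is one way to do that, using the field names from the response example on this page. `checkVoxStemResponse` is a hypothetical helper, and treating every field as always present is an assumption, since this page does not say which fields are optional.

```javascript
// Hypothetical validator: returns a list of problems found in a parsed
// getVoxStem response. An empty list means the shape looks as expected.
// Assumption: all six fields are always present in a successful response.
function checkVoxStemResponse(data) {
  if (data === null || typeof data !== 'object') {
    return ['response is not an object'];
  }
  const problems = [];
  for (const key of ['id', 'status', 'source_clip_id', 'vocal_audio_url']) {
    if (typeof data[key] !== 'string' || data[key] === '') {
      problems.push(`${key} should be a non-empty string`);
    }
  }
  for (const key of ['vocal_start_s', 'vocal_end_s']) {
    if (typeof data[key] !== 'number' || !Number.isFinite(data[key])) {
      problems.push(`${key} should be a finite number`);
    }
  }
  // Only compare the times once both are known to be valid numbers.
  if (problems.length === 0 && data.vocal_end_s < data.vocal_start_s) {
    problems.push('vocal_end_s should not be earlier than vocal_start_s');
  }
  return problems;
}
```

A caller might log the returned problems and stop before creating a persona if the list is not empty.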
Example
curl -X POST https://api.mountsea.ai/suno/v2/getVoxStem \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "clip_id": "78d99ca1-f751-4188-8b8e-0784754f0d8e"
  }'
Response Example
{
  "id": "373efa9c-a366-42bb-806c-afdfc9b306a7",
  "status": "complete",
  "source_clip_id": "78d99ca1-f751-4188-8b8e-0784754f0d8e",
  "vocal_start_s": 45.0,
  "vocal_end_s": 74.0,
  "vocal_audio_url": "https://cdn1.suno.ai/processed_373efa9c-a366-42bb-806c-afdfc9b306a7_vocals.m4a"
}
Workflow: Creating a Vocal Persona
- Get Vox Stem - Call this endpoint to extract vocals and get the vox_audio_id.
- Create Persona - Use the returned ID to create a persona.
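// Step 1: Get vox stem (the result is returned directly, no polling needed)
const voxResponse = await fetch('https://api.mountsea.ai/suno/v2/getVoxStem', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({ clip_id: 'your-clip-id' })
});
// Stop early on an HTTP error instead of parsing an error body as a result
if (!voxResponse.ok) {
  throw new Error(`getVoxStem failed with HTTP ${voxResponse.status}`);
}
const voxData = await voxResponse.json();

// Step 2: Create persona using the vox_audio_id
const personaResponse = await fetch('https://api.mountsea.ai/suno/v2/persona', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({
    clip_id: 'your-clip-id',
    name: 'My Vocal Persona',
    is_public: true,
    persona_type: 'vox',
    vox_audio_id: voxData.id, // Use the ID from getVoxStem
    vocal_start_s: voxData.vocal_start_s, // Detected vocal start, in seconds
    vocal_end_s: voxData.vocal_end_s // Detected vocal end, in seconds
  })
});
if (!personaResponse.ok) {
  throw new Error(`Create Persona failed with HTTP ${personaResponse.status}`);
}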
The vocal_start_s and vocal_end_s values returned by this endpoint can be passed directly to the Create Persona request to ensure optimal vocal extraction.
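The mapping from a getVoxStem response to a Create Persona body can be sketched as a small pure function. `buildPersonaPayload` is a hypothetical helper. The body fields mirror the workflow example on this page, and reusing source_clip_id as the persona's clip_id is an assumption based on that example using the same clip for both calls.

```javascript
// Hypothetical helper: maps a getVoxStem response onto the Create Persona
// body used in the workflow example. name and isPublic are caller-supplied.
function buildPersonaPayload(voxData, { name, isPublic = true }) {
  return {
    // Assumption: the persona is created from the same clip that was processed.
    clip_id: voxData.source_clip_id,
    name,
    is_public: isPublic,
    persona_type: 'vox',
    vox_audio_id: voxData.id,
    // Pass the detected vocal window through unchanged.
    vocal_start_s: voxData.vocal_start_s,
    vocal_end_s: voxData.vocal_end_s
  };
}
```

The result can be serialized with `JSON.stringify` and sent as the body of the Create Persona request.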