Create a personalized Voice Persona through voice verification. The process requires two steps: init (upload voice + get verification phrase) → create (upload verification recording + create persona).
Workflow
User's voice audio
│
▼
① voicePersona/init
│ Upload voice → Extract vocals → Return verification phrase
│
│ Returns: { taskId }
│ Poll: GET /suno/v2/status?taskId=xxx
│ Result: vox_audio_id, voice_recording_id, phrase_id,
│ phrase_text, vocal_start_s, vocal_end_s
│
▼
User reads phrase_text aloud and records it
│
▼
② voicePersona/create
│ Upload verification recording → Voice verification → Create Persona
│
│ Returns: { taskId }
│ Poll: GET /suno/v2/status?taskId=xxx
│ Result: persona details
▼
Done → Use persona in /generate
Step 1: Init — Upload Voice & Get Verification Phrase
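Upload the user’s voice audio. The system extracts vocals and returns a verification phrase that the user must read aloud. Like Step 2, this is an async task: the request returns a taskId, and the result fields become available once the task succeeds.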
Request
POST /suno/v2/voicePersona/init
| Field | Type | Required | Description |
|---|---|---|---|
| voice_audio_url | string (URL) | Yes | Publicly downloadable URL of the voice audio (WAV/MP3) |
| language | string | Yes | Verification phrase language: zh, en, ja, ko, es, fr, de, pt, ru, hi |
| vocal_start_s | number | No | Vocal extraction start time (seconds), default: 0 |
| vocal_end_s | number | No | Vocal extraction end time (seconds), default: auto-detected |
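As a rough illustration of the request fields above, the following sketch assembles an init request body and checks it on the client before sending. The helper name `buildInitBody`, the http(s) URL check, and the start/end ordering check are assumptions made for illustration; they are not part of the API, and the server remains the authority on validation.

```javascript
// Languages listed for the `language` field in the table above.
const SUPPORTED_LANGUAGES = ['zh', 'en', 'ja', 'ko', 'es', 'fr', 'de', 'pt', 'ru', 'hi'];

// Hypothetical helper: builds the JSON body for POST /suno/v2/voicePersona/init.
// Optional fields are included only when supplied, so the documented defaults
// (0 and auto-detected) still apply otherwise.
function buildInitBody({ voiceAudioUrl, language, vocalStartS, vocalEndS }) {
  // Assumed client-side check: the URL should at least look like http(s).
  if (typeof voiceAudioUrl !== 'string' || !/^https?:\/\//.test(voiceAudioUrl)) {
    throw new Error('voice_audio_url must be a publicly downloadable http(s) URL');
  }
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`language must be one of: ${SUPPORTED_LANGUAGES.join(', ')}`);
  }
  const body = { voice_audio_url: voiceAudioUrl, language };
  if (vocalStartS !== undefined) body.vocal_start_s = vocalStartS;
  if (vocalEndS !== undefined) body.vocal_end_s = vocalEndS;
  // Assumed client-side check: an explicit range should be non-empty.
  if (body.vocal_start_s !== undefined && body.vocal_end_s !== undefined &&
      body.vocal_end_s <= body.vocal_start_s) {
    throw new Error('vocal_end_s must be greater than vocal_start_s');
  }
  return body;
}
```

The returned object can be passed to `JSON.stringify` and sent as the request body.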
Task Result Fields
When the task succeeds, result contains the following fields needed for Step 2:
| Field | Description |
|---|---|
| vox_audio_id | Extracted vocal audio ID |
| voice_recording_id | Recording ID |
| phrase_id | Verification phrase ID |
| phrase_text | Verification phrase text (user must read this aloud and record) |
| vocal_start_s | Vocal start time (seconds) |
| vocal_end_s | Vocal end time (seconds) |
See Init API Reference →
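Because Step 2 depends on every one of these fields, it can be useful to check the init result before prompting the user to record. The sketch below is one way to do that; `missingInitFields` is a hypothetical helper name, and treating `null` as missing is an assumption.

```javascript
// Fields listed above as needed for Step 2.
const INIT_RESULT_FIELDS = [
  'vox_audio_id', 'voice_recording_id', 'phrase_id',
  'phrase_text', 'vocal_start_s', 'vocal_end_s',
];

// Hypothetical helper: returns the names of any expected fields that are
// absent (undefined or null) from an init task result.
function missingInitFields(result) {
  return INIT_RESULT_FIELDS.filter(
    (field) => result == null || result[field] === undefined || result[field] === null
  );
}
```

An empty array means the result is complete; a value of 0 for vocal_start_s counts as present.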
Step 2: Create — Upload Verification Recording & Create Persona
After the user reads phrase_text aloud and records it, upload the verification recording to complete voice verification and create the persona.
This is an async task. Poll Get Task Status with the returned taskId. The result contains the created persona details.
Request
POST /suno/v2/voicePersona/create
| Field | Type | Required | Description |
|---|---|---|---|
| vox_audio_id | string | Yes | From init result |
| voice_recording_id | string | Yes | From init result |
| phrase_id | string | Yes | From init result |
| verification_audio_url | string (URL) | Yes | User’s verification recording URL (WAV/MP3) |
| vocal_start_s | number | Yes | From init result |
| vocal_end_s | number | Yes | From init result |
| name | string | Yes | Persona name |
| description | string | No | Persona description |
| is_public | boolean | No | Whether public (default: false) |
| image_s3_id | string | No | Cover image (base64), auto-generated if not provided |
See Create API Reference →
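Most of the create request is copied straight from the init result, so the body can be assembled mechanically. The sketch below shows one possible way; `buildCreateBody` and its option names are hypothetical, and the client-side required-field checks are an assumption rather than documented behavior.

```javascript
// Hypothetical helper: builds the JSON body for POST /suno/v2/voicePersona/create
// from a successful init result plus the caller-supplied fields.
function buildCreateBody(initResult, { verificationAudioUrl, name, description, isPublic }) {
  if (!name) throw new Error('name is required');
  if (!verificationAudioUrl) throw new Error('verification_audio_url is required');
  const body = {
    // Copied unchanged from the init result.
    vox_audio_id: initResult.vox_audio_id,
    voice_recording_id: initResult.voice_recording_id,
    phrase_id: initResult.phrase_id,
    vocal_start_s: initResult.vocal_start_s,
    vocal_end_s: initResult.vocal_end_s,
    // Supplied by the caller.
    verification_audio_url: verificationAudioUrl,
    name,
  };
  // Optional fields are sent only when provided, leaving defaults to the server.
  if (description !== undefined) body.description = description;
  if (isPublic !== undefined) body.is_public = isPublic;
  return body;
}
```

Note that phrase_text is shown to the user but is not part of the create request; only phrase_id is sent.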
Complete Example
const API_BASE = 'https://api.mountsea.ai';
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer your-api-key'
};

// Poll the task status every 3 seconds until the task succeeds or fails.
async function pollTask(taskId) {
  while (true) {
    const res = await fetch(`${API_BASE}/suno/v2/status?taskId=${taskId}`, { headers });
    // Surface HTTP-level errors instead of trying to parse an error page as a task.
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);
    const task = await res.json();
    if (task.status === 'success') return task.data;
    if (task.status === 'failed') throw new Error(task.failReason);
    await new Promise(r => setTimeout(r, 3000));
  }
}

// Step 1 (Init): upload voice and get verification phrase
const initRes = await fetch(`${API_BASE}/suno/v2/voicePersona/init`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    voice_audio_url: 'https://example.com/my-voice.wav',
    language: 'zh'
  })
});
const { taskId: initTaskId } = await initRes.json();
const initResult = await pollTask(initTaskId);
console.log('Please read aloud:', initResult.phrase_text);
// → User records themselves reading the phrase

// Step 2 (Create): upload verification recording
const createRes = await fetch(`${API_BASE}/suno/v2/voicePersona/create`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    vox_audio_id: initResult.vox_audio_id,
    voice_recording_id: initResult.voice_recording_id,
    phrase_id: initResult.phrase_id,
    verification_audio_url: 'https://example.com/verification.wav',
    vocal_start_s: initResult.vocal_start_s,
    vocal_end_s: initResult.vocal_end_s,
    name: 'My Voice'
  })
});
const { taskId: createTaskId } = await createRes.json();
const persona = await pollTask(createTaskId);
console.log('Voice Persona created:', persona);
Important Notes
The verification recording must clearly contain the full phrase_text content. Incomplete or unclear recordings may cause voice verification to fail.
- Same account guarantee: The init and create steps automatically use the same Suno account — no manual account specification needed.
- Language selection: language determines the verification phrase language. It’s recommended to match the language of the original voice audio.
- Processing time: Init takes ~20-60s (includes vocal extraction); Create takes ~10-30s (includes voice verification).
- Using the persona: Once created, use the persona in the Generate endpoint via the persona parameter to create music with consistent vocal characteristics.
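
Given the processing times noted above, a polling loop may benefit from an overall deadline so that a stuck task does not block the caller indefinitely. The following is a minimal sketch under stated assumptions: `pollWithTimeout` is a hypothetical helper, `fetchStatus` is a caller-supplied function that returns the parsed JSON of the status endpoint, and the 2-minute default deadline is an arbitrary choice, not a documented limit. The status values and the data and failReason fields follow the complete example in this guide.

```javascript
// Sketch: poll a task until it succeeds, fails, or an overall deadline passes.
// `fetchStatus(taskId)` is supplied by the caller (hypothetical) and should
// resolve to the parsed JSON from GET /suno/v2/status?taskId=...
async function pollWithTimeout(taskId, fetchStatus, { timeoutMs = 120000, intervalMs = 3000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const task = await fetchStatus(taskId);
    if (task.status === 'success') return task.data;
    if (task.status === 'failed') throw new Error(task.failReason || 'Task failed');
    // Still in progress: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} did not finish within ${timeoutMs} ms`);
}
```

Passing the status fetcher in as an argument keeps the loop independent of any particular HTTP client and makes it straightforward to exercise with a stub.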