API Reference
Text to Speech API
Generate MP3 speech from text, SRT cue text, or timestamped transcript segments.
Credits are checked before the job starts. For current costs, see API credit costs.
OpenAPI-style reference
API endpoints
Generate speech from text, SRT, transcript segments, or cloned reference audio.
Base path: /api/v1
Reference audio uploads are auto-cleaned and do not count toward the user's storage quota.
POST /api/v1/text-to-speech/uploads
Create reference audio upload URL
Creates a signed upload URL for voice clone reference audio.
Parameters
| Field | Type | Required | Details |
|---|---|---|---|
| uploadType | reference_audio | No | Upload type |
| fileName | string | Yes | Reference audio file name (body) |
| contentType | string | No | Audio content type (body) |
| fileSize | number | Yes | Declared size in bytes, max 100MB (body) |
Examples
Request
{
  "fileName": "voice.wav",
  "contentType": "audio/wav",
  "fileSize": 5242880
}
Response
{
  "uploadUrl": "https://...",
  "referenceAudioObjectKey": "user_.../text-to-speech-api/reference-audio/abc123-voice.wav",
  "expiresIn": 900
}
Responses
| Status | Description |
|---|---|
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |
POST /api/v1/text-to-speech/jobs
Start text to speech generation
Generates MP3 audio from text, SRT, or transcript segments. Voice clone requires reference audio.
Parameters
| Field | Type | Required | Details |
|---|---|---|---|
| input.type | text, srt, or transcript | Yes | Input format (body) |
| input.text | string | Conditional | Required when input.type is text (body) |
| input.content | string | Conditional | Required when input.type is srt (body) |
| input.segments | array | Conditional | Required when input.type is transcript (body) |
| voiceId | string | Conditional | Required unless cloneVoice is true (body) |
| cloneVoice | boolean | No | Clone reference audio before synthesis |
| referenceAudioObjectKey | string | Conditional | Required when cloneVoice is true unless referenceAudioUrl is provided (body) |
| referenceAudioUrl | string | Conditional | Public reference audio URL for cloneVoice (body) |
| referenceText | string | No | Optional reference script for cloneVoice (body) |
| cloneVoiceName | string | No | Optional display name for the temporary cloned voice (body) |
| removeBackgroundNoiseForClone | boolean | No | Clean reference audio before cloning |
| languageCode | string | No | Optional voice language filter when resolving voiceId |
| languageName | string | No | Optional display language name |
| targetDurationSeconds | number | No | If the generated audio is longer than this target, Subclip speeds it up with FFmpeg (body) |
| speed | number | No | 0.25 to 2 |
| outputFormat | mp3 | No | Audio output format |
| projectName | string | No | Optional internal project label (body) |
Examples
Request
{
  "input": {
    "type": "text",
    "text": "Welcome back. Today we are turning a long idea into a clean voiceover."
  },
  "voiceId": "svoice_...",
  "speed": 1,
  "targetDurationSeconds": 30,
  "outputFormat": "mp3"
}
Response
{
  "generationId": "atts_...",
  "status": "queued",
  "runId": "run_...",
  "statusUrl": "/api/v1/text-to-speech/jobs/atts_..."
}
Responses
| Status | Description |
|---|---|
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |
GET /api/v1/text-to-speech/jobs/{generationId}
Get text to speech status
Returns generation progress and output metadata.
Parameters
| Field | Type | Required | Details |
|---|---|---|---|
| generationId | string | Yes | Text to speech generation ID (path) |
Examples
Response
{
  "generationId": "atts_...",
  "status": "completed",
  "progress": 100,
  "outputReady": true,
  "creditsUsed": 9,
  "errorMessage": null,
  "updatedAt": "2026-06-19T14:20:00.000Z"
}
The status field is one of queued, processing, completed, or failed.
Responses
| Status | Description |
|---|---|
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |
GET /api/v1/text-to-speech/jobs/{generationId}/download
Create text to speech download URL
Returns a signed MP3 download URL.
Parameters
| Field | Type | Required | Details |
|---|---|---|---|
| generationId | string | Yes | Text to speech generation ID (path) |
Examples
Response
{
  "generationId": "atts_...",
  "downloadUrl": "https://signed-download-url...",
  "expiresAt": "2026-06-19T15:00:00.000Z",
  "expiresIn": 3600,
  "contentType": "audio/mpeg",
  "fileSize": 18345678
}
Responses
| Status | Description |
|---|---|
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |
Voices
Use the shared voice endpoint and pass the returned voiceoverVoiceId as voiceId. For voice clone jobs, set cloneVoice to true instead of passing voiceId.
curl "https://www.subclip.app/api/v1/media-render/voices?language=en" \
  -H "Authorization: Bearer $SUBCLIP_API_KEY"
Upload reference audio
Optional. This is useful if you do not want to manage S3 storage infrastructure and want Subclip to handle direct uploads from your client application.
# Request a signed upload URL
curl -X POST https://www.subclip.app/api/v1/text-to-speech/uploads \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "uploadType": "reference_audio",
    "fileName": "reference-voice.wav",
    "contentType": "audio/wav",
    "fileSize": 8342210
  }'

# Upload the file to the uploadUrl returned by the previous call
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/wav" \
  --upload-file ./reference-voice.wav
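The two-step flow above (request a signed URL, then PUT the file to it) could be scripted roughly as follows. This is a minimal sketch using only the Python standard library. The helper names `build_upload_request` and `build_put_request` are hypothetical, and reading the documented "max 100MB" as 100 * 1024 * 1024 bytes is an assumption.

```python
import json
import urllib.request

API_BASE = "https://www.subclip.app/api/v1"
# Assumption: the documented "max 100MB" is read here as 100 * 1024 * 1024 bytes.
MAX_REFERENCE_AUDIO_BYTES = 100 * 1024 * 1024


def build_upload_request(api_key, file_name, file_size, content_type="audio/wav"):
    """Build (but do not send) the POST request that asks for a signed upload URL."""
    if file_size <= 0 or file_size > MAX_REFERENCE_AUDIO_BYTES:
        raise ValueError("fileSize must be between 1 byte and 100MB")
    body = {
        "uploadType": "reference_audio",
        "fileName": file_name,
        "contentType": content_type,
        "fileSize": file_size,
    }
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/uploads",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_put_request(upload_url, audio_bytes, content_type="audio/wav"):
    """Build the PUT request that sends the audio bytes to the signed URL."""
    return urllib.request.Request(
        upload_url,
        data=audio_bytes,
        headers={"Content-Type": content_type},
        method="PUT",
    )
```

Each request can be passed to `urllib.request.urlopen`. The first response should carry `uploadUrl` for the PUT step and `referenceAudioObjectKey` for the later job request.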
Supported languages
The voiceId already identifies the selected voice language. Use languageCode only when you want to filter voice lookup by language.
| Code | Language |
|---|---|
| en | English (US) |
| zh | Chinese (Simplified) |
| ko | Korean |
| ja | Japanese |
| ru | Russian |
| it | Italian |
| es | Spanish |
| pt | Portuguese (Brazil) |
| de | German |
| fr | French |
| ar | Arabic (Saudi) |
| pl | Polish |
| nl | Dutch |
| hi | Hindi |
| he | Hebrew |
Start from text
Use targetDurationSeconds when the generated speech needs to fit a target slot. If the generated audio is longer, Subclip speeds it up with FFmpeg.
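The fit-to-slot rule above amounts to simple arithmetic, sketched here in Python. The exact factor Subclip passes to FFmpeg is not documented on this page, so the ratio of generated duration to target duration is an assumption, and `speedup_factor` is a hypothetical helper. The 34 and 30 second values mirror the sample response in this section.

```python
def speedup_factor(generated_seconds, target_seconds):
    """Return the tempo multiplier needed to fit the target slot (1.0 = unchanged)."""
    if generated_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    # Audio at or under the target is left alone; only longer audio is sped up.
    return max(1.0, generated_seconds / target_seconds)


print(round(speedup_factor(34, 30), 3))  # 34 s of speech in a 30 s slot -> 1.133
print(speedup_factor(20, 30))            # already fits -> 1.0
```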
curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "text",
      "text": "Welcome back. Today we are turning a long idea into a clean voiceover."
    },
    "voiceId": "svoice_..."
  }'
Response
{
  "generationId": "atts_...",
  "status": "queued",
  "runId": "run_...",
  "estimatedCredits": 3,
  "estimatedDurationSeconds": 30,
  "unconstrainedEstimatedDurationSeconds": 34,
  "targetDurationSeconds": 30,
  "statusUrl": "/api/v1/text-to-speech/jobs/atts_...",
  "downloadUrl": "/api/v1/text-to-speech/jobs/atts_.../download"
}
Optional request fields
{
  "languageCode": "en",
  "speed": 1.1,
  "targetDurationSeconds": 30,
  "outputFormat": "mp3"
}
Start with voice clone
When cloneVoice is true, pass exactly one of referenceAudioObjectKey or referenceAudioUrl. Subclip creates a temporary cloned voice, generates the audio, then deletes the cloned voice from Inworld and removes the temporary DB voice row.
curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "text",
      "text": "Welcome back. This narration uses a temporary cloned voice."
    },
    "cloneVoice": true,
    "referenceAudioObjectKey": "user_.../text-to-speech-api/reference-audio/...",
    "referenceText": "Optional transcript for the reference voice sample.",
    "cloneVoiceName": "Launch narrator",
    "languageCode": "en"
  }'
Request body using referenceAudioUrl
{
  "input": {
    "type": "text",
    "text": "This job clones from a public reference audio URL."
  },
  "cloneVoice": true,
  "referenceAudioUrl": "https://example.com/reference-voice.wav",
  "languageCode": "en"
}
Start from SRT
Pass SRT content inline. Subclip extracts the cue text in timestamp order.
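The extraction described above can be sketched as a small parser that pulls cue text out of SRT content and sorts it by start time. This is a simplified illustration, not Subclip's implementation: `srt_to_segments` is a hypothetical helper, and it assumes well-formed cues separated by blank lines.

```python
import re

# Matches an SRT timing line such as "00:00:02,400 --> 00:00:05,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert SRT timestamp parts to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def srt_to_segments(srt_text):
    """Parse SRT text into start/end/text segments sorted by start time."""
    segments = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                parts = match.groups()
                segments.append({
                    "start": to_seconds(*parts[:4]),
                    "end": to_seconds(*parts[4:]),
                    # Everything after the timing line is cue text.
                    "text": " ".join(lines[i + 1:]),
                })
                break
    return sorted(segments, key=lambda seg: seg["start"])
```

The returned dictionaries use the same start, end, and text keys as the transcript input type, so a client that already parses SRT locally could presumably send the cues as input.segments instead.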
curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "srt",
      "content": "1\n00:00:00,000 --> 00:00:02,400\nStart with the strongest hook.\n\n2\n00:00:02,400 --> 00:00:05,000\nThen explain the payoff clearly."
    },
    "voiceId": "svoice_..."
  }'
Start from transcript segments
curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "transcript",
      "segments": [
        { "start": 0, "end": 2.4, "text": "Start with the strongest hook." },
        { "start": 2.4, "end": 5, "text": "Then explain the payoff clearly." }
      ]
    },
    "voiceId": "svoice_..."
  }'
Options
| Field | Required | Notes |
|---|---|---|
| input.type | Yes | text, srt, or transcript |
| input.text | Yes for text | Plain text up to 10,000 characters |
| input.content | Yes for srt | SRT content; Subclip reads cue text in timeline order |
| input.segments | Yes for transcript | Array of start, end, and text segments |
| voiceId | Conditional | Required when cloneVoice is false; use a voice returned by GET /api/v1/media-render/voices |
| cloneVoice | No | Set true to create a temporary cloned voice from reference audio |
| referenceAudioObjectKey | Conditional | Required for cloneVoice unless referenceAudioUrl is provided |
| referenceAudioUrl | Conditional | Public downloadable audio URL for cloneVoice when not using Subclip upload |
| referenceText | No | Optional transcript for the reference voice sample |
| cloneVoiceName | No | Optional display name for the temporary cloned voice |
| removeBackgroundNoiseForClone | No | Defaults to false |
| languageCode | No | Optional voice language filter when resolving voiceId |
| speed | No | 0.25 to 2, defaults to 1 |
| targetDurationSeconds | No | Speeds up the generated audio with FFmpeg when the output is longer than this target |
| outputFormat | No | mp3 |
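The rules in the table above can be checked on the client before a job is submitted, which may save a round trip that would end in a 400. The following is a minimal sketch, with `validate_job_body` as a hypothetical helper. The server remains the source of truth and may apply checks not listed here.

```python
def validate_job_body(body):
    """Return a list of problems found in a text-to-speech job request body."""
    problems = []
    source = body.get("input", {})
    # Each input type has its own required payload field.
    required_field = {"text": "text", "srt": "content", "transcript": "segments"}
    input_type = source.get("type")
    if input_type not in required_field:
        problems.append("input.type must be text, srt, or transcript")
    elif not source.get(required_field[input_type]):
        problems.append(f"input.{required_field[input_type]} is required for {input_type}")
    if input_type == "text" and len(source.get("text") or "") > 10_000:
        problems.append("input.text is limited to 10,000 characters")

    if body.get("cloneVoice"):
        # Exactly one reference audio source may be given for a clone job.
        has_key = bool(body.get("referenceAudioObjectKey"))
        has_url = bool(body.get("referenceAudioUrl"))
        if has_key == has_url:
            problems.append("pass exactly one of referenceAudioObjectKey or referenceAudioUrl")
    elif not body.get("voiceId"):
        problems.append("voiceId is required when cloneVoice is false")

    speed = body.get("speed", 1)
    if not 0.25 <= speed <= 2:
        problems.append("speed must be between 0.25 and 2")
    if body.get("outputFormat", "mp3") != "mp3":
        problems.append("outputFormat must be mp3")
    return problems
```

An empty list means the body passed these local checks. Anything else can be shown to the caller before the request is sent.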
Poll status
curl https://www.subclip.app/api/v1/text-to-speech/jobs/atts_... \
  -H "Authorization: Bearer $SUBCLIP_API_KEY"
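Repeating the status call until the job settles might look like the loop below. It is a sketch under stated assumptions: `wait_for_job` is a hypothetical helper, `fetch_status` is any callable that returns the parsed status JSON, and the 2 second interval and 300 second timeout are illustrative values, since this page does not recommend a polling rate.

```python
import time


def wait_for_job(fetch_status, interval_seconds=2.0, timeout_seconds=300.0, sleep=time.sleep):
    """Poll until the job is completed or failed, then return the last status payload."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            # Surface the API's errorMessage when one is provided.
            raise RuntimeError(status.get("errorMessage") or "generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_seconds)
```

Because the endpoint can return 429, a longer interval or a backoff between calls may be the safer choice in production.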
Download
DOWNLOAD_JSON=$(curl -s https://www.subclip.app/api/v1/text-to-speech/jobs/atts_.../download \
  -H "Authorization: Bearer $SUBCLIP_API_KEY")
DOWNLOAD_URL=$(echo "$DOWNLOAD_JSON" | jq -r '.downloadUrl')
curl -L "$DOWNLOAD_URL" -o speech.mp3
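Since the download URL is signed and carries an expiresAt timestamp, a client that caches the download response may want to check freshness before reusing it. The following is a minimal sketch. `download_url_is_fresh` is a hypothetical helper, and the 30 second safety margin is an arbitrary assumption.

```python
from datetime import datetime, timezone


def download_url_is_fresh(download_response, now=None, safety_margin_seconds=30):
    """Return True if the signed URL has more than the safety margin left before expiresAt."""
    now = now or datetime.now(timezone.utc)
    # expiresAt is shown as an ISO 8601 UTC timestamp ending in "Z".
    expires_at = datetime.fromisoformat(download_response["expiresAt"].replace("Z", "+00:00"))
    return (expires_at - now).total_seconds() > safety_margin_seconds
```

When the check returns False, calling the download endpoint again should yield a new signed URL.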