API Reference

Text to Speech API

Generate MP3 speech from text, SRT cue text, or timestamped transcript segments.

Credits are checked before the job starts. For current costs, see API credit costs.

OpenAPI-style reference

API endpoints

Generate speech from text, SRT, transcript segments, or cloned reference audio.

API v1 storage handling: files uploaded or generated through /api/v1 are auto-cleaned and do not count toward the user's storage quota.

POST /api/v1/text-to-speech/uploads

Create reference audio upload URL

Creates a signed upload URL for voice clone reference audio.

Bearer auth

Parameters

| Field | Type | Required | Details |
| --- | --- | --- | --- |
| uploadType | reference_audio | No | Upload type (body; default: reference_audio; Reference upload) |
| fileName | string | Yes | Reference audio file name (body) |
| contentType | string | No | Audio content type (body) |
| fileSize | number | Yes | Declared size in bytes, max 100 MB (body) |

Examples

Request

{
  "fileName": "voice.wav",
  "contentType": "audio/wav",
  "fileSize": 5242880
}

Response

{
  "uploadUrl": "https://...",
  "referenceAudioObjectKey": "user_.../text-to-speech-api/reference-audio/abc123-voice.wav",
  "expiresIn": 900
}

Responses

| Status | Description |
| --- | --- |
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |

POST /api/v1/text-to-speech/jobs

Start text to speech generation

Generates MP3 audio from text, SRT, or transcript segments. Voice clone requires reference audio.

Bearer auth

Parameters

| Field | Type | Required | Details |
| --- | --- | --- | --- |
| input.type | text, srt, or transcript | Yes | Input format (body) |
| input.text | string | No | Required when input.type is text |
| input.content | string | No | Required when input.type is srt |
| input.segments | array | No | Required when input.type is transcript |
| voiceId | string | No | Required unless cloneVoice is true |
| cloneVoice | boolean | No | Clone reference audio before synthesis (body; default: false; Voice clone) |
| referenceAudioObjectKey | string | No | Required when cloneVoice is true unless referenceAudioUrl is provided |
| referenceAudioUrl | string | No | Public reference audio URL for cloneVoice |
| referenceText | string | No | Optional reference script for cloneVoice |
| cloneVoiceName | string | No | Optional display name for the temporary cloned voice |
| removeBackgroundNoiseForClone | boolean | No | Clean reference audio before cloning (body; default: false; Voice clone) |
| languageCode | string | No | Optional voice language filter when resolving voiceId |
| languageName | string | No | Optional display language name |
| targetDurationSeconds | number | No | If the generated audio is longer than this target, Subclip speeds it up with FFmpeg |
| speed | number | No | 0.25 to 2 (body; default: 1; TTS options) |
| outputFormat | mp3 | No | Audio output format (body; default: mp3; TTS options) |
| projectName | string | No | Optional internal project label (body) |

Examples

Request

{
  "input": {
    "type": "text",
    "text": "Welcome back. Today we are turning a long idea into a clean voiceover."
  },
  "voiceId": "svoice_...",
  "speed": 1,
  "targetDurationSeconds": 30,
  "outputFormat": "mp3"
}

Response

{
  "generationId": "atts_...",
  "status": "queued",
  "runId": "run_...",
  "statusUrl": "/api/v1/text-to-speech/jobs/atts_..."
}

Responses

| Status | Description |
| --- | --- |
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |

GET /api/v1/text-to-speech/jobs/{generationId}

Get text to speech status

Returns generation progress and output metadata.

Bearer auth

Parameters

| Field | Type | Required | Details |
| --- | --- | --- | --- |
| generationId | string | Yes | Text to speech generation ID (path) |

Examples

Response

{
  "generationId": "atts_...",
  "status": "queued | processing | completed | failed",
  "progress": 100,
  "outputReady": true,
  "creditsUsed": 9,
  "errorMessage": null,
  "updatedAt": "2026-06-19T14:20:00.000Z"
}

Responses

| Status | Description |
| --- | --- |
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |

GET /api/v1/text-to-speech/jobs/{generationId}/download

Create text to speech download URL

Returns a signed MP3 download URL.

Bearer auth

Parameters

| Field | Type | Required | Details |
| --- | --- | --- | --- |
| generationId | string | Yes | Text to speech generation ID (path) |

Examples

Response

{
  "generationId": "atts_...",
  "downloadUrl": "https://signed-download-url...",
  "expiresAt": "2026-06-19T15:00:00.000Z",
  "expiresIn": 3600,
  "contentType": "audio/mpeg",
  "fileSize": 18345678
}

Responses

| Status | Description |
| --- | --- |
| 200 | Request succeeded |
| 400 | Invalid request body or unsupported parameter |
| 401 | Missing, invalid, or revoked API key |
| 429 | Rate limit exceeded |
| 500 | Unexpected processing error |
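
The status codes listed for each endpoint can guide a client's retry behavior. The sketch below is one possible policy, not something this reference prescribes: the function name, the one-second base delay, the 30-second cap, and the choice to retry 500 only for idempotent requests are all assumptions.

```python
from typing import Optional


def retry_delay(
    status_code: int,
    attempt: int,
    idempotent: bool = True,
    base_seconds: float = 1.0,   # assumption: not a documented value
    cap_seconds: float = 30.0,   # assumption: not a documented value
) -> Optional[float]:
    """Return seconds to wait before retrying, or None when a retry will not help."""
    # 429 means the rate limit was hit, so waiting and retrying is reasonable.
    # 500 is retried only for idempotent calls (for example the GET status
    # endpoint), so that a job is not accidentally started twice.
    retryable = status_code == 429 or (status_code == 500 and idempotent)
    if not retryable:
        return None  # 400 and 401 need a corrected request or API key instead
    return min(cap_seconds, base_seconds * (2 ** attempt))


for attempt in range(4):
    print(attempt, retry_delay(429, attempt))
```

With these defaults the delay doubles on each attempt until it reaches the cap.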

Voices

Use the shared voice endpoint and pass the returned voiceoverVoiceId as voiceId. For voice clone jobs, set cloneVoice to true instead of passing voiceId.

curl "https://www.subclip.app/api/v1/media-render/voices?language=en" \
  -H "Authorization: Bearer $SUBCLIP_API_KEY"
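
For callers working in Python rather than curl, the same request can be sketched with the standard library. Only the URL, the language query parameter, and the Bearer header come from the example above; the helper name build_voices_request and the placeholder key are hypothetical. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://www.subclip.app"


def build_voices_request(api_key: str, language: str = "en") -> Request:
    """Build (but do not send) the GET request for the shared voice endpoint."""
    query = urlencode({"language": language})
    return Request(
        f"{BASE_URL}/api/v1/media-render/voices?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = build_voices_request("sk_example")  # placeholder key, not a real one
print(req.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; the voiceoverVoiceId from the returned voice is then used as voiceId.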

Upload reference audio

Optional. Use this step when you would rather let Subclip handle direct uploads from your client application than manage your own S3 storage infrastructure.

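# 1. Request a signed upload URL for the reference audio.
UPLOAD_JSON=$(curl -s -X POST https://www.subclip.app/api/v1/text-to-speech/uploads \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "uploadType": "reference_audio",
    "fileName": "reference-voice.wav",
    "contentType": "audio/wav",
    "fileSize": 8342210
  }')

# 2. Read the signed URL and the object key from the response.
UPLOAD_URL=$(echo "$UPLOAD_JSON" | jq -r '.uploadUrl')
REFERENCE_AUDIO_OBJECT_KEY=$(echo "$UPLOAD_JSON" | jq -r '.referenceAudioObjectKey')

# 3. Upload the file to the signed URL.
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/wav" \
  --upload-file ./reference-voice.wav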
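
The upload request body has to declare the file size, so it can help to compute it from the file itself. The following is a minimal sketch of that step in Python; the helper name is hypothetical, and reading "100MB" as 100,000,000 bytes is an assumption (the stricter of the two common interpretations).

```python
import os
import tempfile

# Assumption: "max 100MB" is read as decimal megabytes, the stricter reading.
MAX_REFERENCE_AUDIO_BYTES = 100_000_000


def build_upload_payload(path: str, content_type: str = "audio/wav") -> dict:
    """Return the JSON body for POST /api/v1/text-to-speech/uploads."""
    size = os.path.getsize(path)
    if size > MAX_REFERENCE_AUDIO_BYTES:
        raise ValueError(f"{size} bytes exceeds the 100 MB limit")
    return {
        "uploadType": "reference_audio",
        "fileName": os.path.basename(path),
        "contentType": content_type,
        "fileSize": size,
    }


# Demo with a small throwaway file standing in for real audio.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "reference-voice.wav")
    with open(demo_path, "wb") as fh:
        fh.write(b"\x00" * 2048)
    payload = build_upload_payload(demo_path)
    print(payload)
```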

Supported languages

The voiceId already identifies the selected voice language. Use languageCode only when you want to filter voice lookup by language.

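| Code | Language |
| --- | --- |
| en | English (US) |
| zh | Chinese (Simplified) |
| ko | Korean |
| ja | Japanese |
| ru | Russian |
| it | Italian |
| es | Spanish |
| pt | Portuguese (Brazil) |
| de | German |
| fr | French |
| ar | Arabic (Saudi) |
| pl | Polish |
| nl | Dutch |
| hi | Hindi |
| he | Hebrew |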

Start from text

Use targetDurationSeconds when the generated speech needs to fit a target slot. If the generated audio is longer, Subclip speeds it up with FFmpeg.

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "text",
      "text": "Welcome back. Today we are turning a long idea into a clean voiceover."
    },
    "voiceId": "svoice_..."
  }'

Response

{
  "generationId": "atts_...",
  "status": "queued",
  "runId": "run_...",
  "estimatedCredits": 3,
  "estimatedDurationSeconds": 30,
  "unconstrainedEstimatedDurationSeconds": 34,
  "targetDurationSeconds": 30,
  "statusUrl": "/api/v1/text-to-speech/jobs/atts_...",
  "downloadUrl": "/api/v1/text-to-speech/jobs/atts_.../download"
}

Optional request fields

{
  "languageCode": "en",
  "speed": 1.1,
  "targetDurationSeconds": 30,
  "outputFormat": "mp3"
}
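
To make the targetDurationSeconds behavior concrete, the arithmetic below uses the figures from the example response (34 seconds unconstrained, 30-second target). It is an illustrative sketch only: the function name is hypothetical, and how Subclip actually derives the tempo it passes to FFmpeg is not documented here.

```python
def required_speedup(unconstrained_seconds: float, target_seconds: float) -> float:
    """Tempo factor needed to fit the target; 1.0 when the audio already fits."""
    if unconstrained_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    # Audio is only ever sped up, never slowed down, so the factor is at least 1.
    return max(1.0, unconstrained_seconds / target_seconds)


factor = required_speedup(34, 30)
print(f"{factor:.3f}")       # 1.133
print(f"{34 / factor:.1f}")  # 30.0
```

Speech that is already shorter than the target yields a factor of 1.0, which matches the description that only longer audio is adjusted.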

Start with voice clone

When cloneVoice is true, pass exactly one of referenceAudioObjectKey or referenceAudioUrl. Subclip creates a temporary cloned voice, generates the audio, then deletes the cloned voice from Inworld and removes the temporary DB voice row.

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "text",
      "text": "Welcome back. This narration uses a temporary cloned voice."
    },
    "cloneVoice": true,
    "referenceAudioObjectKey": "user_.../text-to-speech-api/reference-audio/...",
    "referenceText": "Optional transcript for the reference voice sample.",
    "cloneVoiceName": "Launch narrator",
    "languageCode": "en"
  }'

Alternative request body using a public reference audio URL

{
  "input": {
    "type": "text",
    "text": "This job clones from a public reference audio URL."
  },
  "cloneVoice": true,
  "referenceAudioUrl": "https://example.com/reference-voice.wav",
  "languageCode": "en"
}
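
The "exactly one of" rule is easy to get wrong when request bodies are assembled dynamically. A minimal sketch of a client-side guard follows; the helper name and its parameters are hypothetical, and the check mirrors only the rule stated above.

```python
from typing import Optional


def build_clone_job(
    text: str,
    reference_audio_object_key: Optional[str] = None,
    reference_audio_url: Optional[str] = None,
    language_code: Optional[str] = None,
) -> dict:
    """Build a voice clone job body with exactly one reference audio source."""
    sources = [s for s in (reference_audio_object_key, reference_audio_url) if s]
    if len(sources) != 1:
        raise ValueError(
            "pass exactly one of referenceAudioObjectKey or referenceAudioUrl"
        )
    # voiceId is deliberately omitted: clone jobs set cloneVoice instead.
    body = {"input": {"type": "text", "text": text}, "cloneVoice": True}
    if reference_audio_object_key:
        body["referenceAudioObjectKey"] = reference_audio_object_key
    else:
        body["referenceAudioUrl"] = reference_audio_url
    if language_code:
        body["languageCode"] = language_code
    return body


job = build_clone_job(
    "This job clones from a public reference audio URL.",
    reference_audio_url="https://example.com/reference-voice.wav",
    language_code="en",
)
print(job)
```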

Start from SRT

Pass SRT content inline. Subclip extracts the cue text in timestamp order.

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "srt",
      "content": "1\n00:00:00,000 --> 00:00:02,400\nStart with the strongest hook.\n\n2\n00:00:02,400 --> 00:00:05,000\nThen explain the payoff clearly."
    },
    "voiceId": "svoice_..."
  }'
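
To preview which text a given SRT file would contribute, the cue extraction can be approximated locally. This sketch is an assumption about the general approach (split into cues, order by start time, keep the text) and is not Subclip's actual parser; the function name is hypothetical.

```python
import re
from typing import List

# Matches an SRT timing line such as "00:00:02,400 --> 00:00:05,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def srt_cue_texts(content: str) -> List[str]:
    """Return cue texts ordered by start timestamp."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h, m, s, ms = (int(g) for g in match.groups()[:4])
                start_ms = ((h * 60 + m) * 60 + s) * 1000 + ms
                # Everything after the timing line is the cue text.
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                if text:
                    cues.append((start_ms, text))
                break
    cues.sort(key=lambda cue: cue[0])
    return [text for _, text in cues]


content = (
    "1\n00:00:00,000 --> 00:00:02,400\nStart with the strongest hook.\n\n"
    "2\n00:00:02,400 --> 00:00:05,000\nThen explain the payoff clearly."
)
texts = srt_cue_texts(content)
print(texts)
```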

Start from transcript segments

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "transcript",
      "segments": [
        { "start": 0, "end": 2.4, "text": "Start with the strongest hook." },
        { "start": 2.4, "end": 5, "text": "Then explain the payoff clearly." }
      ]
    },
    "voiceId": "svoice_..."
  }'
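
When segments come from another tool, a small helper can assemble the input object shown above. This is a sketch with a hypothetical helper name; the ordering and range checks are client-side assumptions, not documented API rules.

```python
from typing import Iterable, Tuple


def build_transcript_input(segments: Iterable[Tuple[float, float, str]]) -> dict:
    """Build the `input` object for a transcript job from (start, end, text) tuples."""
    items = []
    for start, end, text in sorted(segments, key=lambda seg: seg[0]):
        # Assumption: a segment must start at or after 0 and end after it starts.
        if start < 0 or end <= start:
            raise ValueError(f"invalid time range: {start} -> {end}")
        if not text.strip():
            continue  # assumption: empty segments are dropped client-side
        items.append({"start": start, "end": end, "text": text.strip()})
    if not items:
        raise ValueError("at least one non-empty segment is required")
    return {"type": "transcript", "segments": items}


transcript_input = build_transcript_input([
    (2.4, 5, "Then explain the payoff clearly."),
    (0, 2.4, "Start with the strongest hook."),
])
print(transcript_input)
```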

Options

| Field | Required | Notes |
| --- | --- | --- |
| input.type | Yes | text, srt, or transcript |
| input.text | Yes for text | Plain text up to 10,000 characters |
| input.content | Yes for srt | SRT content; Subclip reads cue text in timeline order |
| input.segments | Yes for transcript | Array of start, end, and text segments |
| voiceId | Conditional | Required when cloneVoice is false; use a voice returned by GET /api/v1/media-render/voices |
| cloneVoice | No | Set true to create a temporary cloned voice from reference audio |
| referenceAudioObjectKey | Conditional | Required for cloneVoice unless referenceAudioUrl is provided |
| referenceAudioUrl | Conditional | Public downloadable audio URL for cloneVoice when not using Subclip upload |
| referenceText | No | Optional transcript for the reference voice sample |
| cloneVoiceName | No | Optional display name for the temporary cloned voice |
| removeBackgroundNoiseForClone | No | Defaults to false |
| languageCode | No | Optional voice language filter when resolving voiceId |
| speed | No | 0.25 to 2, defaults to 1 |
| targetDurationSeconds | No | Speeds up the generated audio with FFmpeg when the output is longer than this target |
| outputFormat | No | mp3 |
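
Several of the options are conditional, so checking a body before sending it can save a round trip that ends in a 400. The sketch below encodes a subset of the rules from the Options table (input fields, the text length limit, voiceId, speed, and outputFormat); the function name and the error wording are hypothetical, and the server remains the authority on what is valid.

```python
def validate_job_body(body: dict) -> list:
    """Return a list of problems found; an empty list means the body looks valid."""
    problems = []
    # Which input field each input.type requires, per the Options table.
    required_field = {"text": "text", "srt": "content", "transcript": "segments"}
    input_obj = body.get("input") or {}
    input_type = input_obj.get("type")
    if input_type not in required_field:
        problems.append("input.type must be text, srt, or transcript")
    elif not input_obj.get(required_field[input_type]):
        field = required_field[input_type]
        problems.append(f"input.{field} is required for {input_type}")
    elif input_type == "text" and len(input_obj["text"]) > 10_000:
        problems.append("input.text exceeds 10,000 characters")
    if not body.get("cloneVoice") and not body.get("voiceId"):
        problems.append("voiceId is required when cloneVoice is false")
    if not 0.25 <= body.get("speed", 1) <= 2:
        problems.append("speed must be between 0.25 and 2")
    if body.get("outputFormat", "mp3") != "mp3":
        problems.append("outputFormat must be mp3")
    return problems


good = {
    "input": {"type": "text", "text": "Welcome back."},
    "voiceId": "svoice_demo",  # hypothetical ID for illustration
    "speed": 1.1,
}
print(validate_job_body(good))  # []
print(validate_job_body({"input": {"type": "srt"}, "speed": 3}))
```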

Poll status

curl https://www.subclip.app/api/v1/text-to-speech/jobs/atts_... \
  -H "Authorization: Bearer $SUBCLIP_API_KEY"
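
A job moves through queued and processing before it ends as completed or failed, so clients typically poll in a loop. The sketch below shows one way to structure that loop. The function name, the five-second interval, the attempt limit, and the atts_demo ID are assumptions for illustration; the HTTP call is injected as fetch_status so the loop can be exercised without network access.

```python
import time
from typing import Callable
from urllib.parse import urljoin

BASE_URL = "https://www.subclip.app"


def wait_for_job(
    status_url: str,
    fetch_status: Callable[[str], dict],
    interval_seconds: float = 5.0,  # assumption: no documented polling interval
    max_attempts: int = 120,        # assumption: give up after about 10 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, then return the last status."""
    url = urljoin(BASE_URL, status_url)  # statusUrl is returned as a relative path
    for _ in range(max_attempts):
        status = fetch_status(url)
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(status.get("errorMessage") or "generation failed")
        sleep(interval_seconds)
    raise TimeoutError(f"job still running after {max_attempts} polls")


# Stand-in for a real HTTP call; it replays queued -> processing -> completed.
fake_responses = iter([
    {"status": "queued", "progress": 0},
    {"status": "processing", "progress": 50},
    {"status": "completed", "progress": 100, "outputReady": True},
])
result = wait_for_job(
    "/api/v1/text-to-speech/jobs/atts_demo",
    fetch_status=lambda url: next(fake_responses),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result["status"])  # completed
```

In real use, fetch_status would issue the authenticated GET shown above and return the parsed JSON.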

Download

DOWNLOAD_JSON=$(curl -s https://www.subclip.app/api/v1/text-to-speech/jobs/atts_.../download \
  -H "Authorization: Bearer $SUBCLIP_API_KEY")

DOWNLOAD_URL=$(echo "$DOWNLOAD_JSON" | jq -r '.downloadUrl')
curl -L "$DOWNLOAD_URL" -o speech.mp3
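
Because the download URL is signed and carries an expiresAt timestamp, a client that stores the response may want to check whether the URL is still usable before fetching. This is a minimal sketch; the function name and the 60-second safety margin are assumptions.

```python
from datetime import datetime, timedelta, timezone


def download_url_is_fresh(
    download_response: dict,
    now: datetime,
    margin_seconds: int = 60,  # assumption: leave a minute of headroom
) -> bool:
    """True when the signed URL has more than `margin_seconds` left before expiresAt."""
    # expiresAt is an ISO 8601 UTC timestamp ending in "Z".
    expires_at = datetime.fromisoformat(
        download_response["expiresAt"].replace("Z", "+00:00")
    )
    return now + timedelta(seconds=margin_seconds) < expires_at


response = {
    "downloadUrl": "https://signed-download-url...",
    "expiresAt": "2026-06-19T15:00:00.000Z",
    "expiresIn": 3600,
}
half_past = datetime(2026, 6, 19, 14, 30, tzinfo=timezone.utc)
almost_expired = datetime(2026, 6, 19, 14, 59, 30, tzinfo=timezone.utc)
print(download_url_is_fresh(response, half_past))       # True
print(download_url_is_fresh(response, almost_expired))  # False
```

When the check fails, calling the download endpoint again returns a new signed URL.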