API Reference

Text to Speech API

Generate MP3 speech from text, SRT cue text, or timestamped transcript segments.

Credits are checked before the job starts. For current costs, see API credit costs.

OpenAPI-style reference

API endpoints

Generate speech from text, SRT, transcript segments, or cloned reference audio.

API v1 storage handling: files uploaded or generated through /api/v1 are cleaned up automatically and do not count toward the user's storage quota.

POST /api/v1/text-to-speech/uploads

Create reference audio upload URL

Creates a signed upload URL for voice clone reference audio.

Bearer auth

Parameters

Field	Type	Required	In	Details
`uploadType`	`reference_audio`	No	body	Upload type. Default: reference_audio. See Reference upload.
`fileName`	`string`	Yes	body	Reference audio file name
`contentType`	`string`	No	body	Audio content type
`fileSize`	`number`	Yes	body	Declared file size in bytes, maximum 100 MB

Examples

Request

{
  "fileName": "voice.wav",
  "contentType": "audio/wav",
  "fileSize": 5242880
}

Response

{
  "uploadUrl": "https://...",
  "referenceAudioObjectKey": "user_.../text-to-speech-api/reference-audio/abc123-voice.wav",
  "expiresIn": 900
}

Responses

Status	Description
`200`	Request succeeded
`400`	Invalid request body or unsupported parameter
`401`	Missing, invalid, or revoked API key
`429`	Rate limit exceeded
`500`	Unexpected processing error

POST /api/v1/text-to-speech/jobs

Start text to speech generation

Generates MP3 audio from text, SRT, or transcript segments. Voice cloning requires reference audio.

Bearer auth

Parameters

Field	Type	Required	In	Details
`input.type`	`text | srt | transcript`	Yes	body	Input format
`input.text`	`string`	Conditional	body	Required when input.type is text. See Text input.
`input.content`	`string`	Conditional	body	Required when input.type is srt. See SRT input.
`input.segments`	`array`	Conditional	body	Required when input.type is transcript. See Transcript input.
`voiceId`	`string`	Conditional	body	Required unless cloneVoice is true. See Voice IDs.
`cloneVoice`	`boolean`	No	body	Clone the reference audio before synthesis. Default: false. See Voice clone.
`referenceAudioObjectKey`	`string`	Conditional	body	Required when cloneVoice is true, unless referenceAudioUrl is provided. See Reference upload.
`referenceAudioUrl`	`string`	Conditional	body	Public reference audio URL for cloneVoice, used instead of referenceAudioObjectKey. See Voice clone.
`referenceText`	`string`	No	body	Optional transcript of the reference audio for cloneVoice. See Voice clone.
`cloneVoiceName`	`string`	No	body	Optional display name for the temporary cloned voice. See Voice clone.
`removeBackgroundNoiseForClone`	`boolean`	No	body	Clean the reference audio before cloning. Default: false. See Voice clone.
`languageCode`	`string`	No	body	Optional language filter used when resolving voiceId. See Supported TTS languages.
`languageName`	`string`	No	body	Optional display name of the language. See Supported TTS languages.
`targetDurationSeconds`	`number`	No	body	Target length in seconds. If the generated audio is longer, Subclip speeds it up with FFmpeg. See TTS options.
`speed`	`number`	No	body	Speech speed from 0.25 to 2. Default: 1. See TTS options.
`outputFormat`	`mp3`	No	body	Audio output format. Default: mp3. See TTS options.
`projectName`	`string`	No	body	Optional internal project label
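The conditional rules in this table can be checked on the client before a request is sent. The sketch below is a hedged illustration rather than part of the API: the function names tts_voice_args_ok and tts_speed_ok are hypothetical, they mirror only the rules stated on this page (voiceId unless cloneVoice is true, exactly one reference source when cloning, speed from 0.25 to 2), and the server's 400 response remains authoritative.

```shell
# Minimal sketch of client-side checks for POST /api/v1/text-to-speech/jobs.
# Function names are hypothetical; the API's 400 response is authoritative.

# Usage: tts_voice_args_ok VOICE_ID CLONE_VOICE REFERENCE_OBJECT_KEY REFERENCE_URL
# Pass "" for values you are not sending; CLONE_VOICE is "true" or "false".
tts_voice_args_ok() {
  if [ "$2" = true ]; then
    # When cloning, send exactly one of referenceAudioObjectKey or referenceAudioUrl.
    if [ -n "$3" ] && [ -n "$4" ]; then
      echo "send only one of referenceAudioObjectKey or referenceAudioUrl" >&2
      return 1
    fi
    if [ -z "$3" ] && [ -z "$4" ]; then
      echo "cloneVoice needs referenceAudioObjectKey or referenceAudioUrl" >&2
      return 1
    fi
  elif [ -z "$1" ]; then
    echo "voiceId is required unless cloneVoice is true" >&2
    return 1
  fi
  return 0
}

# Succeeds when $1 is a plain decimal number from 0.25 to 2 inclusive.
tts_speed_ok() {
  awk -v s="$1" 'BEGIN {
    if (s !~ /^[0-9]*\.?[0-9]+$/) exit 1
    if (s + 0 < 0.25 || s + 0 > 2) exit 1
    exit 0
  }'
}
```

For example, `tts_voice_args_ok "" true "$REFERENCE_AUDIO_OBJECT_KEY" ""` succeeds, while leaving out both reference sources prints an error and returns 1.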

Examples

Request

{
  "input": {
    "type": "text",
    "text": "Welcome back. Today we are turning a long idea into a clean voiceover."
  },
  "voiceId": "svoice_...",
  "speed": 1,
  "targetDurationSeconds": 30,
  "outputFormat": "mp3"
}

Response

{
  "generationId": "atts_...",
  "status": "queued",
  "runId": "run_...",
  "statusUrl": "/api/v1/text-to-speech/jobs/atts_..."
}

Responses

Status	Description
`200`	Request succeeded
`400`	Invalid request body or unsupported parameter
`401`	Missing, invalid, or revoked API key
`429`	Rate limit exceeded
`500`	Unexpected processing error

GET /api/v1/text-to-speech/jobs/{generationId}

Get text to speech status

Returns generation progress and output metadata.

Bearer auth

Parameters

Field	Type	Required	In	Details
`generationId`	`string`	Yes	path	Text to speech generation ID

Examples

Response

{
  "generationId": "atts_...",
  "status": "queued | processing | completed | failed",
  "progress": 100,
  "outputReady": true,
  "creditsUsed": 9,
  "errorMessage": null,
  "updatedAt": "2026-06-19T14:20:00.000Z"
}

Responses

Status	Description
`200`	Request succeeded
`400`	Invalid request body or unsupported parameter
`401`	Missing, invalid, or revoked API key
`429`	Rate limit exceeded
`500`	Unexpected processing error

GET /api/v1/text-to-speech/jobs/{generationId}/download

Create text to speech download URL

Returns a signed MP3 download URL.

Bearer auth

Parameters

Field	Type	Required	In	Details
`generationId`	`string`	Yes	path	Text to speech generation ID

Examples

Response

{
  "generationId": "atts_...",
  "downloadUrl": "https://signed-download-url...",
  "expiresAt": "2026-06-19T15:00:00.000Z",
  "expiresIn": 3600,
  "contentType": "audio/mpeg",
  "fileSize": 18345678
}

Responses

Status	Description
`200`	Request succeeded
`400`	Invalid request body or unsupported parameter
`401`	Missing, invalid, or revoked API key
`429`	Rate limit exceeded
`500`	Unexpected processing error

Voices

Use the shared voices endpoint, GET /api/v1/media-render/voices, and pass the returned voiceoverVoiceId as voiceId. For voice clone jobs, set cloneVoice to true instead of passing voiceId.

curl "https://www.subclip.app/api/v1/media-render/voices?language=en" \
  -H "Authorization: Bearer $SUBCLIP_API_KEY"

Upload reference audio

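Optional. Use this if you do not want to manage your own S3 storage and want Subclip to handle direct uploads from your client application. The upload endpoint returns a signed uploadUrl that accepts a PUT of the file, plus a referenceAudioObjectKey to pass when starting a voice clone job.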

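# Set fileSize to the file's size in bytes (for example, wc -c < ./reference-voice.wav).
UPLOAD_JSON=$(curl -s -X POST https://www.subclip.app/api/v1/text-to-speech/uploads \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "uploadType": "reference_audio",
    "fileName": "reference-voice.wav",
    "contentType": "audio/wav",
    "fileSize": 8342210
  }')

# Keep the object key for the job request, then upload the file to the signed URL.
UPLOAD_URL=$(echo "$UPLOAD_JSON" | jq -r '.uploadUrl')
REFERENCE_AUDIO_OBJECT_KEY=$(echo "$UPLOAD_JSON" | jq -r '.referenceAudioObjectKey')

curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/wav" \
  --upload-file ./reference-voice.wav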
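The two upload calls can be wrapped in a single shell function. This is a minimal sketch, not an official client: the names reference_size_ok and subclip_upload_reference are hypothetical, it assumes curl and jq are installed and SUBCLIP_API_KEY is exported, and it reads "max 100MB" conservatively as 100,000,000 bytes because the page does not say whether decimal or binary megabytes are meant.

```shell
# Minimal sketch: request a signed upload URL, then PUT the reference clip.
# Assumes curl and jq are installed and SUBCLIP_API_KEY is exported.
# Function names are hypothetical, not part of the Subclip API.

SUBCLIP_API="https://www.subclip.app/api/v1"
# The docs say "max 100MB"; 100,000,000 bytes is the conservative reading.
MAX_REFERENCE_BYTES=100000000

# Succeeds when $1 is a positive integer no larger than the limit.
reference_size_ok() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;  # empty or not a plain integer
  esac
  [ "$1" -gt 0 ] && [ "$1" -le "$MAX_REFERENCE_BYTES" ]
}

# Usage: subclip_upload_reference FILE CONTENT_TYPE
# Prints the referenceAudioObjectKey to pass when starting a clone job.
subclip_upload_reference() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  ref_size=$(wc -c < "$1" | tr -d '[:space:]')
  if ! reference_size_ok "$ref_size"; then
    echo "reference audio must be 1 to $MAX_REFERENCE_BYTES bytes, got $ref_size" >&2
    return 1
  fi
  # jq builds the body so unusual file names are escaped correctly.
  ref_body=$(jq -n --arg name "$(basename "$1")" --arg type "$2" \
    --argjson size "$ref_size" \
    '{uploadType: "reference_audio", fileName: $name, contentType: $type, fileSize: $size}')
  ref_json=$(curl -sf -X POST "$SUBCLIP_API/text-to-speech/uploads" \
    -H "Authorization: Bearer $SUBCLIP_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$ref_body") || return 1
  # The signed URL is short-lived (expiresIn: 900 in the example), so upload now.
  ref_upload_url=$(printf '%s' "$ref_json" | jq -r '.uploadUrl')
  curl -sf -X PUT "$ref_upload_url" \
    -H "Content-Type: $2" \
    --upload-file "$1" || return 1
  printf '%s\n' "$ref_json" | jq -r '.referenceAudioObjectKey'
}
```

For example, `KEY=$(subclip_upload_reference ./reference-voice.wav audio/wav)` leaves the object key in KEY for the job request.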

Supported languages

The voiceId already identifies the selected voice language. Use languageCode only when you want to filter voice lookup by language.

Code	Language
en	English (US)
zh	Chinese (Simplified)
ko	Korean
ja	Japanese
ru	Russian
it	Italian
es	Spanish
pt	Portuguese (Brazil)
de	German
fr	French
ar	Arabic (Saudi)
pl	Polish
nl	Dutch
hi	Hindi
he	Hebrew

Start from text

Use targetDurationSeconds when the generated speech needs to fit a target slot. If the generated audio is longer, Subclip speeds it up with FFmpeg.

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "text",
      "text": "Welcome back. Today we are turning a long idea into a clean voiceover."
    },
    "voiceId": "svoice_...",
    "targetDurationSeconds": 30
  }'

Example response:

{
  "generationId": "atts_...",
  "status": "queued",
  "runId": "run_...",
  "estimatedCredits": 3,
  "estimatedDurationSeconds": 30,
  "unconstrainedEstimatedDurationSeconds": 34,
  "targetDurationSeconds": 30,
  "statusUrl": "/api/v1/text-to-speech/jobs/atts_...",
  "downloadUrl": "/api/v1/text-to-speech/jobs/atts_.../download"
}

Other optional fields go in the same request body, for example:

{
  "languageCode": "en",
  "speed": 1.1,
  "targetDurationSeconds": 30,
  "outputFormat": "mp3"
}
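The example response implies how much speed-up to expect: unconstrainedEstimatedDurationSeconds is 34 and targetDurationSeconds is 30, so the audio has to play about 34 / 30 ≈ 1.13 times faster to fit. The sketch below computes that ratio from those two fields. It assumes Subclip applies one uniform tempo change across the whole clip (the exact FFmpeg filter is not documented here), and tts_tempo_factor is a hypothetical helper name.

```shell
# Minimal sketch: estimate the tempo factor needed to fit a target duration.
# Assumes a single uniform speed-up, as described for targetDurationSeconds;
# tts_tempo_factor is a hypothetical helper name.

# Usage: tts_tempo_factor UNCONSTRAINED_SECONDS TARGET_SECONDS
# Prints the factor with three decimals; 1.000 means no speed-up is needed.
tts_tempo_factor() {
  awk -v actual="$1" -v target="$2" 'BEGIN {
    if (target + 0 <= 0) exit 1    # a target must be positive
    factor = actual / target
    if (factor < 1) factor = 1     # audio shorter than the target is left as is
    printf "%.3f\n", factor
  }'
}
```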

Start with voice clone

When cloneVoice is true, pass exactly one of referenceAudioObjectKey or referenceAudioUrl. Subclip creates a temporary cloned voice, generates the audio, then deletes the cloned voice from Inworld and removes its temporary voice record from the Subclip database.

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "text",
      "text": "Welcome back. This narration uses a temporary cloned voice."
    },
    "cloneVoice": true,
    "referenceAudioObjectKey": "user_.../text-to-speech-api/reference-audio/...",
    "referenceText": "Optional transcript for the reference voice sample.",
    "cloneVoiceName": "Launch narrator",
    "languageCode": "en"
  }'

To clone from a public URL instead, send referenceAudioUrl in place of referenceAudioObjectKey:

{
  "input": {
    "type": "text",
    "text": "This job clones from a public reference audio URL."
  },
  "cloneVoice": true,
  "referenceAudioUrl": "https://example.com/reference-voice.wav",
  "languageCode": "en"
}
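Because a clone job takes exactly one reference source, a client that accepts either an uploaded object key or a public URL has to pick the matching field name. The sketch below does that and prints a request body for a text job. The helper names clone_reference_field and clone_job_body are hypothetical, and it assumes the text and identifiers need no JSON escaping (no double quotes, backslashes, or newlines); build the body with jq if that does not hold.

```shell
# Minimal sketch: choose the reference field for a voice clone job.
# Helper names are hypothetical. Values are inserted without JSON escaping,
# so they must not contain double quotes, backslashes, or newlines.

# Public URLs go in referenceAudioUrl; anything else is treated as an
# object key returned by POST /api/v1/text-to-speech/uploads.
clone_reference_field() {
  case $1 in
    http://*|https://*) echo referenceAudioUrl ;;
    *) echo referenceAudioObjectKey ;;
  esac
}

# Usage: clone_job_body TEXT REFERENCE LANGUAGE_CODE
clone_job_body() {
  printf '{"input":{"type":"text","text":"%s"},"cloneVoice":true,"%s":"%s","languageCode":"%s"}\n' \
    "$1" "$(clone_reference_field "$2")" "$2" "$3"
}
```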

Start from SRT

Pass SRT content inline. Subclip extracts the cue text in timestamp order.

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "srt",
      "content": "1\n00:00:00,000 --> 00:00:02,400\nStart with the strongest hook.\n\n2\n00:00:02,400 --> 00:00:05,000\nThen explain the payoff clearly."
    },
    "voiceId": "svoice_..."
  }'
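In practice the SRT usually lives in a file, and its line breaks and any double quotes must be escaped before it fits in the JSON content field. The sketch below does that with POSIX tools only. It is a helper under stated assumptions, not part of the API: srt_json_string and subtitles.srt are hypothetical names, carriage returns are dropped and tabs become spaces, and other control characters are not handled. With jq installed, `jq -Rs .` is a simpler alternative.

```shell
# Minimal sketch: turn SRT text on stdin into a JSON string for input.content.
# Drops carriage returns (CRLF files), turns tabs into spaces, escapes
# backslashes and double quotes, and joins lines with \n.
# Other control characters are not escaped.
srt_json_string() {
  tr -d '\r' | tr '\t' ' ' |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'BEGIN { printf "\"" }
         { printf "%s%s", sep, $0; sep = "\\n" }
         END { printf "\"" }'
}

# Usage (hypothetical file name; VOICE_ID comes from the voices endpoint):
#   body=$(printf '{"input":{"type":"srt","content":%s},"voiceId":"%s"}' \
#     "$(srt_json_string < subtitles.srt)" "$VOICE_ID")
#   curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
#     -H "Authorization: Bearer $SUBCLIP_API_KEY" \
#     -H "Content-Type: application/json" -d "$body"
```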

Start from transcript segments

curl -X POST https://www.subclip.app/api/v1/text-to-speech/jobs \
  -H "Authorization: Bearer $SUBCLIP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "type": "transcript",
      "segments": [
        { "start": 0, "end": 2.4, "text": "Start with the strongest hook." },
        { "start": 2.4, "end": 5, "text": "Then explain the payoff clearly." }
      ]
    },
    "voiceId": "svoice_..."
  }'
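Segments can likewise be generated from a tab-separated file with one start, end, and text triple per line. The sketch below is a hypothetical helper (segments_json), not an API feature. It assumes start and end are plain numbers, as in the example above, and that the text column contains no tabs; CRLF line endings, backslashes, and double quotes are handled, other control characters are not.

```shell
# Minimal sketch: convert "start<TAB>end<TAB>text" lines into input.segments.
# Start and end are copied through as JSON numbers without validation;
# backslashes and double quotes in the text are escaped. Lines with fewer
# than three fields are skipped.
segments_json() {
  tr -d '\r' |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk -F '\t' 'BEGIN { printf "[" }
      NF >= 3 {
        printf "%s{\"start\": %s, \"end\": %s, \"text\": \"%s\"}", sep, $1, $2, $3
        sep = ", "
      }
      END { printf "]\n" }'
}

# Usage (hypothetical file name):
#   body=$(printf '{"input":{"type":"transcript","segments":%s},"voiceId":"%s"}' \
#     "$(segments_json < segments.tsv)" "$VOICE_ID")
```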

Options

Field	Required	Notes
input.type	Yes	text, srt, or transcript
input.text	Yes for text	Plain text up to 10,000 characters
input.content	Yes for srt	SRT content; Subclip reads cue text in timeline order
input.segments	Yes for transcript	Array of segments, each with start, end, and text
voiceId	Conditional	Required when cloneVoice is false; use a voice returned by GET /api/v1/media-render/voices
cloneVoice	No	Set true to create a temporary cloned voice from reference audio
referenceAudioObjectKey	Conditional	Required for cloneVoice unless referenceAudioUrl is provided
referenceAudioUrl	Conditional	Public downloadable audio URL for cloneVoice when not using Subclip upload
referenceText	No	Optional transcript for the reference voice sample
cloneVoiceName	No	Optional display name for the temporary cloned voice
removeBackgroundNoiseForClone	No	Defaults to false
languageCode	No	Optional voice language filter when resolving voiceId
speed	No	0.25 to 2, defaults to 1
targetDurationSeconds	No	Speeds up the generated audio with FFmpeg when the output is longer than this target
outputFormat	No	mp3, the only supported value and the default

Poll status

curl https://www.subclip.app/api/v1/text-to-speech/jobs/atts_... \
  -H "Authorization: Bearer $SUBCLIP_API_KEY"
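Jobs run asynchronously, so a client typically polls until the status is completed or failed. This is a minimal sketch assuming curl and jq, an exported SUBCLIP_API_KEY, and hypothetical function names; the 5-second interval and 120-attempt cap are arbitrary choices rather than documented limits, picked to stay gentle on an endpoint that can answer 429.

```shell
# Minimal sketch: poll a text to speech job until it finishes.
# Assumes curl and jq and an exported SUBCLIP_API_KEY. Function names,
# the 5-second interval, and the attempt cap are illustrative choices.

SUBCLIP_API="https://www.subclip.app/api/v1"

# completed and failed are the end states; queued and processing are not.
tts_status_is_terminal() {
  case $1 in
    completed|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: subclip_wait_for_tts GENERATION_ID [MAX_ATTEMPTS]
# Prints the final status JSON. Returns 0 only if the job completed.
subclip_wait_for_tts() {
  poll_attempts=0
  poll_max=${2:-120}
  while [ "$poll_attempts" -lt "$poll_max" ]; do
    poll_json=$(curl -sf "$SUBCLIP_API/text-to-speech/jobs/$1" \
      -H "Authorization: Bearer $SUBCLIP_API_KEY") || return 1
    poll_status=$(printf '%s' "$poll_json" | jq -r '.status')
    if tts_status_is_terminal "$poll_status"; then
      printf '%s\n' "$poll_json"
      # Succeed on completed; on failed, errorMessage in the JSON explains why.
      [ "$poll_status" = completed ]
      return
    fi
    poll_attempts=$((poll_attempts + 1))
    sleep 5
  done
  echo "gave up waiting for $1 after $poll_max attempts" >&2
  return 1
}
```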

Download

DOWNLOAD_JSON=$(curl -s https://www.subclip.app/api/v1/text-to-speech/jobs/atts_.../download \
  -H "Authorization: Bearer $SUBCLIP_API_KEY")

DOWNLOAD_URL=$(echo "$DOWNLOAD_JSON" | jq -r '.downloadUrl')
curl -L "$DOWNLOAD_URL" -o speech.mp3
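The download step can also be wrapped so that an error page is never silently saved as speech.mp3. The sketch below fetches the signed URL and then checks the first bytes of the file for an ID3v2 tag or an MPEG audio frame sync. That check is a heuristic added here, not something the API specifies; the function names are hypothetical, and curl, jq, and an exported SUBCLIP_API_KEY are assumed.

```shell
# Minimal sketch: download a finished job and sanity-check the MP3 header.
# Assumes curl and jq and an exported SUBCLIP_API_KEY; names are hypothetical.

SUBCLIP_API="https://www.subclip.app/api/v1"

# Succeeds when the file starts with an ID3v2 tag ("ID3") or an MPEG
# audio frame sync (0xFF followed by a byte from 0xE0 to 0xFF).
looks_like_mp3() {
  mp3_header=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
  case $mp3_header in
    494433*) return 0 ;;
    ff[ef]?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: subclip_download_tts GENERATION_ID OUTPUT_FILE
subclip_download_tts() {
  dl_json=$(curl -sf "$SUBCLIP_API/text-to-speech/jobs/$1/download" \
    -H "Authorization: Bearer $SUBCLIP_API_KEY") || return 1
  # The signed URL expires (expiresIn: 3600 in the example); fetch it promptly.
  dl_url=$(printf '%s' "$dl_json" | jq -r '.downloadUrl')
  curl -sfL "$dl_url" -o "$2" || return 1
  if ! looks_like_mp3 "$2"; then
    echo "$2 does not look like MP3 audio" >&2
    return 1
  fi
}
```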
