Speaking
{
"id": "<string>",
"customer_id": "<string>",
"reference_type": "<string>",
"reference_id": "<string>",
"language": "<string>",
"provider": "<string>",
"voice_id": "<string>",
"direction": "<string>",
"status": "<string>",
"tm_create": "<string>",
"tm_update": "<string>",
"tm_delete": "<string>"
}
- id (UUID): The speaking session’s unique identifier. Returned when creating a TTS session via POST /speakings or listing via GET /speakings.
- customer_id (UUID): The customer who owns this speaking session. Obtained from GET /customer.
- reference_type (enum string): The type of resource receiving TTS audio. See Reference Type.
- reference_id (UUID): The ID of the resource receiving TTS audio. Depending on reference_type, obtained from GET /calls or GET /conferences.
- language (string, BCP 47): The language and locale for TTS synthesis (e.g., en-US, ko-KR). Must match the provider’s supported languages.
- provider (enum string, optional): The TTS provider used for synthesis. See Provider. If omitted, defaults to elevenlabs.
- voice_id (string, optional): A provider-specific voice identifier. If omitted, the provider’s default voice for the specified language is used. Obtain available voices from the provider’s documentation.
- direction (enum string): The audio routing direction. See Direction.
- status (enum string): The speaking session’s current status. See Status.
- tm_create (string, ISO 8601): Timestamp when the speaking session was created.
- tm_update (string, ISO 8601): Timestamp of the last update to any speaking property.
- tm_delete (string, ISO 8601): Timestamp when the speaking session was deleted. Set to 9999-01-01 00:00:00.000000 if not deleted.
Note
AI Implementation Hint
Timestamps set to 9999-01-01 00:00:00.000000 indicate the event has not yet occurred. For example, tm_delete with this value means the speaking session has not been deleted.
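The sentinel rule above can be handled with a small helper. This is a minimal sketch, not part of any VoIPBIN SDK: the names parse_timestamp and is_deleted are hypothetical, and the format string is inferred from the example timestamps in this document (space-separated date and time with microseconds). The timestamps carry no timezone marker, so the parsed values are naive datetimes.

```python
from datetime import datetime
from typing import Optional

# Sentinel meaning "this event has not happened yet", as described above.
NOT_YET = "9999-01-01 00:00:00.000000"

# Assumed format, inferred from the example values in this document.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_timestamp(value: str) -> Optional[datetime]:
    """Return a naive datetime, or None when the value is the sentinel."""
    if value == NOT_YET:
        return None
    return datetime.strptime(value, TS_FORMAT)


def is_deleted(speaking: dict) -> bool:
    """A speaking session counts as deleted once tm_delete holds a real timestamp."""
    return parse_timestamp(speaking["tm_delete"]) is not None
```

Mapping the sentinel to None keeps callers from comparing against a year-9999 date by accident.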
Example
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
"reference_type": "call",
"reference_id": "12f8f1c9-a6c3-4f81-93db-ae445dcf188f",
"language": "en-US",
"provider": "elevenlabs",
"voice_id": "",
"direction": "both",
"status": "active",
"tm_create": "2025-06-15 14:30:00.123456",
"tm_update": "2025-06-15 14:30:02.456789",
"tm_delete": "9999-01-01 00:00:00.000000"
}
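A response like the example above could be loaded into a typed record on the client side. The following is a sketch under assumptions: the Speaking dataclass and parse_speaking function are hypothetical names, and the allowed enum values are copied from the reference_type, provider, status, and direction tables in this document. Unknown extra keys are ignored so that fields added later do not break parsing.

```python
import json
from dataclasses import dataclass, fields

# Allowed values, copied from the enum tables in this document.
ALLOWED = {
    "reference_type": {"call", "confbridge"},
    "provider": {"elevenlabs", "gcp", "aws"},
    "direction": {"in", "out", "both"},
    "status": {"initiating", "active", "stopped"},
}


@dataclass(frozen=True)
class Speaking:
    id: str
    customer_id: str
    reference_type: str
    reference_id: str
    language: str
    provider: str
    voice_id: str
    direction: str
    status: str
    tm_create: str
    tm_update: str
    tm_delete: str

    def __post_init__(self) -> None:
        # Reject enum fields whose value is not documented.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}: unexpected value {value!r}")


def parse_speaking(raw: str) -> Speaking:
    """Build a Speaking from a JSON string, ignoring unknown keys."""
    data = json.loads(raw)
    known = {f.name for f in fields(Speaking)}
    return Speaking(**{k: v for k, v in data.items() if k in known})
```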
reference_type
All possible values for the reference_type field:
| Type | Description |
|---|---|
| call | Attach TTS to a live call. The |
| confbridge | Attach TTS to a live conference. The |
provider
All possible values for the provider field:
| Provider | Description |
|---|---|
| elevenlabs | ElevenLabs TTS. High-quality neural voices. Default provider if omitted. |
| gcp | Google Cloud Text-to-Speech. Wide language support with WaveNet and Neural2 voices. |
| aws | Amazon Polly. Neural and standard voices with SSML support. |
When creating a speaking session, the provider field is optional. If omitted, VoIPBIN defaults to elevenlabs.
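The default-provider rule can be illustrated with a request-body builder. This is a sketch only: it assumes the POST /speakings body uses the same field names as the Speaking resource, which this section does not confirm, and build_create_body and effective_provider are hypothetical helper names.

```python
from typing import Dict, Optional

# Provider values from the table above; elevenlabs is the documented default.
PROVIDERS = ("elevenlabs", "gcp", "aws")
DEFAULT_PROVIDER = "elevenlabs"


def build_create_body(
    reference_type: str,
    reference_id: str,
    language: str,
    direction: str,
    provider: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble a create-request body (field names are an assumption)."""
    if provider is not None and provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    body = {
        "reference_type": reference_type,
        "reference_id": reference_id,
        "language": language,
        "direction": direction,
    }
    # Optional fields are left out entirely so the server-side defaults apply.
    if provider is not None:
        body["provider"] = provider
    if voice_id:
        body["voice_id"] = voice_id
    return body


def effective_provider(body: Dict[str, str]) -> str:
    """The provider the session is expected to use, given the documented default."""
    return body.get("provider", DEFAULT_PROVIDER)
```

Omitting the key, rather than sending an empty string, avoids guessing how the API treats blank values.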
status
All possible values for the status field:
| Status | Description |
|---|---|
| initiating | TTS session is being set up. Provider connection is being established. Do not call |
| active | TTS session is ready. Send text via |
| stopped | TTS session has ended. Stopped via |
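Because a session starts in initiating and only accepts text once it is active, a client will typically wait for the transition before sending anything. The sketch below shows one way to do that. It is an assumption-laden illustration: wait_until_active is a hypothetical helper, and fetch_speaking is a caller-supplied function that returns the current speaking object as a dict (how it is fetched is left open, since this section does not name an endpoint for reading a single session).

```python
import time
from typing import Callable, Dict


def wait_until_active(
    fetch_speaking: Callable[[], Dict[str, str]],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until the session is active; fail on stopped, unknown status, or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        speaking = fetch_speaking()
        status = speaking["status"]
        if status == "active":
            return speaking
        if status == "stopped":
            raise RuntimeError("speaking session stopped before becoming active")
        if status != "initiating":
            raise ValueError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("speaking session did not become active in time")
        sleep(interval)
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.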
direction
All possible values for the direction field:
| Direction | Description |
|---|---|
| in | Audio injected toward the caller (remote party hears it, local party does not). |
| out | Audio injected toward the callee/local side (local party hears it, remote party does not). |
| both | Audio injected to both sides of the call. Both parties hear the synthesized speech. |
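The direction values above can be captured as a lookup from direction to the parties that hear the audio. This is an illustrative sketch: the labels "remote" and "local" follow the wording of the table, and heard_by and direction_for are hypothetical helper names.

```python
# Who hears the synthesized audio for each direction value, per the table above.
HEARD_BY = {
    "in": frozenset({"remote"}),
    "out": frozenset({"local"}),
    "both": frozenset({"remote", "local"}),
}


def heard_by(direction: str) -> frozenset:
    """Return the set of parties that hear the audio for a direction value."""
    try:
        return HEARD_BY[direction]
    except KeyError:
        raise ValueError(f"unknown direction: {direction!r}") from None


def direction_for(remote: bool, local: bool) -> str:
    """Pick the direction value that reaches exactly the requested parties."""
    wanted = frozenset(
        name for name, flag in (("remote", remote), ("local", local)) if flag
    )
    for direction, parties in HEARD_BY.items():
        if parties == wanted:
            return direction
    raise ValueError("at least one party must hear the audio")
```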