Transcribe
{
    "id": "<string>",
    "customer_id": "<string>",
    "reference_type": "<string>",
    "reference_id": "<string>",
    "status": "<string>",
    "language": "<string>",
    "provider": "<string>",
    "tm_create": "<string>",
    "tm_update": "<string>",
    "tm_delete": "<string>"
}
- id (UUID): The transcribe session’s unique identifier. Returned when creating a transcription via POST /transcribes or listing via GET /transcribes.
- customer_id (UUID): The customer who owns this transcription. Obtained from GET /customers.
- reference_type (enum string): The type of resource being transcribed. See Reference Type.
- reference_id (UUID): The ID of the resource being transcribed. Depending on reference_type, obtained from GET /calls, GET /recordings, or GET /conferences.
- status (enum string): The transcription session’s current status. See Status.
- language (String, BCP47): The language code for transcription (e.g., en-US, ko-KR, ja-JP). See Supported Languages.
- provider (enum string, optional): The STT provider used for this transcription. See Provider.
- tm_create (string, ISO 8601): Timestamp when the transcribe session was created.
- tm_update (string, ISO 8601): Timestamp of the last update to any transcribe property.
- tm_delete (string, ISO 8601): Timestamp when the transcribe session was deleted. Set to 9999-01-01 00:00:00.000000 if not deleted.
Note
AI Implementation Hint
Timestamps set to 9999-01-01 00:00:00.000000 indicate the event has not yet occurred. For example, tm_delete with this value means the transcription has not been deleted.
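The sentinel check described in this hint can be sketched in Python as follows. This is only an illustration: the helper names parse_timestamp and is_deleted are hypothetical, and the timestamp format string is an assumption inferred from the example values on this page.

```python
from datetime import datetime
from typing import Optional

# Sentinel documented above: the event has not occurred yet.
SENTINEL = "9999-01-01 00:00:00.000000"
# Assumed layout, inferred from the example values on this page.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_timestamp(value: str) -> Optional[datetime]:
    """Return a datetime, or None when the value is the sentinel."""
    if value == SENTINEL:
        return None
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def is_deleted(transcribe: dict) -> bool:
    """True when tm_delete holds a real timestamp rather than the sentinel."""
    return parse_timestamp(transcribe["tm_delete"]) is not None
```

Mapping the sentinel to None keeps downstream code from treating the year 9999 as a real date.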
Example
{
    "id": "bbf08426-3979-41bc-a544-5fc92c237848",
    "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
    "reference_type": "call",
    "reference_id": "12f8f1c9-a6c3-4f81-93db-ae445dcf188f",
    "status": "done",
    "language": "en-US",
    "provider": "gcp",
    "tm_create": "2024-04-01 07:17:04.091019",
    "tm_update": "2024-04-01 13:25:32.428602",
    "tm_delete": "9999-01-01 00:00:00.000000"
}
reference_type
All possible values for the reference_type field:
| Type | Description |
|---|---|
| call | Transcribing a live call in real time. The |
| recording | Transcribing a previously recorded audio file. The |
| confbridge | Transcribing a live conference. The |
provider
All possible values for the provider field:
| Provider | Description |
|---|---|
| gcp | Google Cloud Speech-to-Text |
| aws | Amazon Transcribe |
When creating a transcription, the provider field is optional. If omitted, VoIPBIN selects the best available provider automatically (default order: GCP, then AWS). If a specific provider is requested but unavailable, the system falls back to the default order.
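The selection rule in the paragraph above can be modeled with a small Python sketch. It is not VoIPBIN's implementation: resolve_provider and its available argument are hypothetical names used only to illustrate the documented fallback order.

```python
from typing import Optional, Sequence

# Default order stated in the text: GCP first, then AWS.
DEFAULT_ORDER = ("gcp", "aws")


def resolve_provider(requested: Optional[str],
                     available: Sequence[str]) -> Optional[str]:
    """Pick a provider following the documented rule.

    `available` is a hypothetical list of providers that are currently usable.
    """
    # Honor an explicit request when that provider is usable.
    if requested is not None and requested in available:
        return requested
    # Otherwise fall back to the default order.
    for name in DEFAULT_ORDER:
        if name in available:
            return name
    # Nothing usable; what the real service does here is not documented.
    return None
```

For example, requesting aws while only gcp is usable yields gcp, which matches the fallback behavior described above.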
status
All possible values for the status field:
| Status | Description |
|---|---|
| progressing | Transcription is actively in progress. New transcript segments are being generated and delivered via webhook or WebSocket. |
| done | Transcription is complete. No more transcript segments will be generated. All transcripts are available via |
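A client may want to check a received Transcribe object against the enum values listed in the reference_type, provider, and status sections. The following Python sketch shows one way to do that; validate_transcribe is a hypothetical helper, and the sets simply copy the values documented on this page.

```python
# Enum values copied from the reference_type, provider, and status sections.
REFERENCE_TYPES = {"call", "recording", "confbridge"}
PROVIDERS = {"gcp", "aws"}
STATUSES = {"progressing", "done"}


def validate_transcribe(obj: dict) -> list:
    """Return a list of problems; the list is empty when all enums are known."""
    problems = []
    if obj.get("reference_type") not in REFERENCE_TYPES:
        problems.append(f"unknown reference_type: {obj.get('reference_type')!r}")
    if obj.get("status") not in STATUSES:
        problems.append(f"unknown status: {obj.get('status')!r}")
    # provider is optional, so only check it when present.
    provider = obj.get("provider")
    if provider is not None and provider not in PROVIDERS:
        problems.append(f"unknown provider: {provider!r}")
    return problems
```

Returning a list rather than raising lets the caller log every mismatch at once, which is useful if new enum values are added later.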
Transcription
{
    "id": "<string>",
    "transcribe_id": "<string>",
    "direction": "<string>",
    "message": "<string>",
    "tm_transcript": "<string>",
    "tm_create": "<string>"
}
- id (UUID): The individual transcript segment’s unique identifier.
- transcribe_id (UUID): The parent transcribe session’s ID. Obtained from GET /transcribes or the response of POST /transcribes.
- direction (enum string): Whether the speech was incoming or outgoing. See Direction.
- message (String): The transcribed text content of this speech segment.
- tm_transcript (String): Time offset within the call when this speech occurred. Uses 0001-01-01 00:00:00 as epoch; the time portion represents the offset from the start of the transcription session (e.g., 0001-01-01 00:01:04.441160 means 1 minute and 4 seconds into the call). Sort by this field to reconstruct conversation order.
- tm_create (string, ISO 8601): Absolute timestamp when this transcript segment was created.
Note
AI Implementation Hint
The tm_transcript field is a time offset, not an absolute timestamp. Its date part (0001-01-01) is a sentinel value meaning “relative to the start of the transcription session.” To reconstruct a conversation in order, sort all transcript segments by tm_transcript, not by tm_create (which reflects delivery time, not speech time).
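As a minimal Python sketch of this hint, the helpers below convert tm_transcript into an offset and sort segments by it. The names transcript_offset and in_conversation_order are hypothetical, and the sketch assumes the offset fits in the time portion, as the field description states.

```python
from datetime import timedelta


def transcript_offset(tm_transcript: str) -> timedelta:
    """Convert a value like '0001-01-01 00:05:04.441160' into an offset."""
    # The date part is the 0001-01-01 sentinel, so only the time part is used.
    _date_part, time_part = tm_transcript.split(" ")
    hours, minutes, seconds = time_part.split(":")
    whole, _, fraction = seconds.partition(".")
    # Pad or trim the fraction to microseconds; it may be absent.
    micros = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(whole), microseconds=micros)


def in_conversation_order(segments: list) -> list:
    """Sort transcript segments by speech time, not by tm_create."""
    return sorted(segments, key=lambda s: transcript_offset(s["tm_transcript"]))
```

Parsing into a timedelta rather than a datetime makes it explicit in client code that the value is relative to the session start.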
Example
{
    "id": "06af78f0-b063-48c0-b22d-d31a5af0aa88",
    "transcribe_id": "bbf08426-3979-41bc-a544-5fc92c237848",
    "direction": "in",
    "message": "Hi, good to see you. How are you today.",
    "tm_transcript": "0001-01-01 00:05:04.441160",
    "tm_create": "2024-04-01 07:22:07.229309"
}
direction
All possible values for the direction field:
| Direction | Description |
|---|---|
| in | Incoming speech toward VoIPBIN (i.e., what the caller/remote party said). |
| out | Outgoing speech from VoIPBIN (i.e., TTS audio, recorded prompts, or the connected party’s speech sent from VoIPBIN). |
Speech Webhook Message
The speech webhook message is the payload delivered for transcribe_speech_started, transcribe_speech_interim, and transcribe_speech_ended events. These events are generated during a real-time streaming transcription session when voice activity is detected.
{
    "id": "<string>",
    "customer_id": "<string>",
    "streaming_id": "<string>",
    "transcribe_id": "<string>",
    "direction": "<string>",
    "message": "<string>",
    "tm_event": "<string>",
    "tm_create": "<string>"
}
- id (UUID): The unique identifier of the speech event.
- customer_id (UUID): The customer who owns this transcription session. Obtained from GET /customers.
- streaming_id (UUID): The unique identifier of the audio streaming session that produced this event.
- transcribe_id (UUID): The parent transcribe session’s ID. Obtained from GET /transcribes or the response of POST /transcribes.
- direction (enum string): Whether the speech was incoming or outgoing. See Direction.
- message (String): The interim transcribed text. Present for transcribe_speech_interim events. Empty for transcribe_speech_started and transcribe_speech_ended events.
- tm_event (string, ISO 8601): Timestamp when the speech event occurred.
- tm_create (string, ISO 8601): Timestamp when the speech event record was created.
Example
transcribe_speech_started:
{
    "type": "transcribe_speech_started",
    "data": {
        "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
        "streaming_id": "c0d1e2f3-a4b5-6c7d-8e9f-0a1b2c3d4e5f",
        "transcribe_id": "bbf08426-3979-41bc-a544-5fc92c237848",
        "direction": "in",
        "tm_event": "2024-04-01 07:22:07.229309",
        "tm_create": "2024-04-01 07:22:07.229309"
    }
}
transcribe_speech_interim:
{
    "type": "transcribe_speech_interim",
    "data": {
        "id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
        "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
        "streaming_id": "c0d1e2f3-a4b5-6c7d-8e9f-0a1b2c3d4e5f",
        "transcribe_id": "bbf08426-3979-41bc-a544-5fc92c237848",
        "direction": "in",
        "message": "Hello, I need help with my account",
        "tm_event": "2024-04-01 07:22:08.115000",
        "tm_create": "2024-04-01 07:22:08.115000"
    }
}
transcribe_speech_ended:
{
    "type": "transcribe_speech_ended",
    "data": {
        "id": "c3d4e5f6-a7b8-9012-cdef-345678901234",
        "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
        "streaming_id": "c0d1e2f3-a4b5-6c7d-8e9f-0a1b2c3d4e5f",
        "transcribe_id": "bbf08426-3979-41bc-a544-5fc92c237848",
        "direction": "in",
        "tm_event": "2024-04-01 07:22:12.500000",
        "tm_create": "2024-04-01 07:22:12.500000"
    }
}
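A webhook receiver could branch on the type field of these payloads roughly as in the Python sketch below. The function name describe_speech_event and the summary strings are hypothetical; only the event names and the type/data envelope come from the examples above.

```python
from typing import Optional

# Event names taken from the Speech Webhook Message section.
SPEECH_EVENTS = {
    "transcribe_speech_started",
    "transcribe_speech_interim",
    "transcribe_speech_ended",
}


def describe_speech_event(payload: dict) -> Optional[str]:
    """Summarize a speech webhook payload; return None for other event types."""
    event_type = payload.get("type")
    if event_type not in SPEECH_EVENTS:
        return None
    data = payload.get("data", {})
    direction = data.get("direction", "?")
    if event_type == "transcribe_speech_interim":
        # Only interim events carry text; started/ended have no message.
        return f"[{direction}] interim: {data.get('message', '')}"
    phase = event_type.rsplit("_", 1)[-1]  # "started" or "ended"
    return f"[{direction}] speech {phase}"
```

Reading message with a default of an empty string keeps the handler safe for started and ended events, whose examples omit the field.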