Tutorial

Prerequisites

Before using AI features, you need:

  • A valid authentication token (String). Obtain via POST /auth/login or use an accesskey from GET /accesskeys.

  • A source phone number in E.164 format (e.g., +15551234567). Obtain one owned by your account via GET /numbers.

  • A destination phone number in E.164 format or an internal extension.

  • An LLM provider API key (String). Obtain from your provider’s dashboard (e.g., OpenAI, Anthropic).

  • (Optional) A pre-created AI configuration (UUID). Create one via POST /ais or use inline action settings.

  • (Optional) A flow ID (UUID). Create one via POST /flows or obtain from GET /flows.
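
Both phone numbers above must be in E.164 format, so a quick pre-flight check can catch malformed input before it reaches the API. The following is a minimal sketch; the helper name is_e164 is hypothetical and not part of VoIPBIN, and the pattern assumes the usual E.164 shape: a leading +, a non-zero first digit, and at most 15 digits in total.

```python
import re

# E.164 shape: "+" followed by 1 to 15 digits, the first of which is not 0.
_E164_RE = re.compile(r"\+[1-9][0-9]{0,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` looks like an E.164 phone number."""
    return _E164_RE.fullmatch(number) is not None
```

This only checks the shape of the string. It does not confirm that the number exists or that your account owns it; GET /numbers remains the source of truth for that.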

Note

AI Implementation Hint

AI features use three external services: an LLM (e.g., OpenAI), a TTS provider (e.g., ElevenLabs), and an STT provider (e.g., Deepgram). Each incurs charges both against your VoIPBIN credits and on the external provider’s bill. Before creating AI calls, check your VoIPBIN balance via GET /billing-accounts and confirm that your provider API key is valid.

Simple AI Voice Assistant

Create a basic AI voice assistant that answers questions during a call. The AI will listen to the user’s speech, process it, and respond using text-to-speech.

$ curl --location --request POST 'https://api.voipbin.net/v1.0/calls?token=<YOUR_AUTH_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "source": {
            "type": "tel",
            "target": "+15551234567"
        },
        "destinations": [
            {
                "type": "tel",
                "target": "+15559876543"
            }
        ],
        "actions": [
            {
                "type": "answer"
            },
            {
                "type": "ai",
                "option": {
                    "initial_prompt": "You are a helpful customer service assistant. Answer questions politely and concisely.",
                    "voice_type": "female"
                }
            }
        ]
    }'

Response:

{
    "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",   // Save this as call_id
    "status": "dialing",
    "source": {"type": "tel", "target": "+15551234567"},
    "destination": {"type": "tel", "target": "+15559876543"},
    "direction": "outgoing",
    "tm_create": "2026-02-18T10:30:00Z"
}

This creates a call with an AI assistant that will:

  1. Answer the call

  2. Listen to the user’s speech using STT (Speech-to-Text)

  3. Process the input through the AI engine with the given prompt

  4. Respond using TTS (Text-to-Speech)
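
The same request can be sent from a script. The sketch below uses only the Python standard library; the endpoint, the token query parameter, and the body fields are copied from the curl example above, while the helper names build_ai_call and create_call are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.voipbin.net/v1.0"  # base URL taken from the curl example


def build_ai_call(source: str, destination: str, prompt: str,
                  voice_type: str = "female") -> dict:
    """Build the request body: answer the call, then start the AI action."""
    return {
        "source": {"type": "tel", "target": source},
        "destinations": [{"type": "tel", "target": destination}],
        "actions": [
            {"type": "answer"},
            {
                "type": "ai",
                "option": {"initial_prompt": prompt, "voice_type": voice_type},
            },
        ],
    }


def create_call(token: str, body: dict) -> dict:
    """POST the body to /calls and return the decoded JSON response."""
    query = urllib.parse.urlencode({"token": token})
    request = urllib.request.Request(
        f"{API_BASE}/calls?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would then look like create_call("<YOUR_AUTH_TOKEN>", build_ai_call("+15551234567", "+15559876543", "You are a helpful customer service assistant.")), and the id field of the returned object is the call_id shown in the response above.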

AI Talk with Real-Time Conversation

Use AI Talk for more natural, low-latency conversations powered by ElevenLabs. It enables interruption detection: the AI stops speaking as soon as the user starts talking.

$ curl --location --request POST 'https://api.voipbin.net/v1.0/calls?token=<YOUR_AUTH_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "source": {
            "type": "tel",
            "target": "+15551234567"
        },
        "destinations": [
            {
                "type": "tel",
                "target": "+15559876543"
            }
        ],
        "actions": [
            {
                "type": "answer"
            },
            {
                "type": "ai_talk",
                "option": {
                    "initial_prompt": "You are an expert sales representative for VoIPBIN. Help customers understand our calling and messaging platform. Be enthusiastic but professional.",
                    "voice_type": "male"
                }
            }
        ]
    }'

Response:

{
    "id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",   // Save this as call_id
    "status": "dialing",
    "source": {"type": "tel", "target": "+15551234567"},
    "destination": {"type": "tel", "target": "+15559876543"},
    "direction": "outgoing",
    "tm_create": "2026-02-18T10:31:00Z"
}

AI Talk provides:

  • Interruption Detection: Stops speaking when user talks

  • Low Latency: Streams responses in chunks for faster perceived response time

  • Natural Voice: Uses ElevenLabs for high-quality voice output

  • Context Retention: Remembers previous conversation exchanges

Note

AI Implementation Hint

The ai_talk action type (not ai) enables real-time voice interaction with interruption detection. Use ai_talk for live conversational AI. The older ai action type uses a simpler request-response pattern without interruption support and is recommended only for basic Q&A use cases.
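
If you generate action lists in code, the choice between the two action types can be made in one place. This is a minimal sketch; ai_action is a hypothetical helper, and the option fields are limited to the two shown in the examples above (initial_prompt and voice_type).

```python
from typing import Optional


def ai_action(prompt: str, realtime: bool = True,
              voice_type: Optional[str] = None) -> dict:
    """Return an ai_talk action for live conversation, or an ai action for basic Q&A."""
    option = {"initial_prompt": prompt}
    if voice_type is not None:
        option["voice_type"] = voice_type
    return {"type": "ai_talk" if realtime else "ai", "option": option}
```

Defaulting realtime to True mirrors the recommendation above to prefer ai_talk unless the use case is simple request-response.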

AI with Custom Voice ID

Customize the AI voice by specifying an ElevenLabs Voice ID using variables.

$ curl --location --request POST 'https://api.voipbin.net/v1.0/calls?token=<YOUR_AUTH_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "source": {
            "type": "tel",
            "target": "+15551234567"
        },
        "destinations": [
            {
                "type": "tel",
                "target": "+15559876543"
            }
        ],
        "actions": [
            {
                "type": "answer"
            },
            {
                "type": "variable_set",
                "option": {
                    "key": "voipbin.tts.elevenlabs.voice_id",
                    "value": "21m00Tcm4TlvDq8ikWAM"
                }
            },
            {
                "type": "ai_talk",
                "option": {
                    "initial_prompt": "You are a friendly receptionist. Greet callers warmly and help them with their inquiries."
                }
            }
        ]
    }'

See Built-in ElevenLabs Voice IDs for available voice options.
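
When the action list is built programmatically, the voice setting can be added as a separate step. The sketch below assumes, as the ordering in the example suggests, that the variable must be set before the ai_talk action runs; with_voice_id is a hypothetical helper name.

```python
VOICE_ID_VARIABLE = "voipbin.tts.elevenlabs.voice_id"  # variable key from the example above


def with_voice_id(actions: list, voice_id: str) -> list:
    """Return a copy of `actions` with a variable_set placed before the first ai_talk."""
    set_voice = {
        "type": "variable_set",
        "option": {"key": VOICE_ID_VARIABLE, "value": voice_id},
    }
    for index, action in enumerate(actions):
        if action.get("type") == "ai_talk":
            # Slicing builds a new list, so the caller's list is left untouched.
            return actions[:index] + [set_voice] + actions[index:]
    raise ValueError("no ai_talk action found in the action list")
```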

AI Summary for Call Transcription

Generate an AI-powered summary of a call transcription. This is useful for post-call analysis and record-keeping.

$ curl --location --request POST 'https://api.voipbin.net/v1.0/calls?token=<YOUR_AUTH_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "source": {
            "type": "tel",
            "target": "+15551234567"
        },
        "destinations": [
            {
                "type": "tel",
                "target": "+15559876543"
            }
        ],
        "actions": [
            {
                "type": "answer"
            },
            {
                "type": "transcribe_start",
                "option": {
                    "language": "en-US"
                }
            },
            {
                "type": "talk",
                "option": {
                    "text": "Hello! This call is being transcribed and summarized. Please tell me about your experience with our service.",
                    "language": "en-US"
                }
            },
            {
                "type": "sleep",
                "option": {
                    "duration": 30000
                }
            },
            {
                "type": "ai_summary",
                "option": {
                    "source_type": "transcribe",
                    "source_id": "${voipbin.transcribe.id}"
                }
            },
            {
                "type": "talk",
                "option": {
                    "text": "Thank you for your feedback. We have recorded and summarized your call.",
                    "language": "en-US"
                }
            }
        ]
    }'

The AI summary will:

  • Process the transcription from transcribe_start

  • Generate a structured summary of key points

  • Store the summary in ${voipbin.ai_summary.content}

  • Be accessible via webhook or API after the call
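
In the curl example, the single quotes around the body keep ${voipbin.transcribe.id} from being expanded by the shell, so the reference reaches the API unchanged. The same care applies when the body is built in code. The sketch below keeps the reference as a plain string constant; summary_actions is a hypothetical helper, and the wait argument is passed through in whatever unit the sleep action expects (the example above uses 30000).

```python
import json

# Sent to the API as-is; Python must not try to fill this in.
# Avoid f-strings or string.Template here, since both treat braces or "$" specially.
TRANSCRIBE_ID_REF = "${voipbin.transcribe.id}"


def summary_actions(language: str, prompt_text: str, wait: int) -> list:
    """Build the transcribe, talk, sleep, ai_summary sequence from the example above."""
    return [
        {"type": "answer"},
        {"type": "transcribe_start", "option": {"language": language}},
        {"type": "talk", "option": {"text": prompt_text, "language": language}},
        {"type": "sleep", "option": {"duration": wait}},
        {
            "type": "ai_summary",
            "option": {"source_type": "transcribe", "source_id": TRANSCRIBE_ID_REF},
        },
    ]
```

Serializing the result with json.dumps leaves the reference untouched, so the request body carries the same literal text as the curl example.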

Real-Time AI Summary

Get AI summaries while the call is still active. Useful for live call monitoring and agent assistance.

$ curl --location --request POST 'https://api.voipbin.net/v1.0/calls?token=<YOUR_AUTH_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "source": {
            "type": "tel",
            "target": "+15551234567"
        },
        "destinations": [
            {
                "type": "tel",
                "target": "+15559876543"
            }
        ],
        "actions": [
            {
                "type": "answer"
            },
            {
                "type": "transcribe_start",
                "option": {
                    "language": "en-US",
                    "real_time": true
                }
            },
            {
                "type": "ai_summary",
                "option": {
                    "source_type": "call",
                    "source_id": "${voipbin.call.id}",
                    "real_time": true
                }
            },
            {
                "type": "connect",
                "option": {
                    "source": {
                        "type": "tel",
                        "target": "+15551234567"
                    },
                    "destinations": [
                        {
                            "type": "tel",
                            "target": "+15551111111"
                        }
                    ]
                }
            }
        ]
    }'

Real-time summaries provide:

  • Live Updates: Summary updates as the conversation progresses

  • Agent Assistance: Provides context to agents joining mid-call

  • Call Monitoring: Enables supervisors to quickly understand ongoing calls

Best Practices

Initial Prompt Design:

  • Be specific about the AI’s role and behavior

  • Include constraints (e.g., “Keep responses under 30 seconds”)

  • Define the tone (professional, friendly, technical, etc.)
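
One way to keep prompts consistent across flows is to assemble them from the three ingredients listed above. This is only an illustrative sketch: build_initial_prompt is a hypothetical helper, and the sentence layout is an arbitrary choice, not a format VoIPBIN requires.

```python
def build_initial_prompt(role: str, tone: str, constraints: list) -> str:
    """Compose an initial_prompt string from a role, a tone, and a list of constraints."""
    parts = [f"You are {role}.", f"Tone: {tone}."]
    parts.extend(f"Constraint: {constraint}" for constraint in constraints)
    return " ".join(parts)
```

For example, build_initial_prompt("a helpful customer service assistant", "professional", ["Keep responses under 30 seconds."]) yields a single string that can be used as the initial_prompt value.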

Language Support:

  • AI supports multiple languages (see supported languages)

  • Set the stt_language field on the AI configuration (POST /ais or PUT /ais) to match the user’s expected language in BCP-47 format (e.g., ko-KR, en-US). See STT Language.

  • If stt_language is omitted, the STT provider uses auto-detection, which may reduce accuracy for non-English calls

  • For multilingual deployments, create separate AI configurations per language and reference the appropriate ai_id in each flow
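
For a multilingual deployment, the per-language AI configurations can be kept in a small lookup table. The sketch below is an assumption about how you might organize this on your side; the placeholder IDs stand in for UUIDs returned by POST /ais, and ai_id_for is a hypothetical helper.

```python
# Placeholder values: replace each with the UUID of the AI configuration
# you created for that language.
AI_ID_BY_LANGUAGE = {
    "en-US": "<AI_ID_FOR_EN_US>",
    "ko-KR": "<AI_ID_FOR_KO_KR>",
}


def ai_id_for(language: str, default_language: str = "en-US") -> str:
    """Return the AI configuration ID for a BCP-47 tag, falling back to the default."""
    return AI_ID_BY_LANGUAGE.get(language, AI_ID_BY_LANGUAGE[default_language])
```

Falling back to a default language is a design choice made for this sketch; rejecting unknown languages outright may be more appropriate if accuracy matters more than coverage.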

Context Retention:

  • AI remembers conversation history within the same call

  • Variables set during the call are available to AI

  • Use context to build multi-turn conversations

Error Handling:

  • Always include fallback actions after AI actions

  • Handle cases where AI may not understand the input

  • Provide clear instructions to users about what they can ask
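
The first point can be enforced when action lists are generated in code. The sketch below assumes that the flow continues with the next action once an AI action ends, which this tutorial implies but does not state; ensure_fallback is a hypothetical helper, and the talk action fields are those used in the summary example earlier.

```python
AI_ACTION_TYPES = {"ai", "ai_talk"}


def ensure_fallback(actions: list, fallback_text: str,
                    language: str = "en-US") -> list:
    """Return a copy of `actions`, appending a talk action if the list ends on an AI action."""
    result = list(actions)
    if result and result[-1].get("type") in AI_ACTION_TYPES:
        result.append(
            {"type": "talk", "option": {"text": fallback_text, "language": language}}
        )
    return result
```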

Regenerate Direct AI Hash

Regenerate the direct hash for an AI configuration. This invalidates the previous SIP URI and creates a new one. If the AI has no existing direct hash, one is created automatically.

$ curl --location --request POST 'https://api.voipbin.net/v1.0/ais/a092c5d9-632c-48d7-b70b-499f2ca084b1/direct-hash-regenerate?token=<YOUR_AUTH_TOKEN>'

{
    "id": "a092c5d9-632c-48d7-b70b-499f2ca084b1",
    "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
    "name": "Sales Assistant AI",
    "detail": "AI assistant for handling sales inquiries",
    "engine_model": "openai.gpt-4o",
    "direct_hash": "c5d6e7f8a9b0",
    "tm_create": "2024-02-09 07:01:35.666687",
    "tm_update": "2024-02-09 07:05:12.123456",
    "tm_delete": "9999-01-01 00:00:00.000000"
}

Note

AI Implementation Hint

This endpoint requires no request body. The direct_hash in the response is the new hash — the previous hash is permanently invalidated. The direct SIP URI format is sip:direct.<hash>@sip.voipbin.net.
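
Because the old URI stops working after regeneration, any stored SIP URI should be rebuilt from the new direct_hash. A minimal sketch, using the URI format given above; direct_sip_uri is a hypothetical helper name.

```python
def direct_sip_uri(direct_hash: str) -> str:
    """Build the direct SIP URI for an AI configuration from its direct_hash."""
    if not direct_hash:
        raise ValueError("direct_hash must not be empty")
    return f"sip:direct.{direct_hash}@sip.voipbin.net"
```

With the response above, direct_sip_uri("c5d6e7f8a9b0") gives sip:direct.c5d6e7f8a9b0@sip.voipbin.net.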

Troubleshooting

  • 400 Bad Request:
    • Cause: Invalid engine_model format or missing required action fields.

    • Fix: Verify engine_model uses <provider>.<model> format (e.g., openai.gpt-4o). Ensure initial_prompt is provided.

  • 402 Payment Required:
    • Cause: Insufficient VoIPBIN account balance.

    • Fix: Check balance via GET /billing-accounts. Top up before retrying.

  • AI not responding during call:
    • Cause: LLM provider API key is invalid or rate-limited.

    • Fix: Verify the engine_key in your AI configuration. Check the provider’s status page and rate limits.

  • No audio from AI:
    • Cause: TTS provider credentials are invalid or the voice ID does not exist.

    • Fix: Verify tts_type and tts_voice_id. Try using a default voice (omit tts_voice_id).
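
The most common cause of a 400 Bad Request listed above, a malformed engine_model, can be caught locally before the request is sent. The sketch below checks only the <provider>.<model> shape; split_engine_model is a hypothetical helper, and it does not verify that the provider or model is actually supported.

```python
def split_engine_model(engine_model: str) -> tuple:
    """Split "<provider>.<model>" into (provider, model), raising ValueError if malformed."""
    # Split on the first dot only, so model names that contain dots stay intact.
    provider, separator, model = engine_model.partition(".")
    if not separator or not provider or not model:
        raise ValueError(f"expected <provider>.<model>, got {engine_model!r}")
    return provider, model
```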

For more details on AI features and configuration, see AI Overview.