Chatbot (AI)

Overview

VoIPBin’s chatbot (AI) is a built-in AI agent that automates intelligent voice interactions during live calls. It is designed to integrate into VoIPBin flows and uses ChatGPT as its AI engine to process and respond to caller input in real time. This lets developers build dynamic, interactive voice experiences that run without manual intervention.

How it works

Action component

The chatbot is one of the configurable action components in a VoIPBin flow. When a call reaches a chatbot action, the system has the AI engine generate a response based on the configured prompt. A plain-text response is played back to the caller using text-to-speech (TTS); if the response is a structured JSON array of action objects, VoIPBin executes those actions instead.

TTS/STT + AI Engine

VoIPBin’s chatbot combines speech-to-text (STT), an AI engine, and text-to-speech (TTS): STT converts the caller’s speech into text, the AI engine generates a reply, and TTS converts that reply back into audio. These steps run in real time, enabling seamless conversations.

Voice Detection and Play Interruption

In addition to basic TTS and STT, VoIPBin uses voice detection to make the conversation flow more naturally. If the system detects the caller’s voice while the chatbot is speaking (that is, while TTS media is playing), it immediately stops playback and routes the caller’s speech, transcribed via STT, to the AI engine. This play interruption feature (commonly called barge-in) ensures that the caller’s input takes priority as soon as they start talking, so the interaction more closely resembles a real conversation.
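The play-interruption behavior can be pictured as a small state machine. The sketch below is a hypothetical model, not VoIPBin’s implementation or API: `BargeInSession`, its methods, and the `ask_ai` callback (standing in for the AI engine, fed text that STT has already transcribed) are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BargeInSession:
    """Hypothetical model of a chatbot turn with play interruption.

    Not VoIPBin's API: names and events are illustrative only.
    """
    ask_ai: Callable[[str], str]            # stand-in for the AI engine
    playing: Optional[str] = None           # TTS text currently being played, if any
    log: List[str] = field(default_factory=list)

    def speak(self, text: str) -> None:
        # Start TTS playback of the AI engine's reply.
        self.playing = text
        self.log.append(f"play:{text}")

    def on_playback_finished(self) -> None:
        # TTS media finished normally; nothing is playing anymore.
        self.playing = None

    def on_caller_speech(self, transcript: str) -> None:
        # Voice detected: stop any ongoing TTS playback first...
        if self.playing is not None:
            self.log.append(f"stop:{self.playing}")
            self.playing = None
        # ...then route the STT transcript to the AI engine and speak its reply.
        self.speak(self.ask_ai(transcript))


session = BargeInSession(ask_ai=lambda text: f"You said: {text}")
session.speak("Welcome to support. How can I help?")
session.on_caller_speech("I need help with billing")  # caller talks over the greeting
print(session.log)
```

The key design point is ordering: playback is stopped before the transcript reaches the AI engine, so the caller never hears the old reply continue while the new one is being generated.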

External AI Agent Integration

For users who prefer to use external AI services, such as VAPI or other AI agent service providers, VoIPBin offers media stream access. This allows third-party AI engines to process voice data directly, enabling deeper customization and advanced AI capabilities.

Multiple Chatbot Actions in a Flow

VoIPBin allows multiple chatbot actions within a single flow. Developers can configure different chatbot interactions at various points, enabling flexible and context-aware automation.

Handling Responses

Text String Response: The chatbot’s response is played as speech using TTS.
JSON Response: The chatbot returns a structured JSON array of action objects, which VoIPBin executes accordingly.
Error Handling: If the chatbot generates an invalid JSON response, VoIPBin treats it as a normal text response and plays it via TTS.
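The three rules above amount to "try to parse, fall back to text". The following is a minimal sketch of that decision, not VoIPBin’s actual code; `interpret_response` is a hypothetical name. As an added assumption, it also treats valid JSON that is not an array of objects with an `action` field (for example, a single bare object) as plain text.

```python
import json
from typing import Any, Dict, List, Union

Action = Dict[str, Any]


def interpret_response(response: str) -> Union[str, List[Action]]:
    """Return a list of action objects if `response` is a JSON array of them;
    otherwise return the text unchanged so it can be played via TTS."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        # Plain text or invalid JSON: treat it as a normal text response.
        return response
    # Assumption: only an array whose items are objects with a string "action"
    # field counts as structured actions.
    if isinstance(parsed, list) and all(
        isinstance(item, dict) and isinstance(item.get("action"), str)
        for item in parsed
    ):
        return parsed
    # Valid JSON, but not an action array: speak it as text.
    return response


print(interpret_response("Sure, I can help with that."))
print(interpret_response('[{"action": "connect", "option": {}}]'))
print(interpret_response('[{"action": "connect"'))  # truncated JSON falls back to text
```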

Using the Chatbot

Initial Prompt

The initial prompt is the foundation of the chatbot’s behavior; a well-crafted prompt yields accurate and relevant responses. There is currently no limit on prompt length, although this may change in the future.

Example Prompt:

Pretend you are an expert customer service agent.

Please respond kindly.

However, if you receive a request to connect to an agent, respond with the following message in JSON format.
Do not include any explanations in the response.
Only provide an RFC8259-compliant JSON response following this format without deviation.

[
    {
        "action": "connect",
        "option": {
            "source": {
                "type": "tel",
                "target": "+821100000001"
            },
            "destinations": [
                {
                    "type": "tel",
                    "target": "+821100000002"
                }
            ]
        }
    }
]
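Hand-typing the JSON part of a prompt makes it easy to introduce a stray comma or quote that the AI engine may then copy. One option is to generate that part with a JSON serializer. The sketch below assembles a prompt like the example above; `connect_action` and `build_prompt` are hypothetical helpers, and the action object simply mirrors the `connect` example rather than a full schema.

```python
import json


def connect_action(source: str, destination: str) -> dict:
    """Build a `connect` action object shaped like the example prompt above."""
    return {
        "action": "connect",
        "option": {
            "source": {"type": "tel", "target": source},
            "destinations": [{"type": "tel", "target": destination}],
        },
    }


def build_prompt(source: str, destination: str) -> str:
    # json.dumps produces standard JSON for these string-only values,
    # so the embedded action array is guaranteed to parse.
    actions = json.dumps([connect_action(source, destination)], indent=4)
    return "\n".join([
        "Pretend you are an expert customer service agent.",
        "",
        "Please respond kindly.",
        "",
        "However, if you receive a request to connect to an agent, respond "
        "with the following message in JSON format.",
        "Do not include any explanations in the response.",
        "Only provide an RFC8259-compliant JSON response following this "
        "format without deviation.",
        "",
        actions,
    ])


print(build_prompt("+821100000001", "+821100000002"))
```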

Action Object Structure

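Each action object contains an "action" field that names the action (for example, "connect") and an "option" object that holds that action’s parameters, as in the example above.

VoIPBin supports a wide range of actions. Refer to VoIPBin’s action documentation for the complete list of available actions and their options.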

Technical Considerations

Escalation to Live Agents

VoIPBin does not provide an automatic mechanism for escalating calls to human agents. Instead, developers must write the initial prompt so that the chatbot returns a JSON action, such as the connect action in the example prompt above, whenever escalation is required.

Logging & Debugging

Developers can debug chatbot interactions through VoIPBin’s transcription logs, which capture chatbot responses and interactions.

Current Limitations & Future Enhancements

TTS Customization: Currently, voice, language, and speed customization are not available but will be added in future updates.
Multilingual Support: The chatbot currently supports only English, but additional language support is planned.
Context Retention: Each chatbot request is processed independently, meaning there is no built-in conversation memory.

VoIPBin’s chatbot feature offers a flexible and intelligent way to automate voice interactions within flows. By leveraging AI-powered responses and structured action execution, developers can enhance call experiences with minimal effort. As VoIPBin continues to evolve, future updates will introduce greater customization options and multilingual capabilities.

Chatbot

{
    "id": "<string>",
    "customer_id": "<string>",
    "name": "<string>",
    "detail": "<string>",
    "engine_type": "<string>",
    "init_prompt": "<string>",
    "tm_create": "<string>",
    "tm_update": "<string>",
    "tm_delete": "<string>"
}

id: Chatbot’s ID.
customer_id: Customer’s ID.
name: Chatbot’s name.
detail: Chatbot’s detail.
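engine_type: Chatbot’s engine type. See Type below.
init_prompt: Chatbot’s initial prompt, which defines the chatbot engine’s behavior.
tm_create: Timestamp when the chatbot was created.
tm_update: Timestamp when the chatbot was last updated.
tm_delete: Timestamp when the chatbot was deleted.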

Example

{
    "id": "a092c5d9-632c-48d7-b70b-499f2ca084b1",
    "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
    "name": "test chatbot",
    "detail": "test chatbot for simple scenario",
    "engine_type": "chatGPT",
    "tm_create": "2023-02-09 07:01:35.666687",
    "tm_update": "9999-01-01 00:00:00.000000",
    "tm_delete": "9999-01-01 00:00:00.000000"
}
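A client consuming this resource might map it onto a typed record. The sketch below is a hypothetical client-side model, not an official SDK. Two behaviors are assumptions drawn only from the example above: `init_prompt` may be absent (so it defaults to an empty string), and a `9999-01-01` timestamp is read as "not set" (never updated or deleted).

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
# Assumption: timestamps in year 9999 are placeholders meaning "not set".
UNSET_YEAR = 9999


def parse_time(value: str) -> Optional[datetime]:
    ts = datetime.strptime(value, TIME_FORMAT)
    return None if ts.year == UNSET_YEAR else ts


@dataclass
class Chatbot:
    id: str
    customer_id: str
    name: str
    detail: str
    engine_type: str
    init_prompt: str
    tm_create: Optional[datetime]
    tm_update: Optional[datetime]
    tm_delete: Optional[datetime]

    @classmethod
    def from_json(cls, raw: str) -> "Chatbot":
        data = json.loads(raw)
        return cls(
            id=data["id"],
            customer_id=data["customer_id"],
            name=data["name"],
            detail=data["detail"],
            engine_type=data["engine_type"],
            # The example above omits init_prompt, so fall back to "".
            init_prompt=data.get("init_prompt", ""),
            tm_create=parse_time(data["tm_create"]),
            tm_update=parse_time(data["tm_update"]),
            tm_delete=parse_time(data["tm_delete"]),
        )


EXAMPLE = """{
    "id": "a092c5d9-632c-48d7-b70b-499f2ca084b1",
    "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
    "name": "test chatbot",
    "detail": "test chatbot for simple scenario",
    "engine_type": "chatGPT",
    "tm_create": "2023-02-09 07:01:35.666687",
    "tm_update": "9999-01-01 00:00:00.000000",
    "tm_delete": "9999-01-01 00:00:00.000000"
}"""

bot = Chatbot.from_json(EXAMPLE)
print(bot.name, bot.engine_type, bot.tm_create, bot.tm_update)
```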

Type

Chatbot engine type.

Type	Description
chatGPT	OpenAI’s chat AI (ChatGPT). https://chat.openai.com/chat
clova	Naver’s Clova AI (WIP). https://clova.ai/