AI

{
    "id": "<string>",
    "customer_id": "<string>",
    "name": "<string>",
    "detail": "<string>",
    "engine_model": "<string>",
    "parameter": "<object>",
    "engine_key": "<string>",
    "rag_id": "<string>",
    "init_prompt": "<string>",
    "tts_type": "<string>",
    "tts_voice_id": "<string>",
    "stt_type": "<string>",
    "vad_config": {
        "confidence": <number>,
        "start_secs": <number>,
        "stop_secs": <number>,
        "min_volume": <number>
    },
    "smart_turn_enabled": <boolean>,
    "tool_names": ["<string>"],
    "tm_create": "<string>",
    "tm_update": "<string>",
    "tm_delete": "<string>"
}
  • id (UUID): The AI configuration’s unique identifier. Returned when creating an AI via POST /ais or when listing AIs via GET /ais.

  • customer_id (UUID): The customer that owns this AI configuration. Obtained from the id field of GET /customers.

  • name (String, Required): A human-readable name for the AI configuration (e.g., "Sales Assistant").

  • detail (String, Optional): A description of the AI’s purpose or additional notes.

  • engine_model (String, Required): The LLM provider and model. Format: <provider>.<model> (e.g., openai.gpt-4o, anthropic.claude-3-5-sonnet). See Engine Models.

  • parameter (Object, Optional): Custom key-value parameter data for the AI configuration. Supports flow variable substitution at runtime. Typically left as {}.

  • engine_key (String, Required): The API key for the LLM provider. Must be a valid key from the provider’s dashboard.

  • rag_id (UUID, Optional): The knowledge base ID for the search_knowledge tool. Obtained from the id field of GET https://api.voipbin.net/v1.0/rags. When set, the AI assistant can search this knowledge base during voice calls. Set to 00000000-0000-0000-0000-000000000000 or omit to disable.

  • init_prompt (String, Required): The system prompt that defines the AI’s behavior, persona, and instructions. No enforced length limit.

  • tts_type (enum string, Required): Text-to-Speech provider. See TTS Types.

  • tts_voice_id (String, Optional): Voice ID for the selected TTS provider. If omitted, the default voice for the chosen TTS type is used. See default voices in TTS Types.

  • stt_type (enum string, Required): Speech-to-Text provider. See STT Types.

  • vad_config (Object, Optional): Voice Activity Detection configuration. All fields are optional; omitted fields use Pipecat defaults. See VAD Config.

  • smart_turn_enabled (Boolean, Optional): Enable smart turn detection using Pipecat’s LocalSmartTurnAnalyzerV3 for more natural turn-taking. When true, the VAD stop_secs parameter is automatically forced to 0.2 regardless of vad_config settings. Defaults to false. See Smart Turn.

  • tool_names (Array of String, Optional): List of enabled tool functions. Use ["all"] to enable all tools, [] to disable all tools, or list specific tool names. See Tool Functions.

  • tm_create (String, ISO 8601): Timestamp when the AI configuration was created.

  • tm_update (String, ISO 8601): Timestamp when the AI configuration was last updated.

  • tm_delete (String, ISO 8601): Timestamp when the AI configuration was deleted, if applicable.
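As a rough illustration of the field list above, the following Python sketch checks a request body for the fields marked Required before it is sent to POST /ais. The helper name missing_required_fields and the placeholder key are hypothetical and not part of the VoIPBIN API; the server may apply stricter validation.

```python
# Fields marked "Required" in the field list above.
REQUIRED_FIELDS = (
    "name",
    "engine_model",
    "engine_key",
    "init_prompt",
    "tts_type",
    "stt_type",
)


def missing_required_fields(payload: dict) -> list:
    """Return the required fields that are absent or empty in payload."""
    return [field for field in REQUIRED_FIELDS if not payload.get(field)]


# Hypothetical request body; the engine_key value is a placeholder.
payload = {
    "name": "Sales Assistant AI",
    "engine_model": "openai.gpt-4o",
    "engine_key": "<your-provider-api-key>",
    "init_prompt": "You are a friendly sales assistant.",
    "tts_type": "elevenlabs",
    "stt_type": "deepgram",
}

missing = missing_required_fields(payload)
if missing:
    raise ValueError(f"missing required fields: {missing}")
```

Optional fields such as detail, rag_id, and vad_config are deliberately left out of the check, since the field list allows them to be omitted.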

Note

AI Implementation Hint

The engine_key field contains the LLM provider’s API key. This key is write-only: it is accepted on POST /ais and PUT /ais but is never returned in GET responses for security. If you need to change the key, send a full PUT update with the new key.

Note

AI Implementation Hint

A tm_delete value of 9999-01-01 00:00:00.000000 indicates the AI configuration has not been deleted and is still active. This sentinel value is used across all VoIPBIN resources to represent “not yet occurred.”
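A client can treat the sentinel as "no timestamp" when reading responses. The sketch below is one possible way to do that in Python. It assumes timestamps use the format shown in the example on this page (YYYY-MM-DD HH:MM:SS.ffffff), and it assumes that a missing tm_delete field means the resource is still active; the helper names are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Sentinel used for timestamps that have "not yet occurred".
NOT_OCCURRED = "9999-01-01 00:00:00.000000"
# Assumed format, taken from the example values on this page.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_timestamp(value: str) -> Optional[datetime]:
    """Return a datetime, or None when value is the sentinel."""
    if value == NOT_OCCURRED:
        return None
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def is_deleted(resource: dict) -> bool:
    """True when tm_delete holds a real timestamp rather than the sentinel."""
    # Assumption: a missing tm_delete is treated as "not deleted".
    return resource.get("tm_delete", NOT_OCCURRED) != NOT_OCCURRED
```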

Example

{
    "id": "a092c5d9-632c-48d7-b70b-499f2ca084b1",
    "customer_id": "5e4a0680-804e-11ec-8477-2fea5968d85b",
    "name": "Sales Assistant AI",
    "detail": "AI assistant for handling sales inquiries",
    "engine_model": "openai.gpt-4o",
    "parameter": {},
    "engine_key": "sk-...",
    "rag_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "init_prompt": "You are a friendly sales assistant. Help customers find the right products.",
    "tts_type": "elevenlabs",
    "tts_voice_id": "EXAVITQu4vr4xnSDxMaL",
    "stt_type": "deepgram",
    "vad_config": {
        "stop_secs": 0.5
    },
    "smart_turn_enabled": true,
    "tool_names": ["connect_call", "send_email", "stop_service"],
    "tm_create": "2024-02-09 07:01:35.666687",
    "tm_update": "9999-01-01 00:00:00.000000",
    "tm_delete": "9999-01-01 00:00:00.000000"
}

Engine Model

The engine_model field specifies which LLM provider and model to use. Format: <provider>.<model>.

Supported Providers

Provider       Format              Examples
-------------  ------------------  ---------------------------------
OpenAI         openai.<model>      openai.gpt-4o, openai.gpt-4o-mini
Anthropic      anthropic.<model>   anthropic.claude-3-5-sonnet
AWS Bedrock    aws.<model>         aws.claude-3-sonnet
Azure OpenAI   azure.<model>       azure.gpt-4
Cerebras       cerebras.<model>    cerebras.llama3.1-8b
DeepSeek       deepseek.<model>    deepseek.deepseek-chat
Fireworks      fireworks.<model>   fireworks.llama-v3-70b
Google Gemini  gemini.<model>      gemini.gemini-1.5-pro
Grok           grok.<model>        grok.grok-1
Groq           groq.<model>        groq.llama3-70b-8192
Mistral        mistral.<model>     mistral.mistral-large
NVIDIA NIM     nvidia.<model>      nvidia.llama3-70b
Ollama         ollama.<model>      ollama.llama3
OpenRouter     openrouter.<model>  openrouter.meta-llama/llama-3-70b
Perplexity     perplexity.<model>  perplexity.llama-3-sonar-large
Qwen           qwen.<model>        qwen.qwen-max
SambaNova      sambanova.<model>   sambanova.llama3-70b
Together AI    together.<model>    together.meta-llama/Llama-3-70b
Dialogflow     dialogflow.<type>   dialogflow.cx, dialogflow.es
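Because some model names contain dots or slashes themselves (for example cerebras.llama3.1-8b and openrouter.meta-llama/llama-3-70b), a client that needs the provider and model separately should split only on the first dot. The following Python sketch shows one way to do this; split_engine_model is a hypothetical helper, not a VoIPBIN function, and it does not check whether the provider is actually supported.

```python
def split_engine_model(engine_model: str) -> tuple:
    """Split '<provider>.<model>' on the first dot only."""
    provider, separator, model = engine_model.partition(".")
    if not separator or not provider or not model:
        raise ValueError(
            f"expected '<provider>.<model>', got {engine_model!r}"
        )
    return provider, model


provider, model = split_engine_model("cerebras.llama3.1-8b")
# provider is "cerebras"; model keeps its own dot: "llama3.1-8b"
```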

Common OpenAI Models

Model          Description
-------------  -------------------------------------
gpt-4o         Latest GPT-4 Omni model (recommended)
gpt-4o-mini    Smaller, faster GPT-4 Omni variant
gpt-4-turbo    GPT-4 Turbo with vision capabilities
gpt-4          Original GPT-4 model
gpt-3.5-turbo  Fast and cost-effective model
o1             OpenAI o1 reasoning model
o1-mini        Smaller o1 reasoning model
o3-mini        Latest o3 mini reasoning model

TTS Type

Text-to-Speech provider for converting AI responses to audio.

Type        Description
----------  -----------------------------------------------------
elevenlabs  ElevenLabs high-quality voice synthesis (recommended)
deepgram    Deepgram Aura voices
openai      OpenAI TTS (alloy, echo, fable, etc.)
aws         AWS Polly voices
azure       Azure Cognitive Services TTS
google      Google Cloud Text-to-Speech
cartesia    Cartesia TTS
hume        Hume AI emotional TTS
playht      PlayHT voice synthesis

Default Voice IDs by TTS Type

TTS Type    Default Voice ID
----------  ------------------------------------
elevenlabs  EXAVITQu4vr4xnSDxMaL (Rachel)
deepgram    aura-2-thalia-en (Thalia)
openai      alloy
aws         Joanna
azure       en-US-JennyNeural
google      en-US-Wavenet-D
cartesia    71a7ad14-091c-4e8e-a314-022ece01c121
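The fallback described for tts_voice_id can be mirrored on the client side, for instance to display which voice will be used. The Python sketch below copies the defaults from the table above; resolve_voice_id is a hypothetical helper, and the server remains the authority on which voice is applied. No default is listed on this page for hume or playht, so the sketch returns None for them.

```python
from typing import Optional

# Defaults copied from the "Default Voice IDs by TTS Type" table.
DEFAULT_VOICE_IDS = {
    "elevenlabs": "EXAVITQu4vr4xnSDxMaL",
    "deepgram": "aura-2-thalia-en",
    "openai": "alloy",
    "aws": "Joanna",
    "azure": "en-US-JennyNeural",
    "google": "en-US-Wavenet-D",
    "cartesia": "71a7ad14-091c-4e8e-a314-022ece01c121",
}


def resolve_voice_id(tts_type: str, tts_voice_id: Optional[str] = None) -> Optional[str]:
    """Return the explicit voice ID if set, else the documented default."""
    if tts_voice_id:
        return tts_voice_id
    # None means this page documents no default for the given type.
    return DEFAULT_VOICE_IDS.get(tts_type)
```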

STT Type

Speech-to-Text provider for converting incoming audio to text.

Type        Description
----------  -----------------------------------------
deepgram    Deepgram speech recognition (recommended)
cartesia    Cartesia speech recognition
elevenlabs  ElevenLabs speech recognition

VAD Config

Voice Activity Detection configuration for tuning speech detection sensitivity and timing.

All fields are optional. Omitted fields use Pipecat’s native defaults.

Field       Default  Min  Max   Description
----------  -------  ---  ----  ----------------------------------------------------------------------------
confidence  0.7      0.0  1.0   Minimum confidence threshold to detect voice.
start_secs  0.2      0.0  30.0  Duration in seconds of continuous speech needed to confirm speaking started.
stop_secs   0.2      0.0  30.0  Duration in seconds of silence needed to confirm speaking stopped.
min_volume  0.6      0.0  1.0   Minimum audio volume for voice detection.

Note

AI Implementation Hint

When vad_config is null or omitted, Pipecat’s native defaults apply (confidence=0.7, start_secs=0.2, stop_secs=0.2, min_volume=0.6). To keep the AI responsive but avoid cutting off speech mid-sentence, increase stop_secs (e.g., 0.5). To make the AI more patient before responding, increase both stop_secs and start_secs.
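To make the defaults and ranges concrete, here is a minimal Python sketch that fills in omitted fields and rejects out-of-range values before a request is sent. The defaults and limits are copied from the VAD Config table; effective_vad_config is a hypothetical client-side helper, and rejecting unknown keys is an assumption of this sketch rather than documented server behavior.

```python
from typing import Optional

# (default, min, max) per field, copied from the VAD Config table.
VAD_LIMITS = {
    "confidence": (0.7, 0.0, 1.0),
    "start_secs": (0.2, 0.0, 30.0),
    "stop_secs": (0.2, 0.0, 30.0),
    "min_volume": (0.6, 0.0, 1.0),
}


def effective_vad_config(vad_config: Optional[dict]) -> dict:
    """Merge supplied values over the defaults and range-check each one."""
    supplied = vad_config or {}
    unknown = sorted(set(supplied) - set(VAD_LIMITS))
    if unknown:
        # Assumption: unknown keys are treated as a client error.
        raise ValueError(f"unknown vad_config fields: {unknown}")

    result = {}
    for field, (default, low, high) in VAD_LIMITS.items():
        value = supplied.get(field, default)
        if not low <= value <= high:
            raise ValueError(f"{field}={value} is outside [{low}, {high}]")
        result[field] = value
    return result


# Only stop_secs is overridden; the other fields keep their defaults.
config = effective_vad_config({"stop_secs": 0.5})
```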

Smart Turn

When smart_turn_enabled is true, the Pipecat pipeline uses LocalSmartTurnAnalyzerV3, a local ONNX model that analyzes speech and transcription context to distinguish a genuinely finished turn from a mid-sentence pause. This results in more natural conversations with fewer premature interruptions.

Note

AI Implementation Hint

Smart Turn detection requires VAD stop_secs=0.2. When smart_turn_enabled is true, any stop_secs value in vad_config is silently overridden to 0.2. This value matches the model’s training data and allows Smart Turn to dynamically adjust timing.
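The override can be modeled in a few lines. The Python sketch below predicts the stop_secs value that will be in effect for a given configuration; apply_smart_turn is a hypothetical helper that only mirrors the rule described above and does not call Pipecat or the VoIPBIN API.

```python
from typing import Optional

# Value that stop_secs is forced to when Smart Turn is enabled.
SMART_TURN_STOP_SECS = 0.2


def apply_smart_turn(vad_config: Optional[dict], smart_turn_enabled: bool) -> dict:
    """Return a copy of vad_config with the Smart Turn override applied."""
    config = dict(vad_config or {})
    if smart_turn_enabled:
        # Any caller-supplied stop_secs is replaced, as described above.
        config["stop_secs"] = SMART_TURN_STOP_SECS
    return config


# A stop_secs of 0.5 combined with smart_turn_enabled=true yields 0.2.
effective = apply_smart_turn({"stop_secs": 0.5}, smart_turn_enabled=True)
```

Because the override is silent, a stop_secs value sent together with smart_turn_enabled set to true has no effect on turn timing.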

Tool Names

The tool_names field controls which tool functions the AI can invoke during conversations.

Configuration Options

Value               Description
------------------  -------------------------------------------------
["all"]             Enable all available tool functions
[] or null          Disable all tool functions (AI can only converse)
["tool1", "tool2"]  Enable only the specified tools

Available Tools

See Tool Functions for the complete list of tools and their descriptions.

Example configurations:

// Enable all tools
"tool_names": ["all"]

// Enable only call transfer and email
"tool_names": ["connect_call", "send_email"]

// Enable conversation control tools only
"tool_names": ["stop_service", "stop_flow", "set_variables"]

// Disable all tools (conversation-only AI)
"tool_names": []
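The three cases above can be expressed as a small resolution function. This Python sketch is illustrative only: resolve_tools is a hypothetical helper, and KNOWN_TOOLS contains just the tool names that appear on this page, not the complete list from Tool Functions. How the server treats "all" mixed with other names is not documented here, so the sketch rejects that combination.

```python
from typing import Optional, Sequence

# Assumption: only the tool names mentioned on this page; the full
# list is documented under Tool Functions.
KNOWN_TOOLS = (
    "connect_call",
    "send_email",
    "stop_service",
    "stop_flow",
    "set_variables",
)


def resolve_tools(tool_names: Optional[Sequence[str]],
                  available: Sequence[str] = KNOWN_TOOLS) -> list:
    """Expand a tool_names value into the list of enabled tools."""
    if not tool_names:
        # [] or null: conversation-only AI.
        return []
    if list(tool_names) == ["all"]:
        return list(available)
    unknown = [name for name in tool_names if name not in available]
    if unknown:
        raise ValueError(f"unknown tool names: {unknown}")
    return list(tool_names)


enabled = resolve_tools(["connect_call", "send_email"])
```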