.. _mediastream-tutorial:

Tutorial
========

Bi-Directional Media Streaming for Calls
-----------------------------------------
Connect to a call's media stream via WebSocket to send and receive audio in real time. This allows you to build custom audio processing applications without SIP signaling.

**Establish WebSocket Connection:**

.. code::

    GET https://api.voipbin.net/v1.0/calls/<call-id>/media_stream?encapsulation=rtp&token=<token>

**Example:**

.. code::

    GET https://api.voipbin.net/v1.0/calls/652af662-eb45-11ee-b1a5-6fde165f9226/media_stream?encapsulation=rtp&token=<token>

This creates a bi-directional WebSocket connection where you can:

- **Receive audio** from the call (what the other party is saying)
- **Send audio** to the call (inject audio into the conversation)

Bi-Directional Media Streaming for Conferences
-----------------------------------------------
Access a conference's media stream to monitor or participate in the conference audio.

.. code::

    GET https://api.voipbin.net/v1.0/conferences/<conference-id>/media_stream?encapsulation=rtp&token=<token>

**Example:**

.. code::

    GET https://api.voipbin.net/v1.0/conferences/1ed12456-eb4b-11ee-bba8-1bfb2838807a/media_stream?encapsulation=rtp&token=<token>

This allows you to:

- Listen to all conference participants
- Inject audio into the conference
- Build custom conference recording or analysis tools

Encapsulation Types
-------------------
VoIPBIN supports three encapsulation types for media streaming:

**1. RTP (Real-time Transport Protocol)**

Standard protocol for audio/video over IP networks.

.. code::

    ?encapsulation=rtp

**Use cases:**

- Standard VoIP integration
- Compatible with most audio processing tools
- Industry-standard protocol

**2. SLN (Signed Linear Mono)**

Raw audio stream without headers or padding.

.. code::

    ?encapsulation=sln

**Use cases:**

- Minimal overhead needed
- Simple audio processing
- Direct PCM audio access

**3. AudioSocket**

Asterisk-specific protocol for simple audio streaming.

.. code::

    ?encapsulation=audiosocket

**Use cases:**

- Asterisk integration
- Low-overhead streaming
- Simple audio applications

**Codec:** All formats carry 8 kHz mono audio: G.711 μ-law (ulaw) payloads for RTP, 16-bit signed linear PCM for SLN, and 16-bit little-endian PCM for AudioSocket.
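Regardless of encapsulation, the underlying audio is the same 8 kHz mono stream; only the byte format differs. The sketch below is illustrative (the helper names are not part of the VoIPBIN API): it decodes a 160-byte μ-law payload, i.e. 20 ms of RTP audio with the header removed, into the 320 bytes of 16-bit signed little-endian PCM that SLN and AudioSocket use.

.. code::

    import struct


    def ulaw_byte_to_pcm16(u: int) -> int:
        """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
        u = ~u & 0xFF
        sign = u & 0x80
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        return -sample if sign else sample


    def ulaw_payload_to_pcm(payload: bytes) -> bytes:
        """Convert a mu-law payload (160 bytes = 20 ms) to 16-bit
        little-endian PCM (320 bytes = 20 ms)."""
        return struct.pack(f"<{len(payload)}h", *(ulaw_byte_to_pcm16(b) for b in payload))


    # 20 ms of mu-law silence (0xFF encodes linear 0) becomes 320 bytes of PCM.
    assert len(ulaw_payload_to_pcm(b"\xff" * 160)) == 320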
WebSocket Client Examples
--------------------------
**Python Example (RTP Streaming):**

.. code::

    import websocket  # pip install websocket-client


    def process_audio(rtp_packet):
        """Process received RTP audio"""
        # Extract the payload from the RTP packet.
        # The RTP header is typically 12 bytes.
        payload = rtp_packet[12:]

        # Save or process the audio
        with open('received_audio.raw', 'ab') as f:
            f.write(payload)


    def generate_audio():
        """Generate RTP packets to send"""
        # This is a simplified example.
        # In production, construct proper RTP packets or use an RTP library.

        # Read the audio file (160 bytes = 20 ms of 8 kHz audio)
        with open('audio_to_inject.raw', 'rb') as f:
            audio_data = f.read(160)

        return audio_data


    def on_message(ws, message):
        """Receive audio data from the call"""
        # message contains RTP packets
        print(f"Received {len(message)} bytes of audio")

        # Process the audio here:
        # - Save to file
        # - Run speech recognition
        # - Analyze audio
        process_audio(message)


    def on_open(ws):
        """Connection established, can start sending audio"""
        print("Media stream connected")

        # Send audio to the call.
        # audio_data should be RTP packets.
        audio_data = generate_audio()
        ws.send(audio_data, opcode=websocket.ABNF.OPCODE_BINARY)


    def on_error(ws, error):
        print(f"Error: {error}")


    def on_close(ws, close_status_code, close_msg):
        print(f"Connection closed: {close_status_code}")


    # Connect to the media stream
    call_id = "652af662-eb45-11ee-b1a5-6fde165f9226"
    token = "<your-token>"
    ws_url = f"wss://api.voipbin.net/v1.0/calls/{call_id}/media_stream?encapsulation=rtp&token={token}"

    ws = websocket.WebSocketApp(
        ws_url,
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )
    ws.run_forever()
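The ``generate_audio()`` stub above leaves real RTP packet construction to a library. As a rough sketch of what that involves, the snippet below builds the minimal 12-byte RTP header (version 2, payload type 0 for PCMU, incrementing sequence number and timestamp, random SSRC); the function and file names are illustrative only.

.. code::

    import os
    import struct


    def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int) -> bytes:
        """Prepend a minimal 12-byte RTP header (version 2, payload type 0 = PCMU)."""
        header = struct.pack(
            "!BBHII",
            0x80,                    # version=2, no padding/extension/CSRC
            0x00,                    # marker=0, payload type 0 (G.711 mu-law)
            seq & 0xFFFF,            # sequence number (wraps at 16 bits)
            timestamp & 0xFFFFFFFF,  # timestamp in samples
            ssrc,                    # stream identifier
        )
        return header + payload


    # Wrap consecutive 20 ms mu-law chunks (160 samples each at 8 kHz).
    ssrc = int.from_bytes(os.urandom(4), "big")
    seq, timestamp = 0, 0
    with open('audio_to_inject.raw', 'rb') as f:
        while chunk := f.read(160):
            packet = build_rtp_packet(chunk, seq, timestamp, ssrc)
            # ws.send(packet, opcode=websocket.ABNF.OPCODE_BINARY)
            seq += 1
            timestamp += 160  # 160 samples per 20 ms at 8 kHz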
**JavaScript Example (Browser):**

.. code::

    const callId = '652af662-eb45-11ee-b1a5-6fde165f9226';
    const token = '<token>';
    const wsUrl = `wss://api.voipbin.net/v1.0/calls/${callId}/media_stream?encapsulation=rtp&token=${token}`;

    const ws = new WebSocket(wsUrl);
    ws.binaryType = 'arraybuffer';

    ws.onopen = function() {
        console.log('Media stream connected');

        // Send audio to the call
        const audioData = generateAudio();
        ws.send(audioData);
    };

    ws.onmessage = function(event) {
        // Receive audio from the call
        const audioData = event.data;
        console.log(`Received ${audioData.byteLength} bytes`);

        // Process audio
        processAudio(new Uint8Array(audioData));
    };

    ws.onerror = function(error) {
        console.error('WebSocket error:', error);
    };

    ws.onclose = function() {
        console.log('Media stream closed');
    };

    function processAudio(audioBuffer) {
        // Process received audio
        // - Play through the Web Audio API
        // - Run speech recognition
        // - Visualize audio
    }

    function generateAudio() {
        // Generate audio to send.
        // Returns an ArrayBuffer containing RTP packets.
        return new ArrayBuffer(172);  // 12-byte RTP header + 160-byte payload
    }

**Node.js Example (AudioSocket):**

.. code::

    const WebSocket = require('ws');
    const fs = require('fs');

    const callId = '652af662-eb45-11ee-b1a5-6fde165f9226';
    const token = '<token>';
    const wsUrl = `wss://api.voipbin.net/v1.0/calls/${callId}/media_stream?encapsulation=audiosocket&token=${token}`;

    const ws = new WebSocket(wsUrl);

    ws.on('open', function() {
        console.log('AudioSocket connected');

        // Send an audio file
        const audioFile = fs.readFileSync('audio.pcm');

        // Send in chunks (20 ms = 320 bytes for 16-bit 8 kHz mono).
        // In production, pace the sends at 20 ms intervals rather than
        // pushing the whole file at once (see Best Practices below).
        const chunkSize = 320;
        for (let i = 0; i < audioFile.length; i += chunkSize) {
            const chunk = audioFile.slice(i, i + chunkSize);
            ws.send(chunk);
        }
    });

    ws.on('message', function(data) {
        // Receive audio from the call
        console.log(`Received ${data.length} bytes`);

        // Save the received audio
        fs.appendFileSync('received_audio.pcm', data);
    });

    ws.on('error', function(error) {
        console.error('Error:', error);
    });

    ws.on('close', function() {
        console.log('AudioSocket closed');
    });

Uni-Directional Streaming with Flow Action
-------------------------------------------
For sending audio to a call without receiving audio back, use the ``external_media_start`` flow action.

**Create Call with External Media:**

.. code::

    $ curl --location --request POST 'https://api.voipbin.net/v1.0/calls?token=<token>' \
        --header 'Content-Type: application/json' \
        --data-raw '{
            "source": {
                "type": "tel",
                "target": "+15551234567"
            },
            "destinations": [
                {
                    "type": "tel",
                    "target": "+15559876543"
                }
            ],
            "actions": [
                {
                    "type": "answer"
                },
                {
                    "type": "external_media_start",
                    "option": {
                        "url": "wss://your-media-server.com/audio-stream",
                        "encapsulation": "audiosocket"
                    }
                }
            ]
        }'

This creates a uni-directional stream where VoIPBIN:

1. Establishes the call
2. Connects to your media server via WebSocket
3. Receives audio from your server
4. Plays that audio to the call participant

**Your media server receives:**

.. code::

    WebSocket connection from VoIPBIN
    → Send audio chunks (PCM format for AudioSocket)
    → VoIPBIN plays audio to call
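A minimal media server for this flow can be sketched with the third-party ``websockets`` package (``pip install websockets``). Following the Node.js example above, it sends raw 320-byte PCM chunks as binary frames; the host, port, and file name are illustrative, and the URL in the ``external_media_start`` option would point at wherever you host this handler.

.. code::

    import asyncio
    import websockets

    CHUNK = 320       # 20 ms of 16-bit, 8 kHz, mono PCM
    INTERVAL = 0.02   # one chunk every 20 ms


    async def handler(websocket):
        """VoIPBIN connects here when the external_media_start action runs."""
        print("VoIPBIN connected")
        with open("audio_to_play.pcm", "rb") as f:
            while chunk := f.read(CHUNK):
                await websocket.send(chunk)    # binary frame with raw PCM
                await asyncio.sleep(INTERVAL)  # pace the stream in real time


    async def main():
        async with websockets.serve(handler, "0.0.0.0", 8080):
            await asyncio.Future()  # run until cancelled


    asyncio.run(main())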
Common Use Cases
----------------
**1. Real-Time Speech Recognition:**

.. code::

    # Python example
    def on_message(ws, message):
        # Extract audio from the RTP packet
        audio = extract_audio(message)

        # Send to a speech recognition API
        text = speech_to_text(audio)
        print(f"Recognized: {text}")

        # Store the transcription
        save_transcription(text)

**2. Audio Injection / IVR Replacement:**

.. code::

    // Node.js example
    ws.on('open', function() {
        // Play custom audio prompts
        const prompt1 = fs.readFileSync('welcome.pcm');
        ws.send(prompt1);

        // Wait for DTMF or speech
        // Then play the next prompt
    });

**3. Conference Recording:**

.. code::

    # Python example
    def on_message(ws, message):
        # Save all conference audio
        with open(f'conference_{conference_id}.raw', 'ab') as f:
            f.write(extract_audio(message))

**4. Real-Time Audio Analysis:**

.. code::

    # Python example
    def on_message(ws, message):
        audio = extract_audio(message)

        # Detect emotion
        emotion = analyze_emotion(audio)

        # Detect keywords
        if detect_keyword(audio, ['help', 'urgent']):
            alert_supervisor()

        # Measure audio quality
        quality = measure_quality(audio)

**5. Custom Music on Hold:**

.. code::

    // Node.js example
    ws.on('open', function() {
        // Play custom music or messages
        const music = fs.readFileSync('hold_music.pcm');

        // Loop the music while the call is on hold
        setInterval(() => {
            ws.send(music);
        }, 1000);
    });

**6. AI-Powered Voice Assistant:**

.. code::

    // Node.js example
    ws.on('message', async function(data) {
        // Receive customer audio
        const audio = extractAudio(data);

        // Send to AI for processing
        const response = await aiProcess(audio);

        // Convert the AI response to audio
        const responseAudio = textToSpeech(response);

        // Send it back to the call
        ws.send(responseAudio);
    });

Audio Format Details
--------------------
**RTP Format:**

- Codec: ulaw (G.711 μ-law)
- Sample rate: 8 kHz
- Sample size: 8-bit μ-law (decodes to 16-bit linear)
- Channels: Mono
- Packet size: 160-byte payload (20 ms of audio)

**SLN Format:**

- Raw PCM audio
- No headers or padding
- Sample rate: 8 kHz
- Bits: 16-bit signed
- Channels: Mono

**AudioSocket Format:**

- PCM little-endian
- Sample rate: 8 kHz
- Bits: 16-bit
- Channels: Mono
- Chunk size: 320 bytes (20 ms of audio)

Best Practices
--------------
**1. Buffer Management:**

- Maintain audio buffers to handle jitter
- Send audio in consistent 20 ms chunks
- Don't send too fast or too slow

**2. Error Handling:**

- Implement reconnection logic
- Handle WebSocket disconnections gracefully
- Log errors for debugging

**3. Audio Quality:**

- Use proper RTP packet construction
- Maintain correct timing for audio chunks
- Monitor for packet loss

**4. Resource Management:**

- Close the WebSocket when done
- Don't leave connections open indefinitely
- Clean up audio buffers and files

**5. Testing:**

- Test with various network conditions
- Verify audio quality with real calls
- Monitor latency and packet loss

**6. Security:**

- Use WSS (secure WebSocket) in production
- Validate authentication tokens
- Encrypt sensitive audio data

Connection Lifecycle
--------------------
**1. Establish Connection:**

.. code::

    GET /v1.0/calls/<call-id>/media_stream?encapsulation=rtp&token=<token>

**2. WebSocket Upgrade:**

.. code::

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade

**3. Bi-Directional Communication:**

.. code::

    Client ←→ VoIPBIN
    - Send audio: binary frames with RTP packets
    - Receive audio: binary frames with RTP packets

**4. Close Connection:**

.. code::

    ws.close()

Troubleshooting
---------------
**Common Issues:**

**No audio received:**

- Check that the WebSocket connection is established
- Verify the call is active and answered
- Ensure the correct encapsulation type is used

**Poor audio quality:**

- Check network latency
- Verify the audio format matches the requirements
- Monitor packet loss

**Connection drops:**

- Implement reconnection logic
- Check firewall rules for WebSocket traffic
- Verify the authentication token is valid

**Can't send audio:**

- Ensure binary frames are used (not text)
- Verify the audio format is correct
- Check the audio chunk size (typically 20 ms)

For more information about media stream configuration, see the Media Stream Overview.
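As a closing sketch of the reconnection logic recommended above, the loop below wraps the ``websocket-client`` usage from the Python example in an exponential backoff; the URL placeholders, callbacks, and backoff values are illustrative.

.. code::

    import time
    import websocket  # pip install websocket-client

    WS_URL = "wss://api.voipbin.net/v1.0/calls/<call-id>/media_stream?encapsulation=rtp&token=<token>"


    def run_with_reconnect(url, max_backoff=30):
        """Keep the media stream open, reconnecting with exponential backoff."""
        backoff = 1
        while True:
            ws = websocket.WebSocketApp(
                url,
                on_open=lambda ws: print("connected"),
                # Binary frames arrive as bytes objects containing audio data.
                on_message=lambda ws, message: print(f"got {len(message)} bytes"),
                on_error=lambda ws, error: print(f"error: {error}"),
                on_close=lambda ws, code, msg: print(f"closed: {code}"),
            )
            ws.run_forever()  # returns once the connection drops or closes
            print(f"Disconnected; retrying in {backoff}s")
            time.sleep(backoff)
            backoff = min(backoff * 2, max_backoff)  # cap the backoff


    run_with_reconnect(WS_URL)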