Media Stream
Overview
In VoipBin, the media stream feature empowers users to directly control media transmission without relying on SIP (Session Initiation Protocol) signaling. Traditionally, SIP signaling is used to establish, modify, and terminate communication sessions in VoIP systems. However, the media stream feature in VoipBin introduces an alternative method for managing media streams independently of SIP signaling.
With the media stream feature, users can initiate, manipulate, and terminate media streams directly through the VoipBin platform, bypassing the need for SIP signaling. This capability offers several advantages:
Flexibility: Users have greater flexibility in controlling media streams, as they can manage them independently of SIP signaling. This flexibility allows for more dynamic and customizable communication experiences.
Efficiency: By eliminating the dependency on SIP signaling for media control, the media stream feature can streamline the process of initiating and managing media streams. This can lead to more efficient use of resources and reduced latency in media transmission.
Scalability: The media stream feature can enhance the scalability of VoipBin by reducing the overhead associated with SIP signaling for media control. This can support a larger number of concurrent media streams and accommodate higher traffic volumes.
Enhanced User Experience: By enabling direct control over media streams, VoipBin can offer users a more seamless and responsive communication experience. Users can adjust media settings in real-time without the constraints imposed by SIP signaling.
Overall, the media stream feature in VoipBin empowers users with greater control and flexibility in managing media transmission, enhancing the efficiency, scalability, and user experience of the platform. This capability enables a wide range of applications, from real-time communication to multimedia streaming, with minimal reliance on traditional SIP signaling mechanisms.
Available resources
Currently, the following resource types are supported:

* Call: See details here <call-overview>.
* Conference: See details here <conference-overview>.
Bi-directional streaming
Bi-directional streaming allows for simultaneous transmission and reception of media. To establish bi-directional streaming, an additional API call is necessary:
Media stream for call
GET https://api.voipbin.net/v1.0/calls/<call-id>/media_stream?encapsulation=<encapsulation-type>&token=<token>
https://api.voipbin.net/v1.0/calls/652af662-eb45-11ee-b1a5-6fde165f9226/media_stream?encapsulation=rtp&token=some_token
Media stream for conference
GET https://api.voipbin.net/v1.0/conferences/<conference-id>/media_stream?encapsulation=<encapsulation-type>&token=<token>
https://api.voipbin.net/v1.0/conferences/1ed12456-eb4b-11ee-bba8-1bfb2838807a/media_stream?encapsulation=rtp&token=some_token
By making this API call, a WebSocket connection is created, facilitating both the reception and transmission of media data. This means that both the server (VoipBin) and the client can send and receive media through the WebSocket connection.
+-----------------+ Websocket +-----------------+
| |--------------------------| |
| voipbin |<---- Media In/Out ---->| Client |
| |--------------------------| |
+-----------------+ +-----------------+
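As a minimal sketch of the connection step, the stream can be opened with the Python `websocket-client` package; the helper names `media_stream_url` and `receive_one_frame` are ours (not part of the API), and the call id and token are placeholders.

```python
def media_stream_url(call_id: str, token: str, encapsulation: str = "rtp") -> str:
    """Build the wss:// URL for a call's media stream."""
    return (f"wss://api.voipbin.net/v1.0/calls/{call_id}/media_stream"
            f"?encapsulation={encapsulation}&token={token}")

def receive_one_frame(call_id: str, token: str) -> bytes:
    """Open the stream and read a single binary frame from the call."""
    import websocket  # pip install websocket-client
    ws = websocket.create_connection(media_stream_url(call_id, token))
    try:
        return ws.recv()  # audio coming from the call
    finally:
        ws.close()
```

The same URL shape works for conferences by swapping `calls/<call-id>` for `conferences/<conference-id>`.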
Uni-directional streaming
Uni-directional streaming involves the transmission of media in one direction only. This can be achieved through the use of a flow action known as "external media start". This flow action initiates the transmission of media from the server to the client, without the client sending media back. See details here.
Encapsulation Types
There are three encapsulation types supported for media streaming:
rtp: RTP (Real-time Transport Protocol). This is the standard protocol for transmitting audio and video over IP networks. It provides a means for transporting media streams (e.g., voice) from one endpoint to another.
sln: Signed Linear Mono. This raw media stream format does not include headers or padding. The ulaw codec is also allowed.
audiosocket: AudioSocket. This is a protocol specific to Asterisk, known as Asterisk’s AudioSocket type. It is designed to facilitate simple audio streaming with minimal overhead. More details about AudioSocket can be found in the Asterisk AudioSocket Documentation(https://docs.asterisk.org/Configuration/Channel-Drivers/AudioSocket/).
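With the rtp encapsulation, each binary frame carries an RTP packet. A sketch of parsing the 12-byte fixed RTP header (per RFC 3550) might look like the following; the function name `parse_rtp` is ours, and the sketch ignores header extensions.

```python
import struct

def parse_rtp(packet: bytes) -> dict:
    """Split an RTP packet into its fixed-header fields and payload."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                        # number of CSRC entries
    return {
        "version": b0 >> 6,               # always 2 for RTP
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,        # 0 = PCMU (ulaw)
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        "payload": packet[12 + 4 * cc:],  # skip fixed header and any CSRCs
    }
```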
Codec
VoipBin currently supports the ulaw codec; the decoded audio is always 16-bit, 8 kHz signed linear mono.
For AudioSocket, the stream uses 16-bit, 8 kHz, mono PCM (little-endian).
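Since each ulaw byte expands to a 16-bit linear sample, a decoder is easy to sketch in pure Python using the standard G.711 μ-law expansion; the function names are ours.

```python
def ulaw_decode(u: int) -> int:
    """Decode one G.711 mu-law byte to a 16-bit signed linear sample."""
    u = ~u & 0xFF
    t = ((u & 0x0F) << 3) + 0x84   # mantissa plus bias
    t <<= (u >> 4) & 0x07          # apply the exponent
    return (0x84 - t) if (u & 0x80) else (t - 0x84)

def ulaw_decode_frame(data: bytes) -> list:
    """Decode a whole ulaw frame (e.g. 160 bytes = 20 ms) to linear samples."""
    return [ulaw_decode(b) for b in data]
```

For example, the ulaw byte 0xFF decodes to silence (0), and 0x00 decodes to the most negative sample, -32124.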
Tutorial
Bi-Directional Media Streaming for Calls
Connect to a call’s media stream via WebSocket to send and receive audio in real-time. This allows you to build custom audio processing applications without SIP signaling.
Establish WebSocket Connection:
GET https://api.voipbin.net/v1.0/calls/<call-id>/media_stream?encapsulation=rtp&token=<YOUR_AUTH_TOKEN>
Example:
GET https://api.voipbin.net/v1.0/calls/652af662-eb45-11ee-b1a5-6fde165f9226/media_stream?encapsulation=rtp&token=<YOUR_AUTH_TOKEN>
This creates a bi-directional WebSocket connection where you can:

- Receive audio from the call (what the other party is saying)
- Send audio to the call (inject audio into the conversation)
Bi-Directional Media Streaming for Conferences
Access a conference’s media stream to monitor or participate in the conference audio.
GET https://api.voipbin.net/v1.0/conferences/<conference-id>/media_stream?encapsulation=rtp&token=<YOUR_AUTH_TOKEN>
Example:
GET https://api.voipbin.net/v1.0/conferences/1ed12456-eb4b-11ee-bba8-1bfb2838807a/media_stream?encapsulation=rtp&token=<YOUR_AUTH_TOKEN>
This allows you to:

- Listen to all conference participants
- Inject audio into the conference
- Build custom conference recording or analysis tools
Encapsulation Types
VoipBin supports three encapsulation types for media streaming:
1. RTP (Real-time Transport Protocol)
Standard protocol for audio/video over IP networks.
?encapsulation=rtp
Use cases:

- Standard VoIP integration
- Compatible with most audio processing tools
- Industry-standard protocol
2. SLN (Signed Linear Mono)
Raw audio stream without headers or padding.
?encapsulation=sln
Use cases:

- Minimal overhead needed
- Simple audio processing
- Direct PCM audio access
3. AudioSocket
Asterisk-specific protocol for simple audio streaming.
?encapsulation=audiosocket
Use cases:

- Asterisk integration
- Low-overhead streaming
- Simple audio applications
Codec: All formats use 8 kHz, mono audio (ulaw for RTP, 16-bit signed linear for SLN, 16-bit PCM little-endian for AudioSocket)
WebSocket Client Examples
Python Example (RTP Streaming):
import websocket

def process_audio(rtp_packet):
    """Process received RTP audio."""
    # The fixed RTP header is 12 bytes; strip it to get the payload
    payload = rtp_packet[12:]
    # Save or process audio
    with open('received_audio.raw', 'ab') as f:
        f.write(payload)

def generate_audio():
    """Generate audio to send (simplified)."""
    # In production, construct proper RTP packets with an RTP library
    with open('audio_to_inject.raw', 'rb') as f:
        return f.read(160)  # 20ms of 8kHz audio

def on_message(ws, message):
    """Receive audio data from the call."""
    # message contains RTP packets
    print(f"Received {len(message)} bytes of audio")
    # Process audio here:
    # - Save to file
    # - Run speech recognition
    # - Analyze audio
    process_audio(message)

def on_open(ws):
    """Connection established; we can start sending audio."""
    print("Media stream connected")
    # Send audio to the call; audio_data should be RTP packets
    audio_data = generate_audio()
    ws.send(audio_data, opcode=websocket.ABNF.OPCODE_BINARY)

def on_error(ws, error):
    print(f"Error: {error}")

def on_close(ws, close_status_code, close_msg):
    print(f"Connection closed: {close_status_code}")

# Connect to media stream
call_id = "652af662-eb45-11ee-b1a5-6fde165f9226"
token = "<YOUR_AUTH_TOKEN>"
ws_url = f"wss://api.voipbin.net/v1.0/calls/{call_id}/media_stream?encapsulation=rtp&token={token}"

ws = websocket.WebSocketApp(
    ws_url,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close
)
ws.run_forever()
JavaScript Example (Browser):
const callId = '652af662-eb45-11ee-b1a5-6fde165f9226';
const token = '<YOUR_AUTH_TOKEN>';
const wsUrl = `wss://api.voipbin.net/v1.0/calls/${callId}/media_stream?encapsulation=rtp&token=${token}`;

const ws = new WebSocket(wsUrl);
ws.binaryType = 'arraybuffer';

ws.onopen = function() {
    console.log('Media stream connected');
    // Send audio to the call
    const audioData = generateAudio();
    ws.send(audioData);
};

ws.onmessage = function(event) {
    // Receive audio from the call
    const audioData = event.data;
    console.log(`Received ${audioData.byteLength} bytes`);
    // Process audio
    processAudio(new Uint8Array(audioData));
};

ws.onerror = function(error) {
    console.error('WebSocket error:', error);
};

ws.onclose = function() {
    console.log('Media stream closed');
};

function processAudio(audioBuffer) {
    // Process received audio:
    // - Play through Web Audio API
    // - Run speech recognition
    // - Visualize audio
}

function generateAudio() {
    // Generate audio to send
    // Returns an ArrayBuffer with RTP packets
    return new ArrayBuffer(172); // 12-byte RTP header + 160-byte payload
}
Node.js Example (AudioSocket):
const WebSocket = require('ws');
const fs = require('fs');

const callId = '652af662-eb45-11ee-b1a5-6fde165f9226';
const token = '<YOUR_AUTH_TOKEN>';
const wsUrl = `wss://api.voipbin.net/v1.0/calls/${callId}/media_stream?encapsulation=audiosocket&token=${token}`;

const ws = new WebSocket(wsUrl);

ws.on('open', function() {
    console.log('AudioSocket connected');
    // Send audio file in chunks (20ms = 320 bytes for 16-bit 8kHz mono)
    const audioFile = fs.readFileSync('audio.pcm');
    const chunkSize = 320;
    for (let i = 0; i < audioFile.length; i += chunkSize) {
        const chunk = audioFile.slice(i, i + chunkSize);
        ws.send(chunk);
    }
});

ws.on('message', function(data) {
    // Receive audio from the call
    console.log(`Received ${data.length} bytes`);
    // Save received audio
    fs.appendFileSync('received_audio.pcm', data);
});

ws.on('error', function(error) {
    console.error('Error:', error);
});

ws.on('close', function() {
    console.log('AudioSocket closed');
});
Uni-Directional Streaming with Flow Action
For sending audio to a call without receiving audio back, use the external_media_start flow action.
Create Call with External Media:
$ curl --location --request POST 'https://api.voipbin.net/v1.0/calls?token=<YOUR_AUTH_TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "source": {
        "type": "tel",
        "target": "+15551234567"
    },
    "destinations": [
        {
            "type": "tel",
            "target": "+15559876543"
        }
    ],
    "actions": [
        {
            "type": "answer"
        },
        {
            "type": "external_media_start",
            "option": {
                "url": "wss://your-media-server.com/audio-stream",
                "encapsulation": "audiosocket"
            }
        }
    ]
}'
This creates a uni-directional stream where VoipBin:

1. Establishes the call
2. Connects to your media server via WebSocket
3. Receives audio from your server
4. Plays that audio to the call participant
Your media server receives:
WebSocket connection from VoipBin
→ Send audio chunks (PCM format for AudioSocket)
→ VoipBin plays audio to call
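A minimal media server for this flow might look like the following Python sketch, assuming the third-party `websockets` package and audiosocket encapsulation (16-bit, 8 kHz mono PCM, so 320 bytes per 20 ms frame). The file name `audio_to_play.pcm` and the port are placeholders.

```python
import asyncio

CHUNK = 320  # 20 ms of 16-bit, 8 kHz mono PCM

def frames(pcm: bytes, size: int = CHUNK):
    """Split raw PCM into fixed-size frames."""
    for off in range(0, len(pcm), size):
        yield pcm[off:off + size]

async def handler(ws):
    """Called when VoipBin connects; stream audio toward the call."""
    with open("audio_to_play.pcm", "rb") as f:
        pcm = f.read()
    for frame in frames(pcm):
        await ws.send(frame)
        await asyncio.sleep(0.02)  # pace at real time, one frame per 20 ms

def main():
    """Serve the handler; call main() to start listening."""
    import websockets  # pip install websockets

    async def serve():
        async with websockets.serve(handler, "0.0.0.0", 8080):
            await asyncio.Future()  # run until cancelled
    asyncio.run(serve())
```

The URL configured in the external_media_start option would then point at this server, e.g. wss://your-media-server.com:8080.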
Common Use Cases
1. Real-Time Speech Recognition:
# Python example
def on_message(ws, message):
    # Extract audio from the RTP packet
    audio = extract_audio(message)
    # Send to a speech recognition API
    text = speech_to_text(audio)
    print(f"Recognized: {text}")
    # Store the transcription
    save_transcription(text)
2. Audio Injection / IVR Replacement:
# Node.js example
ws.on('open', function() {
    // Play custom audio prompts
    const prompt1 = fs.readFileSync('welcome.pcm');
    ws.send(prompt1);
    // Wait for DTMF or speech
    // Then play next prompt
});
3. Conference Recording:
# Python example
def on_message(ws, message):
    # Save all conference audio
    with open(f'conference_{conference_id}.raw', 'ab') as f:
        f.write(extract_audio(message))
4. Real-Time Audio Analysis:
def on_message(ws, message):
    audio = extract_audio(message)
    # Detect emotion
    emotion = analyze_emotion(audio)
    # Detect keywords
    if detect_keyword(audio, ['help', 'urgent']):
        alert_supervisor()
    # Calculate audio quality
    quality = measure_quality(audio)
5. Custom Music on Hold:
ws.on('open', function() {
    // Play custom music or messages
    const music = fs.readFileSync('hold_music.pcm');
    // Loop music while the call is on hold
    // (simplified: production code should pace audio in 20ms chunks)
    setInterval(() => {
        ws.send(music);
    }, 1000);
});
6. AI-Powered Voice Assistant:
ws.on('message', async function(data) {
    // Receive customer audio (await requires an async callback)
    const audio = extractAudio(data);
    // Send to AI for processing
    const response = await aiProcess(audio);
    // Convert AI response to audio
    const responseAudio = textToSpeech(response);
    // Send back to the call
    ws.send(responseAudio);
});
Audio Format Details
RTP Format:

- Codec: ulaw (G.711 μ-law)
- Sample rate: 8 kHz
- Sample width: 8 bits per ulaw sample (16-bit linear when decoded)
- Channels: Mono
- Packet size: 160 bytes payload (20 ms audio)

SLN Format:

- Raw PCM audio, no headers or padding
- Sample rate: 8 kHz
- Bits: 16-bit signed
- Channels: Mono

AudioSocket Format:

- PCM little-endian
- Sample rate: 8 kHz
- Bits: 16-bit
- Channels: Mono
- Chunk size: 320 bytes (20 ms of audio)
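The frame sizes above follow directly from the stated sample rates and widths; as a quick check:

```python
SAMPLE_RATE = 8000  # Hz, shared by all three formats
FRAME_MS = 20       # one audio frame

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples
ulaw_frame_bytes = samples_per_frame * 1            # ulaw: 1 byte/sample
pcm16_frame_bytes = samples_per_frame * 2           # 16-bit PCM: 2 bytes/sample

print(samples_per_frame, ulaw_frame_bytes, pcm16_frame_bytes)  # 160 160 320
```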
Best Practices
1. Buffer Management:

   - Maintain audio buffers to handle jitter
   - Send audio in consistent 20ms chunks
   - Don't send too fast or too slow
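One way to keep a consistent 20 ms cadence is to pace sends against an absolute schedule rather than a fixed per-frame sleep (a sketch; `ws` is any object with a blocking `send` method):

```python
import time

def send_paced(ws, pcm: bytes, chunk_size: int = 320, frame_ms: int = 20):
    """Send raw audio as fixed-size frames at a real-time rate."""
    start = time.monotonic()
    for i, off in enumerate(range(0, len(pcm), chunk_size)):
        ws.send(pcm[off:off + chunk_size])
        # Sleep to the next absolute frame deadline; this avoids the
        # cumulative drift of a plain sleep(frame_ms) per iteration.
        deadline = start + (i + 1) * frame_ms / 1000.0
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```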
2. Error Handling:

   - Implement reconnection logic
   - Handle WebSocket disconnections gracefully
   - Log errors for debugging

3. Audio Quality:

   - Use proper RTP packet construction
   - Maintain correct timing for audio chunks
   - Monitor for packet loss

4. Resource Management:

   - Close the WebSocket when done
   - Don't leave connections open indefinitely
   - Clean up audio buffers and files

5. Testing:

   - Test with various network conditions
   - Verify audio quality with real calls
   - Monitor latency and packet loss

6. Security:

   - Use WSS (secure WebSocket) in production
   - Validate authentication tokens
   - Encrypt sensitive audio data
Connection Lifecycle
1. Establish Connection:
GET /v1.0/calls/<call-id>/media_stream?encapsulation=rtp&token=<token>
2. WebSocket Upgrade:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
3. Bi-Directional Communication:
Client ←→ VoipBin
- Send audio: Binary frames with RTP packets
- Receive audio: Binary frames with RTP packets
4. Close Connection:
ws.close()
Troubleshooting
Common Issues:
No audio received:

- Check that the WebSocket connection is established
- Verify the call is active and answered
- Ensure the correct encapsulation type

Poor audio quality:

- Check network latency
- Verify the audio format matches the requirements
- Monitor packet loss

Connection drops:

- Implement reconnection logic
- Check firewall rules for WebSocket traffic
- Verify the authentication token is valid
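Reconnection logic can be as simple as exponential backoff around the connect loop (a sketch; `connect` is any factory returning an object with a `run_forever` method, such as a configured `WebSocketApp`):

```python
import time

def run_with_reconnect(connect, max_retries: int = 5, base_delay: float = 1.0):
    """Rebuild the WebSocket after unexpected failures, backing off exponentially."""
    for attempt in range(max_retries):
        try:
            ws = connect()   # e.g. websocket.WebSocketApp(ws_url, ...)
            ws.run_forever()
            return           # closed cleanly; stop retrying
        except Exception as err:
            delay = base_delay * (2 ** attempt)
            print(f"connection lost ({err!r}); retrying in {delay:.1f}s")
            time.sleep(delay)
    raise RuntimeError("media stream: gave up after repeated failures")
```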
Can't send audio:

- Ensure binary frames are used (not text)
- Verify the audio format is correct
- Check the audio chunk size (typically 20ms)
For more information about media stream configuration, see Media Stream Overview.