Data Architecture¶

Note

AI Context

This page describes VoIPBIN’s data layer: shared MySQL database (schema organization, common table patterns, migrations via Alembic), Redis cache (cache-aside pattern, key naming, TTL strategies), and session management. Relevant when an AI agent needs to understand database schema conventions, caching strategies, data consistency models, or backup/recovery procedures.

VoIPBIN uses a shared data layer with MySQL for persistent storage and Redis for caching and session management. This architecture provides consistency across services while enabling high-performance data access.

Data Layer Overview¶

VoIPBIN’s data architecture consists of three layers:

Data Architecture:

+---------------------------------------------------------+
|                   Application Layer                     |
|            (34 Microservices)                           |
+--------------------+-------------------+----------------+
                     |                   |
                     |                   |
     +---------------v------+   +--------v-----------+
     |                      |   |                    |
     |   Redis Cache        |   |   MySQL Database   |
     |   (Hot Data)         |   |   (Persistent)     |
     |                      |   |                    |
     |  o Sessions          |   |  o All entities    |
     |  o Frequently read   |   |  o Relationships   |
     |  o Temporary data    |   |  o Audit logs      |
     |                      |   |                    |
     +----------------------+   +--------------------+

     Cache-Aside Pattern:
     1. Check cache first
     2. If miss, query database
     3. Store in cache for next time

MySQL Database¶

VoIPBIN uses a single shared MySQL database accessed by all services.

Database Characteristics

Shared Database Pattern:

+--------------+  +--------------+  +--------------+
|   Service A  |  |   Service B  |  |   Service C  |
|              |  |              |  |              |
|  call-mgr    |  |  flow-mgr    |  |  agent-mgr   |
+------+-------+  +------+-------+  +------+-------+
       |                 |                 |
       |   Connection    |                 |
       |   Pooling       |                 |
       +--------+--------+-----------------+
                |
                v
     +----------------------------+
     |      MySQL Database        |
     |                            |
     |  +----------------------+  |
     |  |  calls table         |  |
     |  |  conferences table   |  |
     |  |  agents table        |  |
     |  |  flows table         |  |
     |  |  customers table     |  |
     |  |  ... 100+ tables     |  |
     |  +----------------------+  |
     +----------------------------+

Shared Schema: All services access same database
Logical Separation: Services own specific tables
ACID Transactions: Strong consistency guarantees
Connection Pooling: Each service maintains pool

Schema Organization

Tables are logically grouped by domain:

Table Organization:

Communication Domain:
o calls                - Call records
o conferences          - Conference bridges
o sms                  - SMS messages
o chats                - Chat messages
o emails               - Email records

Workflow Domain:
o flows                - Call flow definitions
o flow_actions         - Flow action steps
o queues               - Call queues
o campaigns            - Campaign definitions

Management Domain:
o customers            - Customer accounts
o agents               - Agent records
o billings             - Billing records
o webhooks             - Webhook configurations
o accesskeys           - API keys

Resource Domain:
o numbers              - Phone numbers
o recordings           - Call recordings
o transcribes          - Transcription jobs
o transcripts          - Transcript segments

Common Table Pattern

All tables follow a consistent structure:

Standard Table Schema:

CREATE TABLE resource (
    id              VARCHAR(36) PRIMARY KEY,    -- UUID
    customer_id     VARCHAR(36) NOT NULL,       -- Ownership

    -- Resource-specific fields
    name            VARCHAR(255),
    status          VARCHAR(50),
    detail          TEXT,

    -- Timestamps
    tm_create       DATETIME(6) NOT NULL,       -- Creation time
    tm_update       DATETIME(6) NOT NULL,       -- Last update
    tm_delete       DATETIME(6) NOT NULL,       -- Soft delete

    -- Indexes
    INDEX idx_customer (customer_id),
    INDEX idx_status (status),
    INDEX idx_tm_create (tm_create),
    INDEX idx_tm_delete (tm_delete)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Key Design Patterns:

UUID Primary Keys: Globally unique identifiers
Customer Ownership: Every resource has customer_id
Soft Deletes: tm_delete = ‘9999-01-01’ for active records
Microsecond Timestamps: DATETIME(6) for precise ordering
UTF8MB4: Full Unicode support including emojis

Data Access Patterns

Services access data through consistent patterns:

Data Access Flow:

Service Handler
     |
     |  1. Validate Input
     v
+----------------------+
|  Business Logic      |
+------+---------------+
       |  2. Check Cache
       v
+----------------------+
|  Cache Handler       |
|  (Redis)             |
+------+---------------+
       |  Cache Miss
       |  3. Query DB
       v
+----------------------+
|  DB Handler          |
|  (MySQL)             |
+------+---------------+
       |  4. Store in Cache
       v
+----------------------+
|  Return Result       |
+----------------------+

Transaction Handling

VoIPBIN uses transactions for consistency:

Transaction Example:

BEGIN TRANSACTION
    |
    |  1. Create Call Record
    +--> INSERT INTO calls ...
    |
    |  2. Update Customer Stats
    +--> UPDATE customers SET total_calls = total_calls + 1 ...
    |
    |  3. Create Billing Entry
    +--> INSERT INTO billings ...
    |
    |  If all succeed:
    |    COMMIT
    |  If any fails:
    |    ROLLBACK
    |
END TRANSACTION

ACID Guarantees: Atomic, Consistent, Isolated, Durable
Rollback on Error: All changes reverted if any step fails
Isolation Levels: READ COMMITTED for most operations
Lock Timeout: 30 seconds to prevent deadlocks

Query Optimization

VoIPBIN optimizes queries for performance:

Query Optimization Strategies:

1. Proper Indexing:
   +---------------------------------+
   | INDEX idx_customer_status       |
   | ON calls (customer_id, status)  |
   +---------------------------------+

   SELECT * FROM calls
   WHERE customer_id = ? AND status = 'active'
   -> Uses index, fast lookup

2. Avoid SELECT *:
   +---------------------------------+
   | SELECT id, status, tm_create    |
   | FROM calls WHERE ...            |
   +---------------------------------+
   -> Only retrieve needed columns

3. Pagination:
   +---------------------------------+
   | SELECT * FROM calls             |
   | WHERE customer_id = ?           |
   | LIMIT 50 OFFSET 0               |
   +---------------------------------+
   -> Limit result size

4. Connection Pooling:
   +---------------------------------+
   | Pool Size: 10-50 connections    |
   | Max Idle: 5 minutes             |
   | Max Lifetime: 1 hour            |
   +---------------------------------+
   -> Reuse connections

Note

AI Implementation Hint

All database schema changes must go through Alembic migrations in bin-dbscheme-manager. Never modify the database schema directly. All records use UUID primary keys and soft deletes (tm_delete set to 9999-01-01 for active records). When querying via the API, deleted records are automatically excluded from results.

Database Migrations¶

Schema changes are managed through Alembic migrations:

Migration Workflow:

Development                 Migration Script              Production
     |                            |                           |
     |  1. Schema Change          |                           |
     |     Needed                 |                           |
     v                            |                           |
+-------------+                   |                           |
| Create      |                   |                           |
| Migration   |------------------>|                           |
| Script      |                   |                           |
+-------------+                   |                           |
     |                            |                           |
     |  2. Test Locally           |                           |
     v                            |                           |
+-------------+                   |                           |
| Run         |                   |                           |
| Migration   |<------------------|                           |
| (dev DB)    |                   |                           |
+-------------+                   |                           |
     |                            |                           |
     |  3. Commit to Git          |                           |
     v                            |                           |
+-------------+                   |                           |
| Code Review |                   |                           |
| & Approval  |                   |                           |
+-------------+                   |                           |
     |                            |                           |
     |  4. Deploy                 |                           |
     |                            |  5. Manual Execution      |
     |                            |     (by human)            |
     |                            +-------------------------->>
     |                            |                           |
     |                            |  alembic upgrade head     |
     |                            |                           |

Migration Best Practices:

Version Control: All migrations in git
Forward Only: Never modify existing migrations
Backward Compatible: Support gradual rollout
Manual Execution: Humans run migrations, not automation
Testing: Test on staging before production

Redis Cache¶

Redis provides fast access to frequently used data:

Cache Architecture

Redis Cache Pattern:

Application Request
     |
     |  1. Generate Cache Key
     |     key = "call:123"
     v
+--------------------+
|  Check Redis       |
|  GET call:123      |
+----+---------------+
     |
     +- Cache Hit --------+
     |                    |
     |                    v
     |              +----------------+
     |              | Return Cached  |
     |              | Data (fast)    |
     |              +----------------+
     |
     +- Cache Miss -------+
     |                    |
     |                    v
     |              +----------------+
     |              | Query MySQL    |
     |              +----+-----------+
     |                   |
     |                   v
     |              +----------------+
     |              | Store in Redis |
     |              | SET call:123   |
     |              | EX 300 (5 min) |
     |              +----+-----------+
     |                   |
     |                   v
     |              +----------------+
     |              | Return Data    |
     |              +----------------+

Cache Key Patterns

VoIPBIN uses structured cache keys:

Key Naming Convention:
<resource>:<id>[:<field>]

Examples:
o call:abc-123              -> Full call record
o agent:xyz-789:status      -> Agent status only
o customer:customer-456     -> Customer record
o queue:queue-999:stats     -> Queue statistics
o flow:flow-111:definition  -> Flow definition

Advantages:
o Predictable keys
o Easy to invalidate
o Pattern matching for bulk operations

Data Structures

Redis supports multiple data structures:

Redis Data Structures:

1. String (Simple Values):
   SET call:123:status "active"
   GET call:123:status
   -> "active"

2. Hash (Object Fields):
   HSET call:123 status "active" duration "120"
   HGET call:123 status
   -> "active"
   HGETALL call:123
   -> {"status": "active", "duration": "120"}

3. List (Ordered Collection):
   LPUSH queue:456:waiting call:123
   LPUSH queue:456:waiting call:789
   LRANGE queue:456:waiting 0 -1
   -> [call:789, call:123]

4. Set (Unique Collection):
   SADD conference:999:participants agent:111
   SADD conference:999:participants agent:222
   SMEMBERS conference:999:participants
   -> [agent:111, agent:222]

5. Sorted Set (Scored Collection):
   ZADD leaderboard 100 agent:111
   ZADD leaderboard 95 agent:222
   ZRANGE leaderboard 0 -1 WITHSCORES
   -> [(agent:111, 100), (agent:222, 95)]

Cache Expiration

All cached data has Time-To-Live (TTL):

TTL Strategy:

Data Type              TTL        Reason
─────────────────────────────────────────────
Session tokens         1 hour     Security
User profiles          5 min      Frequently updated
Call records           1 min      Real-time changes
Configuration          1 hour     Rarely changes
Static data            24 hours   Almost never changes

Set TTL:
SET key value EX 300   # 5 minutes
SETEX key 300 value    # Same as above
EXPIRE key 300         # Set TTL on existing key

Cache Invalidation

VoIPBIN invalidates cache on updates:

Cache Invalidation Flow:

Update Request
     |
     |  1. Update Database
     v
+--------------------+
|  UPDATE calls      |
|  SET status='ended'|
|  WHERE id='123'    |
+----+---------------+
     |
     |  2. Invalidate Cache
     v
+--------------------+
|  DEL call:123      |
+----+---------------+
     |
     |  3. Return Success
     v
+--------------------+
|  Response to Client|
+--------------------+

Next Read:
o Cache miss
o Fetch from DB
o Store in cache with new data

Cache Patterns

Common Cache Patterns:

1. Cache-Aside (Read Through):
   App checks cache -> Cache miss -> Query DB -> Store in cache

2. Write-Through:
   App writes to cache -> Cache writes to DB -> Return success

3. Write-Behind (Async):
   App writes to cache -> Return success -> Cache writes to DB later

VoIPBIN primarily uses Cache-Aside for simplicity and consistency.

Session Management¶

Redis stores session data for authenticated users:

Session Structure

Session Data in Redis:

Key: session:<token-hash>
Type: Hash
TTL: 1 hour (refreshed on activity)

Data:
+-------------------------------------+
| customer_id    : customer-123       |
| agent_id       : agent-456          |
| permissions    : ["admin", "call"]  |
| login_time     : 2026-01-20 12:00   |
| last_activity  : 2026-01-20 12:30   |
| ip_address     : 192.168.1.100      |
| user_agent     : Mozilla/5.0 ...    |
+-------------------------------------+

Session Lifecycle

Session Flow:

1. Login:
   +----------------------------+
   | Generate JWT token         |
   | Hash token -> session_key  |
   | Store session in Redis     |
   | SET session:xyz {...}      |
   | EXPIRE session:xyz 3600    |
   +----------------------------+

2. Request:
   +----------------------------+
   | Extract token from header  |
   | Hash token -> session_key  |
   | GET session:xyz            |
   | Validate session data      |
   | EXPIRE session:xyz 3600    |  <- Refresh TTL
   +----------------------------+

3. Logout:
   +----------------------------+
   | Extract token from header  |
   | Hash token -> session_key  |
   | DEL session:xyz            |
   +----------------------------+

Data Consistency¶

VoIPBIN ensures consistency across data layers:

Consistency Model

Consistency Strategy:

Strong Consistency:        Eventual Consistency:
+--------------+           +--------------+
|   MySQL      |           |   Redis      |
|  (Source of  |           |  (May be     |
|   Truth)     |           |   stale)     |
+------+-------+           +------+-------+
       |                          |
       |  Always consistent       |  May lag behind
       |  ACID transactions       |  Best effort
       |                          |
       +----------+---------------+
                  |
            Database is authoritative

Write Path

Write Flow (Strong Consistency):

1. Write Request
   |
   v
2. Update Database First
   +- BEGIN TRANSACTION
   +- UPDATE table ...
   +- COMMIT
   |
   v
3. Invalidate Cache
   +- DEL cache_key
   |
   v
4. Publish Event
   +- Notify subscribers
   |
   v
5. Return Success

Database updated before cache invalidation
ensures consistency.

Read Path

Read Flow (Eventual Consistency Acceptable):

1. Read Request
   |
   v
2. Check Cache
   +- Cache Hit -> Return (may be slightly stale)
   +- Cache Miss -> Continue
   |
   v
3. Query Database
   +- SELECT * FROM table WHERE ...
   |
   v
4. Store in Cache
   +- SET cache_key value EX ttl
   |
   v
5. Return Result

Data Backup and Recovery¶

VoIPBIN implements comprehensive backup strategy:

Backup Architecture

Backup Strategy:

Production Database
     |
     |  Continuous Replication
     v
+--------------------+
|  Read Replica      |  <- Used for backups
+----+---------------+    (no production impact)
     |
     |  Daily Full Backup
     v
+--------------------+
|  Backup Storage    |
|  (Google Cloud)    |
|                    |
|  o Daily: 30 days  |
|  o Weekly: 1 year  |
|  o Monthly: 7 years|
+--------------------+

Backup Schedule

Backup Timeline:

Daily (3 AM UTC):
+------------------------------+
| Full database dump           |
| Stored for 30 days           |
| ~100 GB compressed           |
+------------------------------+

Weekly (Sunday 3 AM):
+------------------------------+
| Full database dump           |
| Stored for 1 year            |
| Long-term retention          |
+------------------------------+

Continuous:
+------------------------------+
| Binary logs (point-in-time)  |
| Stored for 7 days            |
| For recovery between backups |
+------------------------------+

Recovery Procedures

Recovery Scenarios:

1. Recent Data Loss (< 7 days):
   +----------------------------+
   | Restore latest daily backup|
   | Apply binary logs          |
   | Point-in-time recovery     |
   +----------------------------+
   Recovery time: 1-2 hours

2. Older Data Loss (< 1 year):
   +----------------------------+
   | Restore weekly backup      |
   | No binary logs available   |
   +----------------------------+
   Recovery time: 2-4 hours

3. Disaster Recovery:
   +----------------------------+
   | Failover to replica        |
   | Promote to primary         |
   | Restore from backup        |
   +----------------------------+
   Recovery time: 15 minutes

Performance Monitoring¶

VoIPBIN monitors data layer performance:

Database Metrics

Key Database Metrics:

Query Performance:
+-------------------------------------+
| Slow queries (> 1 second): 0.1%     |
| Average query time: 5ms             |
| P95 query time: 50ms                |
| P99 query time: 200ms               |
+-------------------------------------+

Connection Pool:
+-------------------------------------+
| Active connections: 45/50           |
| Idle connections: 5/50              |
| Wait time: < 1ms                    |
+-------------------------------------+

Table Size:
+-------------------------------------+
| calls:        50 million rows       |
| conferences:  5 million rows        |
| agents:       10,000 rows           |
| Total size:   500 GB                |
+-------------------------------------+

Cache Metrics

Redis Performance:

Hit Rate:
+-------------------------------------+
| Cache hits:   95%                   |
| Cache misses: 5%                    |
| Target:       > 90%                 |
+-------------------------------------+

Memory Usage:
+-------------------------------------+
| Used memory: 8 GB / 16 GB           |
| Peak memory: 12 GB                  |
| Eviction:    LRU policy             |
+-------------------------------------+

Latency:
+-------------------------------------+
| P50: 0.5ms                          |
| P95: 2ms                            |
| P99: 5ms                            |
+-------------------------------------+

Scalability Considerations¶

As VoIPBIN scales, data layer adapts:

Database Scaling

Scaling Strategy:

Current (< 1M customers):
+--------------------------+
|   Single Primary         |
|   + Read Replicas (3)    |
+--------------------------+

Future (> 1M customers):
+--------------------------+
|   Sharding by Customer   |
|                          |
|   Shard 1: customers A-M |
|   Shard 2: customers N-Z |
+--------------------------+

Cache Scaling

Redis Scaling:

Current:
+--------------------------+
|  Single Redis Instance   |
|  16 GB Memory            |
+--------------------------+

Future:
+--------------------------+
|  Redis Cluster           |
|  o Multiple nodes        |
|  o Automatic sharding    |
|  o High availability     |
+--------------------------+

Best Practices¶

Database:

Use indexes for all WHERE clauses
Avoid SELECT *, specify columns
Use connection pooling
Set appropriate timeouts
Monitor slow queries
Regular ANALYZE TABLE for statistics

Cache:

Set appropriate TTLs
Invalidate on updates
Use structured keys
Monitor hit rates
Handle cache failures gracefully
Don’t store large objects (> 1MB)

Security:

Use parameterized queries (prevent SQL injection)
Encrypt sensitive data at rest
Use SSL/TLS for connections
Rotate database credentials regularly
Audit database access
Restrict network access

Monitoring:

Track query performance
Monitor connection pool utilization
Alert on cache hit rate < 90%
Alert on slow queries
Monitor disk space
Track replication lag