Logging & Troubleshooting¶

This guide helps you monitor PyBMP operations, understand log outputs, and troubleshoot common issues.

Log Configuration¶

Log Levels¶

PyBMP supports multiple log levels for different verbosity:

Level	Description	Use Case
`DEBUG`	Very detailed information	Development and deep debugging
`INFO`	General information about operation	Normal production monitoring
`WARNING`	Warning messages about potential issues	Production with enhanced monitoring
`ERROR`	Error conditions that don't stop operation	Production (default)
`CRITICAL`	Critical errors that may stop operation	Production alerts

Setting Log Level¶

# Environment variable
export LOG_LEVEL=DEBUG

# Docker run
docker run -e LOG_LEVEL=DEBUG ...

# Docker Compose
environment:
  LOG_LEVEL: DEBUG

Log Format¶

PyBMP uses structured logging with the following format:

2024-01-15 10:30:45,123 - module_name - INFO - Message content
TIMESTAMP - LOGGER_NAME - LEVEL - MESSAGE

Log Sources¶

Main Application Logs¶

Monitor the main application container:

# Docker Compose
docker-compose logs -f bmp-collector

# Docker container
docker logs -f bmp-collector

# Follow logs with timestamps
docker-compose logs -f -t bmp-collector

Database Logs¶

Monitor TimescaleDB operations:

# Database container logs
docker-compose logs -f timescaledb

# PostgreSQL logs inside container
docker exec -it bmp-timescaledb tail -f /var/log/postgresql/postgresql-15-main.log

Understanding Log Messages¶

Normal Operation Logs¶

Startup Sequence¶

2024-01-15 10:30:45 - src.main - INFO - Starting BMP Collector...
2024-01-15 10:30:45 - src.database.connection - INFO - Database connection established
2024-01-15 10:30:46 - src.bmp.server - INFO - BMP server starting on 0.0.0.0:11019
2024-01-15 10:30:46 - src.bmp.server - INFO - BMP server listening for connections

Router Connection¶

2024-01-15 10:31:20 - src.bmp.server - INFO - New connection from 192.168.1.10:51234
2024-01-15 10:31:21 - src.bmp.processor - INFO - Router 192.168.1.10 session started
2024-01-15 10:31:21 - src.bmp.processor - INFO - Processing initiation message from 192.168.1.10

Route Processing¶

2024-01-15 10:31:25 - src.bmp.processor - INFO - Processed 150 route updates from 192.168.1.10
2024-01-15 10:31:30 - src.bmp.processor - DEBUG - Route announcement: 10.0.0.0/8 via 192.168.1.1
2024-01-15 10:31:35 - src.bmp.processor - DEBUG - Route withdrawal: 172.16.0.0/12

Warning Messages¶

Buffer Management¶

2024-01-15 10:35:00 - src.bmp.server - WARNING - High buffer usage for connection 192.168.1.10 (85%)
2024-01-15 10:35:30 - src.bmp.processor - WARNING - Batch processing delayed, queue size: 500

Database Performance¶

2024-01-15 10:40:00 - src.database.connection - WARNING - Database query slow: 1.5s
2024-01-15 10:40:15 - src.database.connection - WARNING - Connection pool near capacity: 18/20

Error Messages¶

Connection Issues¶

2024-01-15 10:45:00 - src.bmp.server - ERROR - Connection lost from 192.168.1.10: [Errno 104] Connection reset by peer
2024-01-15 10:45:10 - src.database.connection - ERROR - Database connection failed: connection timeout

Data Processing Errors¶

2024-01-15 10:50:00 - src.bmp.parser - ERROR - Invalid BMP message format from 192.168.1.10
2024-01-15 10:50:05 - src.bmp.bgp_parser - ERROR - Malformed BGP update in BMP message

Monitoring with Prometheus Metrics¶

Key Metrics to Monitor¶

Access metrics at http://localhost:9090/metrics:

Connection Metrics¶

# Active connections
bmp_active_connections{router_ip="192.168.1.10"} 1

# Total messages received
bmp_messages_received_total{router_ip="192.168.1.10",message_type="route_monitoring"} 1500

# Message processing rate
bmp_message_processing_rate{router_ip="192.168.1.10"} 25.5

Database Metrics¶

# Database queries executed
database_queries_total{operation="insert"} 5000

# Database connection pool usage
database_pool_connections{state="active"} 8
database_pool_connections{state="idle"} 12

# Query execution time
database_query_duration_seconds{operation="insert",quantile="0.95"} 0.05

System Metrics¶

# Memory usage
process_resident_memory_bytes 256000000

# CPU usage
process_cpu_seconds_total 120.5

# File descriptors
process_open_fds 45

Setting Up Alerts¶

Example Prometheus alerting rules:

groups:
  - name: pybmp_alerts
    rules:
      - alert: BMPHighConnectionLoss
        expr: rate(bmp_connections_lost_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High BMP connection loss rate"
          description: "BMP connection loss rate is {{ $value }} connections/sec"

      - alert: BMPDatabaseSlow
        expr: database_query_duration_seconds{quantile="0.95"} > 1.0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Database queries are slow"
          description: "95th percentile query time is {{ $value }} seconds"

      - alert: BMPHighMemoryUsage
        expr: process_resident_memory_bytes > 1000000000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "PyBMP using {{ $value | humanize1024 }}B of memory"

Common Issues and Solutions¶

1. Router Connection Issues¶

Symptoms¶

ERROR - Connection lost from 192.168.1.10: [Errno 104] Connection reset by peer
WARNING - No data received from 192.168.1.10 for 60 seconds

Possible Causes & Solutions¶

Network Connectivity:

# Test connectivity from PyBMP host to router
telnet 192.168.1.10 179  # BGP port should be open
ping 192.168.1.10

# Check firewall rules
iptables -L -n | grep 11019
ufw status

Router Configuration:

# Verify router BMP configuration
# Check if router can reach PyBMP
show ip route <PYBMP-IP>
show bmp summary

PyBMP Configuration:

# Check if PyBMP is listening on correct interface
netstat -tulpn | grep 11019
ss -tulpn | grep 11019

# Verify container networking (if using Docker)
docker port bmp-collector

2. Database Performance Issues¶

Symptoms¶

WARNING - Database query slow: 2.5s
ERROR - Database connection timeout
WARNING - Connection pool near capacity: 20/20

Solutions¶

Optimize Database:

-- Update table statistics
ANALYZE routes;
ANALYZE route_history;

-- Check for missing indexes
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'routes';

-- Add custom indexes for frequent queries
CREATE INDEX CONCURRENTLY idx_routes_recent_by_router
ON routes(router_ip, time DESC)
WHERE time > NOW() - INTERVAL '7 days';

Tune Database Configuration:

-- postgresql.conf
max_connections = 200
shared_buffers = 2GB
work_mem = 8MB
effective_cache_size = 8GB
checkpoint_timeout = 10min
wal_buffers = 16MB

Adjust Connection Pool:

# Increase database pool size
DB_POOL_SIZE=30
DB_MAX_OVERFLOW=60

# Adjust batch processing
BATCH_SIZE=500
BATCH_TIMEOUT_SECONDS=10

3. High Memory Usage¶

Symptoms¶

WARNING - High memory usage detected: 1.2GB
ERROR - Out of memory: cannot allocate more connections

Solutions¶

Monitor Memory Usage:

# Container memory usage
docker stats bmp-collector

# Detailed memory breakdown
docker exec bmp-collector cat /proc/meminfo
docker exec bmp-collector ps aux --sort=-%mem

Optimize Configuration:

# Reduce buffer sizes
BMP_BUFFER_SIZE=32768
BMP_MAX_CONNECTIONS=50

# Limit batch sizes
BATCH_SIZE=500
WORKER_THREADS=2

Database Memory:

-- Check database cache hit ratio
SELECT datname,
       blks_read,
       blks_hit,
       round((blks_hit::float/(blks_read+blks_hit+1)*100)::numeric, 2) as cache_hit_ratio
FROM pg_stat_database
WHERE datname = 'bmp_collector';

4. Message Processing Delays¶

Symptoms¶

WARNING - Batch processing delayed, queue size: 1000
INFO - Processing lag: 30 seconds behind real-time

Solutions¶

Increase Processing Capacity:

# More worker threads
WORKER_THREADS=8

# Larger batch sizes (if database can handle)
BATCH_SIZE=2000

# Shorter timeout for faster processing
BATCH_TIMEOUT_SECONDS=2

Database Optimization:

-- Partition old data
SELECT drop_chunks('routes', INTERVAL '30 days');

-- Update chunk interval for better performance
SELECT set_chunk_time_interval('routes', INTERVAL '6 hours');

5. Log File Issues¶

Log Rotation¶

# Setup logrotate for Docker logs
cat > /etc/logrotate.d/docker-pybmp << EOF
/var/lib/docker/containers/*/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0644 root root
    postrotate
        docker kill --signal=USR1 $(docker ps -q)
    endscript
}
EOF

Disk Space Monitoring¶

# Monitor log directory size
du -sh /var/lib/docker/containers/*/

# Clean old logs
docker system prune -f
docker container prune -f

Debugging Tools¶

Enable Debug Logging¶

# Temporarily enable debug logging
docker exec bmp-collector kill -USR1 1

# Or restart with debug logging
docker-compose down
docker-compose up -d -e LOG_LEVEL=DEBUG

Database Query Analysis¶

-- Show slow queries
SELECT query, mean_time, calls, total_time
FROM pg_stat_statements
WHERE mean_time > 100
ORDER BY mean_time DESC;

-- Show current running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes';

Network Packet Capture¶

# Capture BMP traffic for analysis
tcpdump -i any -w bmp_capture.pcap port 11019

# Analyze with tshark
tshark -r bmp_capture.pcap -Y "tcp.port == 11019"

Resource Monitoring¶

# Real-time monitoring
watch -n 1 'docker stats --no-stream bmp-collector'

# Continuous monitoring with timestamps
while true; do
    echo "$(date): $(docker stats --no-stream --format 'table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}' bmp-collector)"
    sleep 60
done

Performance Tuning¶

Application Tuning¶

# Optimize for high-volume environments
BMP_BUFFER_SIZE=131072
BMP_MAX_CONNECTIONS=200
BATCH_SIZE=5000
BATCH_TIMEOUT_SECONDS=1
WORKER_THREADS=16

Database Tuning¶

-- Optimize for time-series workload
SET maintenance_work_mem = '1GB';
SET checkpoint_timeout = '15min';
SET wal_compression = on;
SET log_min_duration_statement = 1000;

Container Resource Limits¶

# docker-compose.yml
services:
  bmp-collector:
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
        reservations:
          memory: 1G
          cpus: '1.0'

Getting Help¶

When reporting issues, include:

Log snippets with timestamps
Configuration (sanitized)
System information (OS, Docker version, hardware)
Network topology and router models
Metrics data if available

# Generate diagnostic report
docker-compose logs --tail=100 > pybmp-logs.txt
docker stats --no-stream > pybmp-stats.txt
curl -s http://localhost:9090/metrics > pybmp-metrics.txt