Backend Common Issues and Scenarios Flashcard Deck
Be able to diagnose backend issues based on scenarios (12 cards)
Sudden Response Time Degradation
Symptom: Response times increased from 200 ms to 2-3 seconds over the past few weeks
Top Causes to Check:
Database query performance - Check slow query logs, missing indexes, query plan changes
Memory leaks - Monitor heap usage, garbage collection frequency
Connection pool exhaustion - Check active connections vs pool size
Third-party API slowdowns - Test external service response times
Investigation Steps:
Check database performance metrics first
Monitor memory usage patterns
Verify connection pool health
Test external dependencies
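A minimal sketch of the first and last checks, assuming PostgreSQL 13+ with the pg_stat_statements extension enabled, the psycopg2 driver, and the requests library; the DSN and partner URL are placeholders.

```python
import time
import psycopg2
import requests

DSN = "dbname=app user=app host=db.internal"  # placeholder connection string

# 1. Pull the slowest statements by mean execution time (PostgreSQL 13+ column names).
with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute("""
        SELECT query, calls, mean_exec_time, total_exec_time
        FROM pg_stat_statements
        ORDER BY mean_exec_time DESC
        LIMIT 10
    """)
    for query, calls, mean_ms, total_ms in cur.fetchall():
        print(f"{mean_ms:8.1f} ms avg  {calls:8d} calls  {query[:80]}")

# 2. Time a third-party dependency to rule out external slowdowns.
start = time.monotonic()
resp = requests.get("https://api.example-partner.com/health", timeout=5)  # placeholder URL
print(f"partner API: {resp.status_code} in {(time.monotonic() - start) * 1000:.0f} ms")
```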
Business Hours Performance Spikes
Symptom: System slow during business hours (9 AM - 6 PM), fine at night
Top Causes to Check:
Increased concurrent users - Traffic volume analysis
Database lock contention - Check for blocking queries during peak hours
CPU/Memory resource exhaustion - Monitor resource utilization
Cache hit ratio degradation - Check cache effectiveness
Investigation Steps:
Compare traffic patterns (concurrent users, requests/sec)
Check resource utilization during peak vs off-peak
Analyze database lock wait times
Review cache hit ratios
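A sketch of the lock-contention and cache checks, assuming PostgreSQL 9.6+ and psycopg2; run it during the peak window and again off-peak to compare the two readings.

```python
import psycopg2

DSN = "dbname=app user=app host=db.internal"  # placeholder

CACHE_SQL = """
    SELECT round(sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit) + sum(blks_read), 0), 2)
    FROM pg_stat_database
"""
# pg_blocking_pids() (PostgreSQL 9.6+) maps each blocked backend to its blockers.
BLOCKING_SQL = """
    SELECT blocked.pid, blocking.pid, left(blocked.query, 80)
    FROM pg_stat_activity blocked
    JOIN pg_stat_activity blocking
      ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
    WHERE blocked.wait_event_type = 'Lock'
"""

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute(CACHE_SQL)
    print("buffer cache hit ratio: {}%".format(cur.fetchone()[0]))
    cur.execute(BLOCKING_SQL)
    for blocked_pid, blocking_pid, query in cur.fetchall():
        print(f"pid {blocked_pid} blocked by pid {blocking_pid}: {query}")
```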
Gradual Performance Deterioration
Symptom: Performance slowly degrading over weeks/months
Top Causes to Check:
Database table growth - Large tables without proper indexing
Memory leaks - Gradual memory consumption increase
Log file growth - Disk space affecting performance
Database fragmentation - Tables needing optimization
Investigation Steps:
Check database table sizes and growth patterns
Monitor memory usage trends over time
Verify disk space and I/O performance
Review database maintenance schedules
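A sketch of the growth checks, assuming PostgreSQL and psycopg2; the DSN and mount point are placeholders. Capture the output on a schedule (e.g. weekly) so the trend is visible.

```python
import shutil
import psycopg2

DSN = "dbname=app user=app host=db.internal"  # placeholder

# Largest tables (including indexes and TOAST) in the public schema.
SIZE_SQL = """
    SELECT c.relname, pg_size_pretty(pg_total_relation_size(c.oid))
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public' AND c.relkind = 'r'
    ORDER BY pg_total_relation_size(c.oid) DESC
    LIMIT 10
"""

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute(SIZE_SQL)
    for name, size in cur.fetchall():
        print(f"{size:>10}  {name}")

# Disk headroom on the volume that holds data files and logs.
usage = shutil.disk_usage("/var/lib")  # placeholder mount point
print(f"disk free: {usage.free / usage.total:.0%} of {usage.total // 2**30} GiB")
```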
Database Connection Issues
Symptom: Intermittent connection timeouts, connection errors
Top Causes to Check:
Connection pool misconfiguration - Pool size too small for load
Database max connections limit - Hitting database connection limits
Network issues - Packet loss, latency between app and DB
Long-running transactions - Blocking connections
Investigation Steps:
Check connection pool settings vs actual usage
Monitor database connection count vs limits
Test network connectivity and latency
Identify long-running or blocked transactions
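A sketch comparing live connections to the server limit and flagging long-open transactions, assuming PostgreSQL and psycopg2; the 5-minute threshold is arbitrary.

```python
import psycopg2

DSN = "dbname=app user=app host=db.internal"  # placeholder

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    # Connections in use vs the server-wide limit.
    cur.execute("SELECT count(*) FROM pg_stat_activity")
    in_use = cur.fetchone()[0]
    cur.execute("SHOW max_connections")
    limit = int(cur.fetchone()[0])
    print(f"connections: {in_use}/{limit} ({in_use / limit:.0%})")

    # Transactions open longer than 5 minutes; these can pin pool slots and block others.
    cur.execute("""
        SELECT pid, now() - xact_start AS age, state, left(query, 80)
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes'
        ORDER BY xact_start
    """)
    for pid, age, state, query in cur.fetchall():
        print(f"pid {pid}  {age}  {state}  {query}")
```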
Query Performance Issues
Symptom: Specific database operations taking too long
Top Causes to Check:
Missing or inefficient indexes - Queries doing full table scans
Query plan regression - Database optimizer choosing poor plans
Table locking/blocking - Concurrent operations blocking each other
Database statistics outdated - Optimizer using stale data
Investigation Steps:
Analyze query execution plans
Check for missing indexes on frequently queried columns
Monitor for blocking processes
Update database statistics
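A sketch of these steps for one suspect query, assuming PostgreSQL and psycopg2; SUSPECT_SQL is a placeholder for the slow statement.

```python
import psycopg2

DSN = "dbname=app user=app host=db.internal"                  # placeholder
SUSPECT_SQL = "SELECT * FROM orders WHERE customer_id = 42"   # placeholder query

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    # Actual execution plan with runtime and buffer statistics.
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) " + SUSPECT_SQL)
    for (line,) in cur.fetchall():
        print(line)

    # Tables that are mostly sequentially scanned are index candidates.
    cur.execute("""
        SELECT relname, seq_scan, idx_scan, n_live_tup
        FROM pg_stat_user_tables
        ORDER BY seq_scan DESC
        LIMIT 10
    """)
    for name, seq, idx, rows in cur.fetchall():
        print(f"{name}: {seq} seq scans, {idx} index scans, ~{rows} rows")

    cur.execute("ANALYZE")  # refresh optimizer statistics
```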
Database Resource Exhaustion
Symptom: Database running out of CPU, memory, or disk space
Top Causes to Check:
Inefficient queries - A few resource-intensive statements dominating CPU, memory, or I/O
Database buffer pool issues - Insufficient memory allocation
Disk I/O bottlenecks - Storage performance limits
Backup/maintenance operations - Scheduled tasks affecting performance
Investigation Steps:
Identify top resource-consuming queries
Check database memory allocation
Monitor disk I/O patterns and performance
Review maintenance operation schedules
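A sketch of the first two steps, assuming PostgreSQL 13+ with pg_stat_statements enabled and psycopg2.

```python
import psycopg2

DSN = "dbname=app user=app host=db.internal"  # placeholder

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    # Statements consuming the most cumulative execution time and disk reads.
    cur.execute("""
        SELECT left(query, 80), calls, round(total_exec_time::numeric, 0) AS total_ms,
               shared_blks_read
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)
    for query, calls, total_ms, blks_read in cur.fetchall():
        print(f"{total_ms:>12} ms  {calls:>8} calls  {blks_read:>10} blocks read  {query}")

    # Memory given to the shared buffer pool; compare against total RAM on the host.
    cur.execute("SHOW shared_buffers")
    print("shared_buffers =", cur.fetchone()[0])
```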
API Response Time Inconsistency
Symptom: API returning inconsistent response times or results
Top Causes to Check:
Load balancer misconfiguration - Uneven traffic distribution
Instance-specific issues - One server performing poorly
Circuit breaker activation - Downstream service failures
Rate limiting activation - Hitting API rate limits
Investigation Steps:
Check load balancer health and distribution
Monitor individual instance performance
Review circuit breaker status and logs
Verify rate limiting thresholds
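A sketch of the per-instance check: time each backend directly, bypassing the load balancer, using requests. The instance hostnames and path are placeholders.

```python
import statistics
import time
import requests

# Hit each backend instance directly, bypassing the load balancer.
INSTANCES = ["http://app-01.internal:8080", "http://app-02.internal:8080"]  # placeholders
PATH = "/api/v1/health"  # placeholder endpoint
SAMPLES = 20

for base in INSTANCES:
    latencies = []
    for _ in range(SAMPLES):
        start = time.monotonic()
        requests.get(base + PATH, timeout=5)
        latencies.append((time.monotonic() - start) * 1000)
    latencies.sort()
    p95 = latencies[int(0.95 * (SAMPLES - 1))]
    print(f"{base}: median {statistics.median(latencies):.0f} ms, p95 {p95:.0f} ms")
```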
API Gateway Problems
Symptom: Issues at the API gateway level affecting all routes
Top Causes to Check:
Gateway resource limits - CPU/memory exhaustion at gateway
Authentication/authorization delays - Auth service slowdowns
Request transformation overhead - Complex request/response processing
Upstream service discovery issues - Gateway can’t find healthy instances
Investigation Steps:
Monitor gateway resource utilization
Check authentication service performance
Review request transformation logic complexity
Verify service discovery and health checks
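A sketch of the auth and service-discovery checks using requests; all URLs and credentials are placeholders for whatever auth service and health endpoints the gateway actually uses.

```python
import time
import requests

AUTH_TOKEN_URL = "https://auth.internal/oauth/token"        # placeholder
UPSTREAMS = ["http://orders.internal:8080/healthz",
             "http://billing.internal:8080/healthz"]         # placeholders

# 1. Time the token endpoint: slow auth shows up as gateway-wide latency.
start = time.monotonic()
resp = requests.post(AUTH_TOKEN_URL,
                     data={"grant_type": "client_credentials"},
                     auth=("gateway", "secret"), timeout=5)   # placeholder credentials
print(f"auth: {resp.status_code} in {(time.monotonic() - start) * 1000:.0f} ms")

# 2. Confirm the gateway can actually reach healthy upstream instances.
for url in UPSTREAMS:
    try:
        r = requests.get(url, timeout=2)
        print(f"{url}: {r.status_code}")
    except requests.RequestException as exc:
        print(f"{url}: UNREACHABLE ({exc})")
```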
Traffic Spike Handling
Symptom: System crashes or degrades during high traffic periods
Top Causes to Check:
Auto-scaling not configured - Fixed capacity can’t handle load
Database connection limits - Backend can’t scale with traffic
Memory/CPU resource limits - Hardware constraints
Circuit breakers not configured - Cascading failures
Investigation Steps:
Check auto-scaling configuration and triggers
Monitor resource utilization during spikes
Verify database connection scaling
Review circuit breaker and failover mechanisms
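A sketch of the auto-scaling check, assuming the service runs in an AWS EC2 Auto Scaling group and boto3 is installed with credentials configured; the group name is a placeholder.

```python
import boto3

autoscaling = boto3.client("autoscaling")
resp = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["app-backend-asg"]  # placeholder group name
)

for group in resp["AutoScalingGroups"]:
    healthy = [i for i in group["Instances"] if i["HealthStatus"] == "Healthy"]
    print(f'{group["AutoScalingGroupName"]}: '
          f'min={group["MinSize"]} max={group["MaxSize"]} '
          f'desired={group["DesiredCapacity"]} healthy={len(healthy)}')
    # If desired capacity sits pinned at max during every spike, the ceiling is too low.
```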
Load Balancer Issues
Symptom: Uneven performance, some requests fast, others slow
Top Causes to Check:
Unhealthy backend instances - Some servers not responding properly
Session affinity problems - Sticky sessions causing imbalance
Health check misconfiguration - Faulty health checks removing healthy instances from rotation
Load balancing algorithm - Algorithm not suitable for workload
Investigation Steps:
Check health status of all backend instances
Review session affinity settings
Verify health check configuration
Analyze traffic distribution patterns
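A sketch of the distribution check using requests; it assumes the backends tag responses with an instance-identifying header (here the hypothetical X-Served-By), which many setups add for exactly this purpose.

```python
from collections import Counter
import requests

LB_URL = "https://api.example.com/api/v1/health"  # placeholder, through the load balancer
SAMPLES = 200

served_by = Counter()
for _ in range(SAMPLES):
    resp = requests.get(LB_URL, timeout=5)
    served_by[resp.headers.get("X-Served-By", "unknown")] += 1  # hypothetical header

# With round-robin and healthy instances, counts should be roughly even.
for instance, count in served_by.most_common():
    print(f"{instance}: {count / SAMPLES:.0%} of requests")
```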
Batch Job Performance Degradation
Symptom: Batch jobs taking much longer than before (2 hours → 8+ hours)
Top Causes to Check:
Data volume growth - Processing larger datasets with same resources
Resource contention - Competing with other processes for resources
Database locks during processing - Blocking other operations
Memory/disk space issues - Insufficient resources for processing
Investigation Steps:
Compare current vs historical data volumes
Check resource utilization during batch processing
Monitor for database locks and blocking
Verify available memory and disk space
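A sketch of instrumentation that separates data-volume growth from resource problems, assuming psutil is installed; fetch_chunks and process_chunk are placeholders for the real batch steps.

```python
import time
import psutil

def fetch_chunks():
    """Placeholder: yield batches of rows from the real data source."""
    for _ in range(3):
        yield [object()] * 10_000

def process_chunk(chunk):
    """Placeholder for the real per-chunk work."""
    time.sleep(0.1)

proc = psutil.Process()
for i, chunk in enumerate(fetch_chunks()):
    start = time.monotonic()
    process_chunk(chunk)
    elapsed = time.monotonic() - start
    rss_mb = proc.memory_info().rss / 2**20
    # Falling rows/sec with stable RSS suggests contention or data skew;
    # climbing RSS suggests a leak or chunks that no longer fit in memory.
    print(f"chunk {i}: {len(chunk) / elapsed:,.0f} rows/s, rss {rss_mb:.0f} MiB, "
          f"host cpu {psutil.cpu_percent():.0f}%")
```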
Real-time Data Processing Issues
Symptom: Data processing lag, delayed updates, backlog buildup
Top Causes to Check:
Message queue backlog - Producers faster than consumers
Worker process failures - Processing instances crashing/failing
Resource constraints - CPU/memory limiting processing speed
Downstream service delays - Dependencies slowing processing
Investigation Steps:
Monitor message queue depths and processing rates
Check worker process health and error rates
Review resource utilization of processing instances
Test downstream service response times
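A sketch of the backlog check, assuming RabbitMQ with the management plugin enabled; host, credentials, vhost, and queue name are placeholders. If consumers can't keep up, queue depth and the publish/deliver rate gap grow together.

```python
import requests

MGMT = "http://rabbitmq.internal:15672/api/queues/%2f/events"  # placeholder vhost and queue
resp = requests.get(MGMT, auth=("guest", "guest"), timeout=5)  # placeholder credentials
q = resp.json()

depth = q.get("messages", 0)
publish_rate = q.get("message_stats", {}).get("publish_details", {}).get("rate", 0.0)
deliver_rate = q.get("message_stats", {}).get("deliver_get_details", {}).get("rate", 0.0)

print(f"queue depth: {depth} messages")
print(f"in: {publish_rate:.1f}/s  out: {deliver_rate:.1f}/s")
if publish_rate > deliver_rate:
    print("producers are outpacing consumers; the backlog will keep growing")
```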