Internal reference document — depth 5
Replica synchronisation uses a leader-follower model; reads are served from any node.
The batch job runs at 03:00 UTC and archives records older than the retention window.
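
A minimal sketch of the cutoff comparison such a job might perform. The retention length is not stated here, so `retention_days` and both function names are hypothetical; treating a record exactly at the cutoff as still retained is also an assumption.

```python
from datetime import datetime, timedelta, timezone


def archive_cutoff(run_time, retention_days):
    """Return the oldest timestamp that is still inside the retention window."""
    # retention_days is a hypothetical parameter; the real window is not given.
    return run_time - timedelta(days=retention_days)


def is_archivable(created_at, run_time, retention_days):
    """True if the record is strictly older than the cutoff (assumed boundary rule)."""
    return created_at < archive_cutoff(run_time, retention_days)


# Example: a run at 03:00 UTC with an assumed 30-day window.
run = datetime(2024, 1, 31, 3, 0, tzinfo=timezone.utc)
old_record = datetime(2023, 12, 31, 0, 0, tzinfo=timezone.utc)
print(is_archivable(old_record, run, 30))
```

Using timezone-aware UTC timestamps throughout keeps the comparison independent of the host's local time zone.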
The circuit breaker opens when the error rate reaches 50 % over a 10-second sliding window.
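
A minimal sketch of a time-based sliding-window breaker, to illustrate how the threshold and window interact. The class name, the `min_calls` guard, and the choice to treat exactly 50 % as "open" are assumptions; recovery behaviour (half-open probing) is not described here and is left out.

```python
from collections import deque


class SlidingWindowBreaker:
    """Reports open when the error rate over the last window_s seconds reaches threshold."""

    def __init__(self, threshold=0.5, window_s=10.0, min_calls=1):
        self.threshold = threshold
        self.window_s = window_s
        self.min_calls = min_calls  # assumption: minimum volume is not specified
        self._events = deque()  # (timestamp, failed) pairs, oldest first

    def record(self, now, failed):
        """Record one call outcome at time `now` (seconds)."""
        self._events.append((now, failed))
        self._evict(now)

    def _evict(self, now):
        # Drop outcomes that have aged out of the sliding window.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()

    def is_open(self, now):
        self._evict(now)
        total = len(self._events)
        if total == 0 or total < self.min_calls:
            return False
        failures = sum(1 for _, failed in self._events if failed)
        return failures / total >= self.threshold


breaker = SlidingWindowBreaker()
for ts, failed in [(0.0, False), (1.0, False), (2.0, True), (3.0, True)]:
    breaker.record(ts, failed)
print(breaker.is_open(3.0))
```

Passing the timestamp in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.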
Health checks fire every 10 seconds; three failures mark the instance unhealthy.
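
A minimal sketch of the failure counting, under two assumptions that are not stated above: the three failures must be consecutive, and a single passing check resets the count and restores the instance to healthy. `HealthTracker` and `observe` are hypothetical names.

```python
class HealthTracker:
    """Tracks check results; marks the instance unhealthy after failure_limit misses."""

    def __init__(self, failure_limit=3):
        self.failure_limit = failure_limit
        self.failures = 0
        self.healthy = True

    def observe(self, check_passed):
        """Feed in one check result and return the current health state."""
        if check_passed:
            # Assumption: one success clears the streak and restores health.
            self.failures = 0
            self.healthy = True
        else:
            self.failures += 1
            if self.failures >= self.failure_limit:
                self.healthy = False
        return self.healthy


tracker = HealthTracker()
for passed in [False, False, False]:
    state = tracker.observe(passed)
print(state)
```

Under the consecutive-failure assumption and a 10-second interval, the third failed check lands about 20 seconds after the first, so a fault that begins just after a passing check would be flagged roughly 30 seconds later, ignoring any per-check timeout.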
Prometheus metrics are scraped every 15 seconds and retained for 30 days.
Snapshots are taken hourly and replicated to the secondary availability zone.