Last Friday I started drafting a new blog post about Wikidot server uptime. Every month Pingdom sends us a report with average response times and uptime summaries for the various checks we have. The numbers are always stellar: something around 99.99% uptime with only a few minutes of detected outages.
Friday night, a thriller story
Our infrastructure is pretty stable and resilient, so I was hoping to get 100% uptime in November. Suddenly, on Friday evening, we started getting various alerts from Pingdom, CloudWatch and other services. Something was wrong. Wikidot was slow: loading a page could take up to a minute, and sometimes pages would not load at all. Pingdom alerts went crazy, with various wikis going down and up randomly.