Some of you noticed that Wikidot was down yesterday for more than two hours. This does not happen often: the previous serious outage we had was in November 2014.
Until now, S3 had never failed to this extent. It was certainly not a small failure, and it took down several other Amazon services. It also affected all services and websites that use Amazon Web Services. It was not just Wikidot: Trello, Travis CI, GitHub and GitLab, Quora, Medium, Signal, Slack, Imgur, Twitch.tv, Razer, Apple's iCloud and several other websites could not function properly (or were not reachable at all). A significant percentage of websites all over the world rely on S3, and only now have we learned what happens when it is down.
The issue was so severe that even Amazon could not update their status board to let us know about the problems. It was probably hosted on S3 as well… It looks like engineers simply assumed that S3 would be available no matter what.
Wikidot's infrastructure design relies on a similar assumption about S3. We assumed that anything could break except S3 itself. Even our backup site (for cases when our databases and servers fail) is hosted on S3.
I guess that today many admins and developers (especially at services affected by the S3 outage) are trying to find ways to loosen their dependence on S3 and protect their services against similar events. We are going to look at this too, because the data you keep with us is our top priority.
Thank you for your understanding, and I am sorry for any trouble our outage may have caused.
Michal and the Wikidot Team