@MDesign: it is difficult to put into words how sorry I am about the whole thing. We realize that every minute someone depends on Wikidot for something important. And I know that every minute Wikidot is up and running we make someone's job possible, but when it is down we fail someone.
This does put pressure on us, and in literally everything we do we put stability and reliability first, far ahead of implementing new features, bells and whistles.
I learned this back in the early days of Wikidot, when we had a longer outage. One of our clients (a working group in a bank) called and said they had an important meeting and needed access to their wiki really badly.
Now I also know what the problem with RoaringApps was that day, and I wish we had realized it in time. When users were adding apps to their docks (by adding their unique tag to the app page), as a side effect all cached ListPages that included pages from the app category were invalidated. This means that all the tables and grids listing apps had to be regenerated very often, which, given the traffic, was creating an overload. Once the load passes a certain threshold, other things start failing and increase the load on the application backend even more, blurring the whole picture.
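To illustrate the effect described above, here is a minimal sketch in Python. It is not Wikidot's actual code: the `ListCache` class, its methods, and the listing names are all hypothetical, and it assumes the simplest possible rule, namely that any change to a page in a category drops every cached listing that depends on that category.

```python
from collections import defaultdict


class ListCache:
    """Toy cache of rendered page listings (hypothetical, not Wikidot's code)."""

    def __init__(self):
        self.rendered = {}                    # list_id -> rendered output
        self.by_category = defaultdict(set)   # category -> list_ids depending on it
        self.regenerations = 0                # how many times we had to re-render

    def get(self, list_id, category, render):
        # Cache miss: re-render the listing and record its category dependency.
        if list_id not in self.rendered:
            self.rendered[list_id] = render()
            self.by_category[category].add(list_id)
            self.regenerations += 1
        return self.rendered[list_id]

    def page_changed(self, category):
        # Assumed rule: any edit in the category (even a tag change)
        # invalidates every cached listing that includes that category.
        for list_id in self.by_category.pop(category, set()):
            self.rendered.pop(list_id, None)


cache = ListCache()
listings = ["apps-table", "apps-grid", "top-apps"]  # hypothetical listing ids


def view_all():
    # Simulates visitors loading every page that lists apps.
    for list_id in listings:
        cache.get(list_id, "app", lambda: f"<rendered {list_id}>")


view_all()                  # cold cache: every listing is rendered once
view_all()                  # warm cache: nothing is re-rendered
cache.page_changed("app")   # a user tags one app page
view_all()                  # every listing has to be rendered again
```

In this sketch a single tag edit forces all three listings to be rebuilt on the next view. With many users tagging apps and many visitors viewing the listings, the cache is almost never warm, which matches the overload described above.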