Over the last few days we've seen problems as many people came to http://snowleopard.wikidot.com looking for information on Apple's upcoming Mac OS X release. Today I can explain in more detail why these problems happened, and how we're going to fix them.
First, you'll see that Wikidot is now stable and reasonably fast (though it could be faster, and will be once we're done). The Snow Leopard site is not the only high-volume site; there are several other popular ones, like http://fretsonfire.wikidot.com.
We moved the Snow Leopard page to a static HTML page, so that anonymous visitors see that page, while logged-in users see the 'real' page. This cuts traffic going to the Wikidot engine. You can see this page if you log out.
The problem with this particular page is the number of people editing it and requesting it at the same time. After each edit, the cache is hit by hundreds or thousands of refresh requests in parallel. This creates a backlog that slows down the whole engine, so more and more new requests pile up in the queue. It takes longer and longer to process each request, so the queues go from 30-50 (normal) to 200-500, at which point everything starts to fail. This is poor design in the cache, which should not render the new page more than once. But it's not something that can be fixed easily.
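To make "should not render the new page more than once" concrete, here is a minimal sketch of a cache that coalesces parallel requests for the same page into a single render. It is not Wikidot's code: the class name `SingleFlightCache` and the `render` callable are assumptions made up for illustration, and a real engine would have to do this across processes, not just threads.

```python
import threading


class SingleFlightCache:
    """Illustrative cache: each missing page is rendered once, even when
    many requests for it arrive in parallel."""

    def __init__(self, render):
        self._render = render            # hypothetical callable: page name -> HTML
        self._pages = {}                 # page name -> rendered HTML
        self._locks = {}                 # page name -> per-page render lock
        self._guard = threading.Lock()   # protects the two dicts above

    def get(self, name):
        with self._guard:
            if name in self._pages:
                return self._pages[name]
            lock = self._locks.setdefault(name, threading.Lock())
        # Only one thread per page gets past this point at a time.
        with lock:
            # Re-check: another thread may have rendered while we waited.
            with self._guard:
                if name in self._pages:
                    return self._pages[name]
            html = self._render(name)
            with self._guard:
                self._pages[name] = html
            return html

    def invalidate(self, name):
        """Call after an edit; the next request triggers one fresh render."""
        with self._guard:
            self._pages.pop(name, None)
```

With this shape, a burst of requests after an edit costs one render; the other requests wait on the per-page lock and then read the cached result instead of each starting their own render.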
Instead of changing the way we cache and render pages, our new architecture adds a new front layer of static HTML that mirrors every single page. It serves these pages to anonymous users, without ever hitting the cache and the Wikidot engine core. These mirrors will refresh asynchronously, obviously when the page is edited, but also regularly so that dynamic content (from modules) is refreshed. Anonymous users will see a site that is always a minute or so out of date. Logged-in users will see the latest revision, exactly as now.
The main benefit of this design is that the bulk of high-volume traffic will be handled with no stress at all to the Wikidot core. Moving that single Snow Leopard page to static HTML already cut the CPU load by 35-40%, and doing this for all sites will produce an even more dramatic change. The result will be, if we get this right, that editing and reloading dynamic content will be much faster for logged-in users than it is now.
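The routing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StaticMirror`, `render`, `on_edit`, `refresh_stale` and the 60-second `max_age` are hypothetical names and values, not the actual Wikidot design. In a real deployment a background worker would call `on_edit` and `refresh_stale`, so that refreshes stay off the request path.

```python
import time


class StaticMirror:
    """Illustrative front layer: anonymous visitors get a stored snapshot,
    logged-in users always go to the live engine."""

    def __init__(self, render, max_age=60.0, clock=time.monotonic):
        self._render = render        # hypothetical callable: page name -> HTML
        self._max_age = max_age      # assumed refresh interval, in seconds
        self._clock = clock          # injectable so the sketch is testable
        self._snapshots = {}         # page name -> (html, refreshed_at)

    def refresh(self, name):
        self._snapshots[name] = (self._render(name), self._clock())

    def on_edit(self, name):
        # An edit refreshes the mirror of that page.
        self.refresh(name)

    def refresh_stale(self):
        # Periodic pass so dynamic (module) content does not go stale.
        now = self._clock()
        for name, (_, refreshed_at) in list(self._snapshots.items()):
            if now - refreshed_at >= self._max_age:
                self.refresh(name)

    def serve(self, name, logged_in):
        if logged_in:
            return self._render(name)      # latest revision, as today
        if name not in self._snapshots:
            self.refresh(name)             # first anonymous hit builds the mirror
        return self._snapshots[name][0]    # no engine work for repeat hits
```

The point of the split is that anonymous traffic, however large, costs the engine at most one render per page per refresh, rather than one per request.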
You can consider the current workaround for the Snow Leopard page (which we had to construct in some haste last night, as Wikidot continued to crash, leading us to realize the cache was thrashing) as a prototype for this new architecture.
There are two additional benefits. First, we can scale out the static HTML mirrors to any size, meaning that we'll be able to handle millions of hits per day, or per hour, if necessary. Second, we'll have independent static HTML mirrors of every site, so that if the Wikidot engine does die for other reasons, logged-in users can get the static HTML mirror as well. Even if the Wikidot engine merely takes too long to respond, the front-end can switch to the static HTML mirror and add a suitable message. Proper failover, and scalability.
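As a rough sketch of that failover behaviour: the front-end asks the live engine for the page, and if the engine fails or exceeds a time limit, it serves the mirror copy with a notice. Everything named here (`serve_with_failover`, `render_live`, the `mirror` dictionary, the notice text and the 2-second default) is an assumption for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical notice added to pages served from the mirror.
NOTICE = "<!-- served from static mirror; live engine unavailable -->"

_pool = ThreadPoolExecutor(max_workers=4)


def serve_with_failover(name, render_live, mirror, timeout=2.0):
    """Try the live engine first; fall back to the static snapshot.

    render_live: hypothetical callable, page name -> HTML (may be slow or raise)
    mirror:      dict of page name -> static HTML snapshot
    """
    future = _pool.submit(render_live, name)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # Engine too slow. cancel() only helps if the call has not started;
        # a running call keeps its worker busy until it finishes.
        future.cancel()
    except Exception:
        # Engine failed outright; fall through to the mirror.
        pass
    snapshot = mirror.get(name)
    if snapshot is None:
        raise LookupError("no live page and no mirror for " + name)
    return NOTICE + snapshot
```

A healthy engine returns the live page unchanged; only a slow or failing engine triggers the mirror copy, which is what makes the mirror usable for logged-in users as well.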
It will take us some weeks to completely work through the implications of this. I'll keep you informed of our progress.
Hi Pieter,
Glad you were able to devise a solution to the traffic problems.
One question: could the static page that is displayed to anonymous visitors be updated to reflect the most recent version? I ask because quite a few changes have been made since last night, including reverting vandalism that changed some of my Amazon affiliate links.
Thanks for all your help,
Dan
We'll do it. Once the page is more or less stable, you should probably switch permissions so that anonymous users can't edit it. People are much less likely to vandalize if they are registered users.
Portfolio
Thanks, Pieter. I've already changed permissions to only allow registered users to edit the page. Hopefully the system can handle all the new users. :)
You've probably already noticed this, but the static page is now automatically updated every few minutes.
Portfolio
I use my site during classes, and having this fail-safe (or fail-over, or whatever) design is really important to me. My site disappeared on me during the class, and (I gotta tell you) I wasn't really happy. I'm very happy that you have responded in this way: thoughtfully and productively. Thanks.
Scott Moore
University of Michigan, Ross School of Business
How Can I Find It? — My ETB blog — Research blog — Twitter — Gmail: scottamoore
Does the new front layer of static HTML somehow affect web stats?
The number of visits that awstats is reporting for my main site has skyrocketed since last Wednesday, whereas Google Analytics is showing the exact opposite.
Wayne Eddy
Melbourne, Australia
LGAM Knowledge Base
Contact via Google+
Wayne, there is no new front layer yet except for that snowleopard site. When we roll that out for all sites we'll clearly announce it. I've no idea why there would be a discrepancy between awstats and Google Analytics on your site, but they do measure slightly different things. Did you perhaps change domain names or add a custom domain?
Portfolio
Thanks for all your hard work on this! I had a few days' panic when I thought my new educational site wouldn't be available for my students when classes started, but you really came through: first, by reassuring us that steps were being taken, and second by the static html solution.
My site is now working as well as could be with both Internet Explorer and Firefox. In Google Chrome, however, attempting to load the page (miller09fall12pm.wikidot.com) causes the browser to get caught in a loop between "waiting for miller09fall12pm.wikidot.com" and "waiting for miller09fall12pm.wdfiles.com". It goes back and forth for minutes.
I know Chrome isn't quite the standard browser the other two are, but since there may be others with similar problems, I figured I would bring it to your attention - with my thanks.
Best,
Ben Miller
I assume that's a private site. There is a problem in Chrome (depending on the version). Go to a public site, log in there, and then open a new tab and go to your private site. If that does not work, let us know.
I'm not sure if Chrome has the "disable 3rd party cookies" option but that also causes this behaviour when set.
Portfolio
Pieter,
Thanks for your reply. Yes, it's a private site. A few days ago, I'd used that approach inadvertently, by way of checking the wikidot blog first - and it had been working. For whatever reason, though, it doesn't seem to be working today. I'm running Chrome 2.0.172.43 (the latest update) on Windows XP Home Service Pack 2, and allowing all cookies.
Again, this isn't urgent for me: I have both FF and IE and can get in just fine. But if the Chrome problem is a symptom masking some future problem, I figured it couldn't hurt to mention it where developers (my own programming skills are rather lacking) could see it.
There's a new site now (also on Wikidot!) for Mac OS X Lion. See it here: http://roaringapps.com/
~ Leiger - Wikidot Community Admin - Volunteer
Wikidot: Official Documentation | Wikidot Discord server | NEW: Wikiroo, backup tool (in development)
Thanks for the plug :)
BMC Creative | RoaringApps | @brycecammo