Where Did 'Sign in' Go?

by pieterh
on 22 Oct 2009 12:18

At 17:19 UTC on Wednesday we made a small change to the Sign in and Create Account actions to make the Wikidot software work better with SSL. Unfortunately, this also broke both actions on most Wikidot.com sites. We have now resolved the issue, and I apologize for the disruption. A full explanation follows.

The change itself was part of our parallel work on Wikidot.org, our project to produce a full open source version of Wikidot that both runs our dot-com service and is available for free download.

We changed the way these two actions work with SSL, but ended up with an inconsistent cache. This is one of the horrid aspects of building a service like Wikidot.com: every single access must be cached (i.e. served from fast memory instead of the much slower database).

In this case we had JavaScript code that tried to calculate an address for the actions, but the (old) cached template code did not give it the information it needed. Result: the JavaScript did nothing and people could not sign in.

On sites with recently changed pages there was no problem: as soon as a page is changed, its cache is reloaded, and sign-in worked fine. So many people trying to sign in at www.wikidot.com (not recently changed) were unable to, while on many other sites and pages it worked fine.

Once we realized what was wrong, we solved it by clearing the cache. This is a fairly major operation: it took www.wikidot.com something like a minute to refresh afterwards.
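
To make the failure mode concrete, here is a minimal sketch in Python of a per-page template cache. It is only an illustration under stated assumptions, not the actual Wikidot code: the `Site` class, the `login_base` field, the version numbers, and the example.com address are all hypothetical. It shows a new script that needs a value only the new template provides, old cached templates that lack it, page edits that refresh a single entry, and a full cache clear that fixes everything at once.

```python
from typing import Dict, Optional


class Site:
    """Hypothetical model of a site that caches rendered templates per page."""

    def __init__(self) -> None:
        self.template_version = 1          # template code currently deployed
        self.cache: Dict[str, dict] = {}   # page name -> rendered template

    def render(self, page: str) -> dict:
        """Slow path: build the template from the currently deployed code."""
        rendered = {"page": page, "version": self.template_version}
        if self.template_version >= 2:
            # Assumption: only the new template exposes this value.
            rendered["login_base"] = "https://www.example.com"
        return rendered

    def get(self, page: str) -> dict:
        """Fast path: serve from the cache, rendering only on a miss."""
        if page not in self.cache:
            self.cache[page] = self.render(page)
        return self.cache[page]

    def edit(self, page: str) -> None:
        """Changing a page drops only that page's cached entry."""
        self.cache.pop(page, None)

    def clear_cache(self) -> None:
        """Drop every entry; each page is re-rendered on its next access."""
        self.cache.clear()


def sign_in_address(rendered: dict) -> Optional[str]:
    """Stand-in for the new client script.

    Returns None (the script "does nothing") when the template it was
    served does not carry the value it needs.
    """
    base = rendered.get("login_base")
    return None if base is None else base + "/sign-in"


if __name__ == "__main__":
    site = Site()
    site.get("home")            # cached while the old template is live
    site.get("docs")
    site.template_version = 2   # deploy the new template code

    print(sign_in_address(site.get("home")))  # stale entry: no address
    site.edit("docs")                         # a recent edit refreshes one page
    print(sign_in_address(site.get("docs")))
    site.clear_cache()                        # the eventual fix
    print(sign_in_address(site.get("home")))
```

In this model the cost of `clear_cache()` is that every page has to be rendered again on its next access, which is consistent with the slow refresh described above.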

We have looked at our internal processes to ask the key questions: "how did this happen?" and "how can we prevent such mistakes in future?" For this change, one of us moved code to the production cluster without going through the full test cycle. It seemed a harmless change. Lesson learnt: even the most trivial change gets full review before going live.

My apologies again. Thanks to everyone who reported the problem.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License