Last night and this morning some of you commented on this blog that your site searches were not working. We would have responded immediately - sleep is a thing of the past for the Wikidot team - but notifications also mysteriously stopped working.
This morning we found the cause: one random disk, hidden deep in the Wikidot cluster, filled up with log files. From then on, a few key processes stopped working: the search indexes got corrupted, and notifications got lost.
In the old days, when we actually bought servers and installed them physically in expensive air-conditioned rooms, and when a hundred gigabytes was considered enough to store the whole Internet, we monitored our disks with eagle eyes. These days, with virtual servers and disks that we can install with a few mouse clicks, it's tempting to ignore the fact that even huge virtual disks will, eventually, fill up.
We're going to install monitoring software that will tell us when any disk, no matter how innocent it looks, reaches even 60% usage. We should have done this systematically on all disks already.
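
For the curious, here is a minimal sketch of what such a check could look like. It assumes Python and the standard library's `shutil.disk_usage`. The function names, the list of paths, and the warning format are all made up for illustration, and this is not necessarily the monitoring software that ends up on our servers.

```python
import shutil

# Alert threshold as a fraction of total capacity (the 60% figure above).
THRESHOLD = 0.60


def usage_fraction(used: int, total: int) -> float:
    """Return the used fraction of a disk, or 0.0 if the reported size is zero."""
    return used / total if total > 0 else 0.0


def over_threshold(used: int, total: int, threshold: float = THRESHOLD) -> bool:
    """True when usage is at or above the threshold."""
    return usage_fraction(used, total) >= threshold


def check_paths(paths, threshold: float = THRESHOLD):
    """Return (path, fraction) for every path whose disk is at or above the threshold."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        fraction = usage_fraction(usage.used, usage.total)
        if fraction >= threshold:
            alerts.append((path, fraction))
    return alerts


if __name__ == "__main__":
    # Hypothetical list of mount points; a real setup would cover every disk.
    for path, fraction in check_paths(["/"]):
        print(f"WARNING: {path} is {fraction:.0%} full")
```

Run from a scheduler every few minutes, something like this would have flagged the log disk long before it filled up. The catch is that it only helps if the list of paths really does include every disk, including the ones nobody thinks about.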