As most of you know, one of the cool Wikidot features is traffic analysis. For each site (so far only paid sites) we collect the available traffic information, and every few hours we analyze it with AWStats, a popular analytics package. The results (page views, visits, traffic sources, browser info, crawler stats and tons of other metrics) are then available from within the ManageSite panel.
For some of you (especially those managing high-traffic sites) it became clear that something had been wrong with the stats for the last few weeks. By "something" I mean large delays in analyzing logs, missing traffic and temporarily unavailable result pages. We are really sorry about these problems and we have been working hard (often late into the night) to fix them.
The reason for these problems was the faulty (non-scaling) design of our log storage and log processing architecture. Two bottlenecks have been causing a lot of problems recently: the central database used to store log files from several web servers, and the shared filesystem for analyzed data. These bottlenecks became even more apparent after we moved our servers to AWS.
Yesterday we moved to a completely redesigned (shared-nothing) architecture, which we built from scratch. All data and logs are now stored in scalable storage (Amazon S3) and are processed by independent workers.
A few days ago we needed over 6 hours to calculate web traffic for all eligible sites. Now it takes 30 minutes, a 12-fold improvement. And we can make it even faster by increasing the number of workers. The instant benefit is obvious: more up-to-date and reliable results. Finally!
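To illustrate the shared-nothing idea, here is a minimal sketch in Python. It is not our production code: the function names are made up for this example, an in-memory dictionary stands in for the logs stored in S3, threads stand in for separate worker machines, and each log line is assumed to be just a requested path. The point is only that each site's log is analyzed on its own, with no shared database or filesystem, so adding workers shortens the total run.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def analyze_site(site_log):
    """Count page views per path for one site, touching no shared state."""
    site, lines = site_log
    views = Counter(line.strip() for line in lines if line.strip())
    return site, dict(views)


def analyze_all(logs_by_site, workers=4):
    """Hand each site's log to an independent worker and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_site, logs_by_site.items()))


# Hypothetical input: site name -> list of requested paths.
sample_logs = {
    "site-a": ["/start", "/start", "/about"],
    "site-b": ["/start"],
}
stats = analyze_all(sample_logs, workers=2)
```

Because no worker depends on another, the `workers` value can be raised without changing `analyze_site` at all.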
Last but not least: as the post title says, we are now adding web analysis to all free educational sites. We believe it is a nice addition to this increasingly popular plan!
My stats have certainly been reporting properly the last few days and I just noticed this morning the faster updating. Which makes a huge difference. You guys are working your butts off and although I know I am one of those 'squeaky wheels' who speaks my mind, it is not unnoticed. Thanks a lot for the improvements. The page loading is for sure much faster; I've been loving this as well. As a matter of fact, on my old site, one problem I consistently had was that the start page would never seem to finish loading, even though everything appeared on the page. Now that problem has gone away and it loads quickly.
I think it is great you're giving away stats to educational sites.
The only thing I wish for in the stats is a feature that shows which page a search query led to. This is something that has always been useful to me in Statcounter, but which AWStats doesn't provide. So many of my pages have overlapping information that I can't always be sure exactly which page a search led to. I take it this is not possible to achieve with AWStats?
BTW, could you guys answer the question that James and I had about the new IPs? We wanted to know if there was any advantage, when having more than one domain, to using one of each IP. As he put it, is one more "main" than the other, or does one resolve to the other? (hope I got that right)…
BTW, Michal, I just asked the question about the stats out of curiosity. I don't mean to come off as shouting for more, more, more. As it stands, the improvement in reporting time is a great one.
Eric, it is fine. Actually I looked into AWStats and how to extend it to report the destination pages you mention, but found nothing. From time to time we look into AWStats alternatives but we cannot find anything that impresses us.
After we deal with a few technical issues we would like to improve a few things at Wikidot (or implement new features) that would benefit our users immediately. I wish we could move things forward a bit faster though…
But wait. Here is the thing: do you think it would be useful to be able to download the logs for your sites? Then you could either run them through other log analyzers or just find the information you need yourself! Looks like an easy thing to implement. What do you think?
Michał Frąckowiak @ Wikidot Inc.
Visit my blog at michalf.me
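If downloadable logs do become available, the "which page did this search lead to" question could be answered with a short script. The sketch below is only an illustration under assumptions the thread does not confirm: that the logs would be in Apache Combined Log Format (which includes the referer), and that the search engine puts the query in a `q` URL parameter. The function name and the sample line are invented for the example.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Assumed format: Apache Combined Log Format (request, status, size, referer, agent).
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)


def landing_pages_by_query(lines, param="q"):
    """Map each search query found in a referer to the pages it led to."""
    result = defaultdict(set)
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        query = parse_qs(urlsplit(match.group("referer")).query).get(param)
        if query:
            result[query[0]].add(match.group("path"))
    return {q: sorted(paths) for q, paths in result.items()}


# Hypothetical log lines, made up for this example.
sample = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /recipes/soup HTTP/1.1" '
    '200 2326 "http://www.google.com/search?q=tomato+soup" "Mozilla/5.0"',
    '1.2.3.4 - - [10/Oct/2000:13:56:01 -0700] "GET /start HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0"',
]
pages = landing_pages_by_query(sample)
```

Requests without a search referer (such as the second sample line) are simply ignored, and other search engines may need a different `param` value.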
Could we have API integration, as well…? ;-) It would just have to download the same file you're planning to offer anyway, no need to provide anything more than that.
I'd love to do some statistical stuff with STE. Show a bunch of graphs for a selected period of time, etc.
But either way, it sounds great. Without API integration, I could still provide a way to import those stats and experiment with ways to display the data. API = easier, and simpler.
~ Leiger - Wikidot Community Admin - Volunteer
Wikidot: Official Documentation | Wikidot Discord server | NEW: Wikiroo, backup tool (in development)
OK, sounds good. We will definitely talk about it this week!
Michał Frąckowiak @ Wikidot Inc.
Great! :)
~ Leiger - Wikidot Community Admin - Volunteer
That sounds awesome!
Hi Michal,
I was wondering if you have an idea of what format these log files may be in. For example, would it look something like this?
(Example taken from http://httpd.apache.org/docs/1.3/logs.html#common)
A sample log file would be nice, if you have one available. That way, we have data to work with, and could have something ready to go the same day that you implement the feature (if your talks this week end up deciding that it is a good idea).
~ Leiger - Wikidot Community Admin - Volunteer
Wikidot: Official Documentation | Wikidot Discord server | NEW: Wikiroo, backup tool (in development)
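For anyone who wants something ready to go, here is a rough sketch of parsing one line in the Common Log Format described in the Apache documentation linked above. Whether Wikidot's logs would actually use this format is not confirmed anywhere in this thread, so treat the format as an assumption; the sample line is the well-known example from the Apache docs, and `parse_clf` is a name made up for this sketch.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf(line):
    """Return the fields of one Common Log Format line, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Month abbreviations are parsed with the current locale (English assumed).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample_line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
entry = parse_clf(sample_line)
```

The resulting dictionary has a timezone-aware `time`, which makes later grouping by hour or day straightforward.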
I had one more question about the statistics. Is there something I need to do to set up my local time, so that the statistics are in line with my time? As it is, my statistics are rolling over to the next day at a very early time for me, around 6pm my local time, which would seem to correspond to your time at wikidot. Any way around this?
Eric, unfortunately there is no way around it. The time in the stats is UTC (GMT), which is our "server time". With AWStats I can see no easy and reliable way of changing the time zone used for the statistics.
Michał Frąckowiak @ Wikidot Inc.
That's okay!
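As a side note on the time zone question: the AWStats pages stay in UTC, but if raw logs ever become downloadable (still only a proposal in this thread), daily totals could be recomputed for a local day. The sketch below assumes the hit timestamps have already been extracted as UTC datetimes and uses a fixed UTC offset, which ignores daylight saving time. The function name and the sample timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def views_per_local_day(utc_timestamps, utc_offset_hours):
    """Count hits per calendar day in a zone with a fixed offset from UTC."""
    local = timezone(timedelta(hours=utc_offset_hours))
    days = Counter()
    for ts in utc_timestamps:
        # Naive timestamps are assumed to be UTC, matching the server time.
        aware = ts.replace(tzinfo=timezone.utc) if ts.tzinfo is None else ts
        days[aware.astimezone(local).date().isoformat()] += 1
    return dict(days)


# Hypothetical hits; a day that rolls over at 6pm local time implies UTC-6.
hits = [
    datetime(2010, 3, 2, 0, 30),  # 18:30 the previous day at UTC-6
    datetime(2010, 3, 2, 5, 0),   # 23:00 the previous day at UTC-6
    datetime(2010, 3, 2, 7, 0),   # 01:00 the same day at UTC-6
]
per_day = views_per_local_day(hits, utc_offset_hours=-6)
```

With an offset of 0 the same function reproduces the UTC day boundaries that the stats use now.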