Google Blogoscoped

Sunday, November 20, 2005

Forty Faces Back-end

Here’s a quick rundown, if you’re interested, of how Forty Faces is structured in the back-end. The site, as you may know, displays the faces of bloggers who just posted something new (and every face is linked to the respective post).

With this site, I didn’t want to risk any speed problems at all under high traffic. Usually, those arise when there are many hits per second and the database is queried for every hit. Although I changed some of my server configuration and haven’t had a problem since, I still wanted to be very sure the site runs stably. So the main page is actually static HTML, which is re-created every 30 minutes by a non-public PHP5 script triggered by a cron job (cron being the server’s job scheduler). So no matter how many people hit refresh on Forty Faces, I won’t run into any problems, and the page will always be extra fast.
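As an illustration of the static-page idea, here is a minimal sketch in Python (the real script is PHP5 and is not shown in this post). The function name, the temporary-file approach and the file permissions are all assumptions: the post only says that the page is regenerated every 30 minutes, not how the file is written. Writing to a temporary file and renaming it is one common way to make sure visitors never get a half-written page.

```python
import os
import tempfile


def publish_static_page(html: str, target_path: str) -> None:
    """Write html to target_path without exposing a half-written file.

    Hypothetical helper; the original PHP5 script may work differently.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temporary file in the same directory as the live page,
    # so that the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        # mkstemp creates files readable by the owner only; relax that so
        # a web server running as another user can read the page.
        os.chmod(tmp_path, 0o644)
        # os.replace swaps the new file in for the old one in a single step.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A scheduler entry with the cron schedule `*/30 * * * *` would run such a script every 30 minutes. Because visitors only ever request the finished file, the database load stays the same no matter how many of them hit refresh.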

What happens when the cron job starts the PHP5 script which re-creates the page? There are two separate main modules: update data, and create HTML. Both rely on a MySQL table containing all the bloggers. I searched for some of the first few bloggers on my own (for example, ones whose photos were licensed under Creative Commons), and the rest of the bloggers “registered” by sending me an email with their photo attached.

Here are the fields the Forty Faces “blogger” table contains:

When I check a blog feed for new items (I’m using the free MagpieRSS PHP library, caching off), I don’t want to rely on the time-stamp provided in the RSS. It could be misconfigured and therefore give me the wrong time. I take an even simpler and more fail-safe route. There’s a “posts” table containing a list of all permalinks I’ve polled so far. Because the script runs every 30 minutes, whenever a new permalink is found, it must have been created within the last half hour, so it can be considered recent.
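The permalink check can be sketched roughly like this, again in Python rather than the PHP5 of the real script. Python’s built-in sqlite3 module stands in for MySQL, and the function name, the single “permalink” column and the example URLs are assumptions made for illustration only.

```python
import sqlite3


def find_new_permalinks(conn, feed_permalinks):
    """Return the permalinks not seen in earlier polls and record them.

    Hypothetical helper; table layout and names are assumed.
    """
    new_links = []
    for link in feed_permalinks:
        seen = conn.execute(
            "SELECT 1 FROM posts WHERE permalink = ?", (link,)
        ).fetchone()
        if seen is None:
            # Never polled before, so it counts as a recent post.
            conn.execute("INSERT INTO posts (permalink) VALUES (?)", (link,))
            new_links.append(link)
    conn.commit()
    return new_links


# In-memory database as a stand-in for the MySQL "posts" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (permalink TEXT PRIMARY KEY)")

first_poll = find_new_permalinks(
    conn, ["http://example.com/a", "http://example.com/b"]
)
second_poll = find_new_permalinks(
    conn, ["http://example.com/b", "http://example.com/c"]
)
print(first_poll)   # both links are new on the first poll
print(second_poll)  # only http://example.com/c is new on the second poll
```

No time-stamps are compared anywhere, which is the point of the approach. Under these assumptions, the very first poll of a newly added feed would treat every item as new, unless the table is seeded with that feed’s existing permalinks first.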

By the way, in the beginning there were often too many posts from one blogger appearing in a row. To make sure every face gets a chance to be shown, I introduced some filters. First, only the 3 most recent posts of every blogger will be shown. So if you blog 5 times within 30 minutes, you’ll push your own entries off the front page. Second, only 7 posts per day will be shown from any given blogger. That means if you post 5 times every 30 minutes, only your first couple of posts will be shown. Additionally, I randomize the permalink order, so even when 3 new posts by one blogger make it through, they mostly do not appear in a row, which gives Forty Faces a more balanced look.
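Here is a rough Python sketch of how these three filters could fit together. It is one interpretation, not the actual PHP5 code: it assumes the limit of 3 applies per update run, that a running count of posts already shown today is kept per blogger, and that the function and variable names are free to invent.

```python
import random
from collections import Counter

MAX_PER_RUN = 3  # from the post: only the 3 most recent posts per blogger
MAX_PER_DAY = 7  # from the post: only 7 posts per day per blogger


def select_posts(new_posts, shown_today, rng=random):
    """Pick which new posts reach the front page.

    new_posts: list of (blogger, permalink) tuples, newest first.
    shown_today: mapping of blogger -> posts already shown today.
    rng: object with a shuffle() method (the random module by default).
    """
    per_run = Counter()
    selected = []
    for blogger, permalink in new_posts:
        if per_run[blogger] >= MAX_PER_RUN:
            continue  # older than this blogger's 3 most recent posts
        if shown_today.get(blogger, 0) + per_run[blogger] >= MAX_PER_DAY:
            continue  # daily quota for this blogger is used up
        per_run[blogger] += 1
        selected.append((blogger, permalink))
    # Shuffle so that one blogger's posts rarely end up side by side.
    rng.shuffle(selected)
    return selected


# Example: ann posts 5 times in one run, bob has already hit the daily limit.
posts = [("ann", f"http://example.com/ann/{i}") for i in range(5)]
posts.append(("bob", "http://example.com/bob/0"))
picked = select_posts(posts, Counter({"bob": 7}), rng=random.Random(0))
print(picked)  # three of ann's posts in shuffled order, none of bob's
```

Passing in a seeded random.Random makes the shuffle repeatable for testing, while the default keeps the order different on every run.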

Here’s what the “posts” table looks like:

Finally, there’s a “status” table in the MySQL database. It just logs whatever happens and allows me to check the program’s status even when I can’t watch it execute. The fields here are:

And that’s all. The actual program is only 10K and was written fairly quickly, but then refined over the days as more “data” to work on came along. At the moment, the site can still accept new bloggers without refinements. Some people say the current concept won’t scale, and they’re right. So if the number of participating bloggers reaches a limit of 2,000 or so people, and no one can be certain his or her face will show upon posting, I may introduce refinements which create a balance again (sub-categories may be one way to go, but I’ll prototype that when I have the actual data to work with, say, an average of 150 new posts every half hour).

