One of the interesting challenges we faced in this edition was that not all our Hindi news sources are encoded in UTF-8. Though we strongly back and urge the adoption of the Unicode-based UTF-8 standard by all Indian language websites, we didn’t want to deprive our readers of content from some of their favourite news sources which are not yet there. So we internally convert their text to UTF-8 and do all the processing necessary to provide links to these sites.
After one of our programmers pulled an all-nighter with lots of screaming and hair pulling, we managed to parse the totally non-standard character garbage which sloppy webmasters decided was good enough to share with the world. The programmer in question is now on extended vacation and we’re unsure whether she will ever return.
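For readers curious what this kind of conversion can look like, here is a minimal sketch in Python. It is not our actual code. It assumes the legacy text comes from a font-specific encoding where short Latin-keyboard codes stand for Devanagari glyphs, and the table `HYPOTHETICAL_GLYPH_MAP` is invented purely for illustration; a real table has to be built separately for each font. The function names `to_unicode` and `to_utf8_bytes` are likewise our own placeholders.

```python
# Invented glyph codes for illustration only; these do NOT describe any real font.
HYPOTHETICAL_GLYPH_MAP = {
    "d": "\u0915",   # DEVANAGARI LETTER KA
    "[k": "\u0916",  # DEVANAGARI LETTER KHA (a two-character glyph code)
    "k": "\u093e",   # DEVANAGARI VOWEL SIGN AA
    "e": "\u092e",   # DEVANAGARI LETTER MA
}


def to_unicode(legacy_text, glyph_map):
    """Replace legacy glyph codes with Unicode text, longest code first."""
    # Try longer codes before shorter ones so "[k" is not split into "[" and "k".
    keys = sorted(glyph_map, key=len, reverse=True)
    out = []
    i = 0
    while i < len(legacy_text):
        for key in keys:
            if legacy_text.startswith(key, i):
                out.append(glyph_map[key])
                i += len(key)
                break
        else:
            # No code matches here: pass the character through unchanged.
            out.append(legacy_text[i])
            i += 1
    return "".join(out)


def to_utf8_bytes(legacy_text, glyph_map):
    """Convert legacy text to Unicode and serialise it as UTF-8 bytes."""
    return to_unicode(legacy_text, glyph_map).encode("utf-8")
```

With the made-up table above, `to_unicode("dke", HYPOTHETICAL_GLYPH_MAP)` yields the three code points for "काम". The sketch only does table lookup; real legacy fonts often store vowel signs in visual rather than logical order, so a working converter would also need a reordering pass that is not shown here.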
[Thanks Manoj Nahar!]