Challenges in Building Large-Scale Information Retrieval Systems. Review by Yuriy Guts.
On July 13, we organized a meetup with Rustem Arzymbetov, a Software Engineer at Google. Calling the event successful would be an understatement: after the announcement, we reached the limit of 120 registrations in just 4 hours! Check out the photos, presentation, and video recording in case you missed it.
One of the attendees, Yuriy Guts, was so excited about the presentation that he decided to write a short review.
Just a while ago, I was working on a project that involved retrieving and indexing vast amounts of information from various Internet resources. However simple the requirements seemed at first, the project later turned out to be a source of cool challenges for the whole team. The tasks we faced resembled building a web search engine so closely that, at some point, we started calling our system “mini-Google.” Time after time, when faced with a new difficulty, I found myself thinking, “How would Google solve this? Did they have the same issues I have now?”
Later on, I found out that the upcoming local GDG meetup would have a special guest, Rustem Arzymbetov, a software engineer from Google. This was a fantastic coincidence: just when I had so many questions about building a large-scale search engine, a Googler from the Search team was about to give a talk right at my company :)
Needless to say, the event attracted a wide audience, including new people I had never seen at GDG before. After a short introduction, Rustem explained why global information retrieval systems are built and what problems they solve for end users. Then we delved into the history of Google Search and the important technical decisions its engineers made as it evolved. We touched on dozens of different aspects, including crawling, indexing, compression, sharding and replication, load balancing, and fault tolerance.
Even though the more senior attendees seemed to be familiar with Google’s scientific publications, the talk had tons of surprisingly tasty details. It’s always fun to hear about the challenges of dealing with hardware-level RAM corruption, hardcore tweaking of the filesystem cluster layout to avoid extra disk seeks on spinning drives, or designing custom compression to save a few bits here and there. All of it serves one goal: the low latency we now enjoy with every search query.
What especially struck a chord with me was that, in its earlier generations, Google had faced exactly the same challenges as the ones I had on my project. That made for quite an afterparty conversation. Combined with interesting discussions about search personalization and the challenges Google Translate faces with rare languages, there were far too many topics to cover in such a short time. I do hope that similar meetups will take place in the future, since hearing about famous engineering solutions firsthand has always been an enlightening experience for many developers.
Yuriy Guts (Solutions Architect and R&D Engineer at ELEKS)
Being a polyglot programmer, Yuriy is keen on exploring emerging technology and discovering hidden opportunities. He is an active member of the Ukrainian IT community: a frequent blogger at the ELEKS R&D Blog, a contributor to the Lviv .NET User Group, and a teacher at Lviv Code