Data Structures & Algorithms I Actually Used Working at Tech Companies - The Pragmatic Engineer

https://blog.pragmaticengineer.com/data-structures-and-algorithms-i-actually-used-day-to-day/

Do you actually use algorithms and data structures in your day-to-day job? I've noticed a growing trend of people assuming that algorithm questions are pointless, and that tech companies ask them purely as an arbitrary measure. I hear more people complain that all of this is pointless, a purely academic exercise. This notion was definitely popularized after Max Howell, the author of Homebrew, posted his Google interview experience:

Google: 90% of our engineers use the software you wrote (Homebrew), but you can’t invert a binary tree on a whiteboard so fuck off.

While I've also never needed to use binary tree inversion, I have come across everyday use cases of data structures and algorithms when working at Skype/Microsoft, Skyscanner and Uber. This included writing code and making decisions based on these concepts. Even more, I used this knowledge to understand how and why some things were built, and how I could use or modify them.

This article is a set of real-world examples where data structures like trees and graphs, and various algorithms, were used in production. Almost all of these are first-hand experiences. I hope to illustrate that general data structures and algorithms knowledge is not "just for the interview" - but something that you'll find yourself reaching for, both when working at and when building up a fast-growing, innovative tech company.

I've used a very small subset of algorithms, but almost all data structures. It should come as no surprise that I am no fan of algorithm-heavy, non-practical interview questions with exotic data types like Red-Black trees or AVL trees. I have never asked these, and never will. You can read what I think about these interviews at the end of this article. Still, I find lots of value in being aware of what options for basic data types you can choose from to tackle certain problems. With this, let's jump into examples.

Graphs and graph traversing: Skype and Uber

When we built Skype for Xbox One, we worked on a barebones Xbox OS that was missing key libraries. We were building one of the first full-fledged applications on the platform. We needed a navigation solution that we could hook up both to touch gestures and to voice commands.

We built a generic navigation framework on top of WinJS. To do so, we needed to maintain a DOM-like graph to keep track of the actionable elements. To find these elements, we did DOM traversal - basically, a tree traversal - across the existing DOM. This is a classic case of BFS or DFS (breadth-first search or depth-first search).
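As a rough illustration of this kind of traversal, here is a minimal Python sketch, not the WinJS code we wrote. The `Node` class, the `actionable` flag and the element names are all hypothetical stand-ins for DOM elements; the only point is how BFS and DFS visit the same tree in a different order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element."""
    name: str
    actionable: bool = False
    children: List["Node"] = field(default_factory=list)


def find_actionable_bfs(root: Node) -> List[str]:
    """Collect actionable node names level by level (breadth-first)."""
    found = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.actionable:
            found.append(node.name)
        queue.extend(node.children)
    return found


def find_actionable_dfs(root: Node) -> List[str]:
    """Collect actionable node names in pre-order (depth-first)."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.actionable:
            found.append(node.name)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


# Made-up page structure, for illustration only.
page = Node("root", children=[
    Node("header", children=[Node("back_button", actionable=True)]),
    Node("call_button", actionable=True),
    Node("list", children=[Node("contact_row", actionable=True)]),
])

print(find_actionable_bfs(page))  # ['call_button', 'back_button', 'contact_row']
print(find_actionable_dfs(page))  # ['back_button', 'call_button', 'contact_row']
```

Both functions find the same elements. BFS returns the shallowest ones first, while DFS follows document order, and which one suits a navigation framework depends on what "next element" should mean.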

At Uber, the team built many tools to visualize nodes, dependencies, and their connections. One example was a visualization tool for RIB nodes. The approach was the same in this case. The tool needed to maintain a tree, render it as an SVG, then update it as the RIB tree on the mobile device changed. Also, RIBs themselves maintain a logical tree structure for state management that is different from the rendered objects: this is one of the key ideas behind their design.

RIBs tree visualization: a classic example of using trees both to represent data and to visualize it. See the whole presentation

Weighted graphs and shortest paths: Skyscanner

Skyscanner finds the best deals on airline tickets. It does this by scanning all routes worldwide, then putting them together. While the nature of the problem is more about crawling and less about caching - as airlines calculate the layover options - the multi-city planning option becomes a shortest path problem.

Multi-city was one of the features that took Skyscanner quite a bit of time to build - in all fairness, the difficulty was more on the product side than anywhere else. The best multi-city deals are calculated by using shortest path algorithms like Dijkstra's or A*. Flight routes are represented as a directed graph, with each edge weighted by the cost of the ticket. Calculating the cheapest price option between two cities was done via an implementation of a modified A* search algorithm per route. If you're interested in flights and shortest paths, the article on implementing the shortest flight search path using BFS by Sachin Malhotra is a good read.
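To make the graph model concrete, here is a small Python sketch of a cheapest-route search over a directed, weighted graph. It uses plain Dijkstra rather than the modified A* mentioned above, and it is not Skyscanner's implementation: the `FARES` table, the prices and the `cheapest_route` function are made up for illustration.

```python
import heapq

# Hypothetical one-way fares: origin -> [(destination, ticket price)].
FARES = {
    "BUD": [("LON", 60), ("AMS", 40)],
    "AMS": [("LON", 15), ("NYC", 320)],
    "LON": [("NYC", 280)],
    "NYC": [],
}


def cheapest_route(fares, origin, destination):
    """Dijkstra's algorithm: return (total_price, route), or None if unreachable."""
    best = {origin: 0}  # cheapest known price to reach each city
    heap = [(0, origin, [origin])]  # (price so far, city, route taken)
    while heap:
        price, city, route = heapq.heappop(heap)
        if city == destination:
            return price, route
        if price > best.get(city, float("inf")):
            continue  # stale entry: a cheaper way to this city was already found
        for next_city, fare in fares.get(city, []):
            new_price = price + fare
            if new_price < best.get(next_city, float("inf")):
                best[next_city] = new_price
                heapq.heappush(heap, (new_price, next_city, route + [next_city]))
    return None


print(cheapest_route(FARES, "BUD", "NYC"))
# (335, ['BUD', 'AMS', 'LON', 'NYC'])
```

With these made-up fares, the cheapest option has two layovers and beats both one-stop routes. A* would add a heuristic estimate of the remaining cost to the priority, which can cut down how many cities get explored on a large route graph.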

With Skyscanner, the actual algorithm was far less important, though. Caching, crawling, and handling the varying website load were much more difficult things to crack. Still, a variation of the shortest paths problem comes up at many travel companies that optimize for price based on combinations. Unsurprisingly, this topic was also a source of hallway discussions here.

Sorting: Skype (kind of)

Sorting is an algorithm family I rarely had an excuse to implement or needed to use in-depth. It's interesting to understand the different ways to sort, from bubble sort, insertion sort, merge sort and selection sort to - the most complex one - quicksort. Still, I found that there was rarely a reason to implement any of these, especially as I never had to write sort functions as part of a library.

At Skype, I got to exercise this knowledge a bit, though. One of the other engineers decided to implement an insertion sort for listing contacts. In 2013, when Skype connected to the network, contacts would arrive in bursts, and it would take some time for all the contacts to arrive. So this engineer thought it would be more performant to build the contact list organized by name, using insertion sort. We had a back-and-forth on this, over why not just use the default sort algorithm. In the end, it was more work to properly test the implementation, and to benchmark it. I personally didn't see much point in doing so, but we were at a stage of the project where we had the time.
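The two options we debated can be sketched in a few lines of Python. This is an illustration only, not the Skype code: the contact names, the burst structure and the `insert_sorted` helper are hypothetical, and the sort order here is simply case-insensitive by name.

```python
def insert_sorted(contacts, name):
    """One insertion-sort step: place name into an already sorted list, in place."""
    i = len(contacts)
    contacts.append(name)
    # Shift larger entries one slot right until the position for name is found.
    while i > 0 and contacts[i - 1].casefold() > name.casefold():
        contacts[i] = contacts[i - 1]
        i -= 1
    contacts[i] = name


# Hypothetical bursts of contacts arriving from the network.
bursts = [["Zoe", "adam"], ["Mia", "Bob"], ["carl"]]

# Option 1: keep the list sorted as each contact arrives.
incremental = []
for burst in bursts:
    for name in burst:
        insert_sorted(incremental, name)

# Option 2: collect everything, then use the default sort.
everything = [name for burst in bursts for name in burst]
with_default_sort = sorted(everything, key=str.casefold)

print(incremental)        # ['adam', 'Bob', 'carl', 'Mia', 'Zoe']
print(with_default_sort)  # ['adam', 'Bob', 'carl', 'Mia', 'Zoe']
```

Both end up with the same list. The incremental version keeps the list sorted after every arrival at a cost of up to O(n) per insert, while the default sort is less code to test; which one is faster in practice is exactly the kind of question that needs a benchmark.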