Google’s wicked fast datacenter network

We all know Google search is fast, but nobody ever really stops to appreciate how fast. To serve the millions of people who use search every single day, Google has built 12 massive datacenters filled with thousands of servers. The servers are interconnected via a network of physical cables and routers that are responsible for transferring data between them. Conceptually, Google's datacenter network is an extremely large graph with millions of nodes, each representing a physical server; the edges in the graph represent the physical network cables between those nodes. A single search request can traverse hundreds of edges as data is sent between servers to find the relevant results and aggregate them into the single page that we see. When millions of these requests happen at the same time, congestion along these edges becomes a major concern. In addition, the edges themselves may disappear and reappear at any time due to physical issues like a poor connection or an accidentally cut wire. To respond rapidly to your request, Google must route around this congestion efficiently and make intelligent decisions to handle unexpected network problems, all in just a few milliseconds.
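The graph model described above can be sketched as a simple weighted adjacency list, with edge weights standing in for link latencies. This is purely illustrative: the class, the server names, and the latency values are hypothetical, not Google's actual topology or data structures.

```python
from collections import defaultdict

class Network:
    """A toy model of a datacenter network as an undirected weighted graph."""

    def __init__(self):
        # adjacency list: node -> {neighbor: edge cost (e.g. latency)}
        self.links = defaultdict(dict)

    def add_link(self, a, b, latency):
        # a physical cable carries traffic in both directions
        self.links[a][b] = latency
        self.links[b][a] = latency

    def remove_link(self, a, b):
        # a cut wire or flaky connection makes the edge disappear
        self.links[a].pop(b, None)
        self.links[b].pop(a, None)

net = Network()
net.add_link("rack1", "switch1", 5)
net.add_link("switch1", "rack2", 7)
net.remove_link("rack1", "switch1")   # simulate an accidentally cut wire
```

Because edges can vanish at any moment, any routing layer built on top of such a structure has to tolerate lookups against links that no longer exist, which is why `remove_link` deletes defensively with `pop(..., None)`.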

To solve this dynamic routing problem, Google developed Firepath, an internal routing system that enables them to make near real-time routing decisions for each network packet that travels across their networks. To achieve this, Firepath distributes an in-memory graph of the network and constantly updates it in real time with measured network latencies, which translate into higher edge costs, as well as link availability. All of this optimization allows Google to handle 1 Petabit/s of bisection bandwidth, which is fast enough to transfer the entire contents of the Library of Congress in 1/10th of a second. In graph theory terms, Firepath is responsible for finding the optimal path between two nodes in almost real time. In a very simplified sense, this boils down to a shortest-cost path problem where the edge costs may represent network latency, the actual bandwidth cost of using a specific link, or some other function. Doing this on extremely large dynamic graphs turns out to be a challenging problem and one of the main reasons Firepath has not been open-sourced. By keeping it internal, Google maintains a competitive advantage, as they can maximize the utility of their current systems without having to invest more into their datacenters.
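As a rough illustration of the shortest-cost routing idea above, here is Dijkstra's algorithm over an adjacency map, with edge costs standing in for measured latencies. This is a minimal sketch under simplifying assumptions, not Firepath itself: Firepath's actual protocol and data structures are not public, and the topology and costs below are invented.

```python
import heapq

def shortest_path(links, source, dest):
    """Dijkstra's algorithm: cheapest path from source to dest.
    links[u][v] is the cost (e.g. measured latency) of the edge u -> v."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dest:
            # reconstruct the path by walking predecessors backwards
            path = [dest]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, cost in links.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []  # dest unreachable (e.g. all links to it cut)

# Invented topology; edge costs stand in for measured latencies.
links = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 4},
    "C": {"D": 1},
    "D": {},
}
print(shortest_path(links, "A", "D"))  # → (4, ['A', 'B', 'C', 'D'])

# Congestion raises the measured cost of B -> C; the next query adapts.
links["B"]["C"] = 10
print(shortest_path(links, "A", "D"))  # → (6, ['A', 'B', 'D'])
```

The interesting part is less the algorithm than the second query: because the costs are re-read on every lookup, updating an edge weight is enough to reroute traffic, which mirrors (in miniature) how feeding live latency measurements into a shared graph lets routing decisions track congestion.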

Source: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43837.pdf
