Skip to main content



Online Bipartite Matching and Auctions at Google

Google is one of the biggest and most well-known technology companies in the world. Most of us use their search engine, or email service on a daily basis. In fact, as most of you may know, Cornell’s CMail account is simply an extension of Google’s Gmail service.

Whenever you log into your email account, or type in a search query on www.google.com, at the top of the webpage you will encounter advertisements from Google. These advertisements are how Google makes most of its revenue, and the algorithms that determine which advertisement to show to users work based upon bipartite matching, and auctions.

In lecture, we saw how Google acts as an intermediary in the markets of advertisements. Google helps products reach consumers through ads on Google’s websites and collect a small fee in doing so. We also covered the basics of bipartite matching under the context of buyers who want to purchase a house in a market. Now, let’s look at how these two concepts are connected by looking at how they are used to solve the same problem at Google by extending the theory of bipartite matching can to the realm of advertisements.

In order to do this, let us formalize the bipartite matching statement:

We are given a graph G=(V,E) where the vertex set V = L union R, is made up of two vertex-disjoint sets.

The edge set E=(u,v) where u in L and v in R. Therefore, we have the property that no two vertices in L, or in R, are connected to each other by an edge e in E.

A matching M is a subset of E such that every vertex in L and R appears in at most one of the edges in M. A matching is considered a perfect matching if every vertex in and R appears in exactly one of the edges in M. The goal is to find the optimal matching M* such that, for all M we have |M| \leq |M*|.

In this description of the bipartite matching problem, and in the description we saw in lecture, we know the entire vertex set V and all of the edges in the graph G before the start of the algorithm. However, another, more complicated version of this problem is called the “online bipartite matching” problem. In the online version of the bipartite matching problem, we are only given one of the two sides of the vertex sets, L, during the beginning. Then, over time, the other side of the vertex set, R, comes “online,” i.e. becomes visible to us along with the edges that connect vertices from R to L. More formally, we have that at a given timestamp delta t a new vertex j in R arrives, and announces its neighbors in L, i.e. the edges (u,j) in E such that u in L. Now without knowing what other vertices and edges may appear in the future, we have to match j to a vertex in i.

In order to gain a clearer understanding of the online bipartite matching problem, let us think about it in the context that Google faces every day.

Companies want to reach out to customers who use Google’s search engine service. They pay Google to place advertisements as a result of user search queries. At the start of any given day, Google has a large database of companies who want to present their advertisements to users that type in particular keywords in Google’s search engine. For simplicity, let us assume that each advertisement is only going to be presented once during a day. At the start of the day, Google has a large set L of all the advertisements that it is trying to show to users. The amount of revenue Google collects is directly related to the number of ads it displays throughout the day. However, Google is constrained by the list of keywords that a particular company has provided for its advertisement. Over time, users enter a search query, essentially coming “online” on the right side of the bipartite graph. During this time, Google has no information about what users will search for later on during the day, but has to determine what ads to return as part of the user search query.

This is the very basic setup of the “online bipartite matching” problem in the context of generating advertisement revenue. The following article ( http://www.stanford.edu/~saberi/sara.pdf) from SIAM News describes how Google uses a variant of the weighted fractional online bipartite matching algorithm to solve the advertisement problem.

As the article states, it is in Google’s best interest to devise an algorithm which displays ads such that the advertisers spend the highest fraction of their daily budget because the more money they spend, the more money Google makes. Two researchers from Georgia Tech, Mehta and Saberi, looked at the advertisement problem faced by Google and tried to come up with a solution. The article also describes how randomization can be used to improve the competitive ratio, a ratio comparing the solution outputted by an algorithm and the optimal solution to the problem, of an algorithm. For randomized algorithms and approximation algorithms, the performance of the algorithm is evaluated by its competitive ratio. Since we don’t know what a particular execution of a randomized algorithm will result in for a given input, we look at what the randomized algorithm will output in terms of expected value and try to reason about its performance in expectation.

For the online bipartite matching problem, the researchers also found it insightful to describe the problem as a linear program and use primal/dual analysis in order gain insight into the problem and extract more information about how to solve the problem. In the end, the researchers were able to come up with an optimal trade-off function which could be used by Google to identify which of the potential candidate ads to display to a user who enters a search query.

The concluding portion of the article describes how the current research being done in the online bipartite matching realm and its applications to advertisements can be extended to use statistical data or game theory. The problem as described above looks at a new instance of the online bipartite matching problem for each day and tries to assign queries to ads without having any historical information about the sequence in which queries are entered throughout the day. But it is easy to see that this is not really the case, over the past several years, Google has built up a data store of all of the queries which were entered on their websites throughout the day and this information could potentially be used to predict what new queries will arrive during the day. It is possible to imagine that during lunch and dinner hours Google might encounter a higher number of queries related to restaurants as people search for locations where they want to eat. Furthermore, one can imagine that around the holiday season people are more likely to search for specific keywords related to gifts and shopping. All of this information can be used to improve how Google matches queries to advertisements in order to increase revenue.

Furthermore, as we have seen in class, game theory also plays a big role in auctions. The amount of money Google can collect on a given day from an advertiser is bounded above by that advertiser’s maximum daily limit. Therefore, it is in Google’s best interest to have advertisers reveal their true valuations for keywords, such as in a second-price closed-bid auction setting, instead of having the advertisers try to manipulate the system by underbidding on specific keywords. Future research in the realm of online bipartite matching algorithms can take into consideration these ideas from Game Theory and researchers can design an algorithm which encourages the advertisers to reveal their true valuations for a given keyword.

Clearly the ideas that Google uses for its online advertisements are very complex and relate to a lot of areas where research is still ongoing, however, they have very strong ties to the basics of bipartite matching and online auctions that we have covered from the concepts of bipartite matching to the idea of whether or not to bid one’s true value in auctions.

For those who are interested, the research paper described in the article above can be found here: http://www.cs.berkeley.edu/~vazirani/pubs/adwords.pdf

Comments

Leave a Reply

Blogging Calendar

September 2012
M T W T F S S
 12
3456789
10111213141516
17181920212223
24252627282930

Archives