


LinkedIn’s People You May Know Feature: The Potential of Data Science in the Context of Network Discovery

If you have ever heard of LinkedIn, you probably know it as a “professional networking service for you to connect with friends and colleagues in the interest of career/job opportunities” or some variant of that phrase. But what many people do not fully appreciate is that much of LinkedIn’s value lies in its ability to accurately predict potential connections within your network. This feature is called “People You May Know” (PYMK), and LinkedIn invented and launched it on its website back in 2006. At its core, it seeks to solve a simply stated binary classification problem: will these two members connect with each other? Developing an effective method to tackle this problem, on the other hand, is not nearly as simple.

The first big challenge is defining what determines the likelihood of two individuals connecting on the service. Many factors may be considered, but a natural first guess at a method is “triangle closing.” The idea is straightforward: Alice knows Bob, Bob knows Carol, hence there is some likelihood that Alice knows Carol. These “triangles” can be scored further by observing overlaps in Alice’s and Carol’s backgrounds, such as organizations, age, and location. These parameters help define the affinity between two nodes within a network and contribute to link prediction (i.e., estimating the probability that a connection will be established). Modeling the network further requires deeper insight into how networks of people work. One such approach is to break down organizational overlaps further and score them by length of overlap, geographic clustering, and general propensity to connect. LinkedIn has published a great paper that discusses its particular implementation as well as the mathematical basis for its theoretical effectiveness (see References). To summarize the paper’s conclusions: in cases of organizational overlap, accounting for longer time overlaps and proximity to geographic clusters contributes greatly to high success rates in establishing connections, particularly in the earlier stages of membership.
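To make the triangle-closing idea concrete, here is a minimal sketch in Python. It is not LinkedIn’s implementation: it only counts how many connections two unconnected members share, and the function name `triangle_closing_candidates` and the sample members are made up for illustration. A real system would presumably combine this count with the background overlaps described above.

```python
from collections import defaultdict
from itertools import combinations


def triangle_closing_candidates(edges):
    """Map each unconnected pair (a, c) to the number of connections they share.

    Hypothetical helper for illustration; `edges` is a list of undirected
    (member, member) connections.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    scores = defaultdict(int)
    # Every pair of people who share a mutual connection forms an open
    # triangle, unless the two are already connected to each other.
    for mutual, friends in list(neighbors.items()):
        for a, c in combinations(sorted(friends), 2):
            if c not in neighbors[a]:
                scores[(a, c)] += 1
    return dict(scores)


# Made-up sample network: Alice and Carol share Bob and Dave.
edges = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Alice", "Dave"),
    ("Dave", "Carol"),
    ("Carol", "Eve"),
]
candidates = triangle_closing_candidates(edges)

# Rank suggestions by shared-connection count, breaking ties alphabetically.
ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
for (a, c), shared in ranked:
    print(f"{a} and {c}: {shared} shared connection(s)")
```

In this toy network, Alice and Carol share two connections (Bob and Dave), so they rank ahead of pairs that share only one. The shared-connection count is only the crudest affinity signal; weighting each triangle by organization, location, or length of overlap would be the next refinement.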

Interestingly, LinkedIn’s PYMK feature initially came about as just an experiment by Jonathan Goldman, an analytics scientist at LinkedIn in 2006, who placed small ads on the site suggesting each member’s top recommended connections. The feature had an immediate impact on the service’s activity, averaging 30% higher click-through rates than any other prompts to other pages on the site and generating millions of new page views. Contributions by data scientists like Goldman have inspired a new wave of network modeling built on data analytics, which is becoming a trend among tech companies that deal with large-scale networks of any kind. Facebook, for example, shortly followed LinkedIn with its own “People You May Know” feature, which it began rolling out in 2008. Yelp, the local business review site, uses data analytics to match its vast number of businesses with potential customers and reviewers who have accounts on the service. In an ever more connected world, data science may be the field that unlocks valuable knowledge in the sea of data within networks.

References

http://data.linkedin.com/projects/pymk

http://www.cs.utexas.edu/~cjhsieh/fp086-hsieh.pdf

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
