Google & Open Source Repositories
https://www.wired.com/2015/09/google-2-billion-lines-codeand-one-place/
In discussing search engines in class, it's important to consider what goes into them as well as how they work. The largest single repository of code most likely belongs to Google, the most popular search engine on the internet. At a recent conference, Google employee Rachel Potvin estimated that all of Google's internet services amount to somewhere around 2 billion lines of code. The unique thing about this wealth of information is that it is accessible to nearly all of Google's 25,000 engineers. This code bank is treated more as a library than as an incomprehensible collection of data. When building new projects or editing existing ones, engineers search through the code already there and reuse what they need to form something new. The majority of the code is laid out for you, but it is the small changes that can make all the difference.
However, certain parts of the code are not available to all Google engineers. In particular, the PageRank search algorithm is open only to a select number of employees. The science behind this algorithm, and how Google has developed it to become so effective, is an increasingly discussed topic. It is a fundamental factor in the search engine's success, which is why Google keeps it more closely guarded.
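For readers who have not seen it, the basic published idea behind PageRank can be sketched in a few lines. This is only the textbook version of the algorithm as it is usually taught, not Google's private production code. The toy web of four pages, the damping factor of 0.85, and the function name are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated averaging (power iteration).

    `links` maps each page to the list of pages it links to.
    Every linked page is assumed to also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    # Start with the rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page C ends up with the highest rank because three other pages link to it, while page D ranks lowest because nothing links to it. A page matters more when other pages, especially important ones, point to it.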
The article also discusses version control systems and what Google has used, and is currently experimenting with, in that regard. A version control system tracks every change made to a code repository and keeps those changes from conflicting with one another. Because so many changes are made daily to each of Google's services, its version control system, currently Piper, helps guard against human error, and automated bots handle a large portion of the commits to the code. Piper is effectively the code doctor and keeps everything running healthily. Although Piper is private, Google is also working with Mercurial, an open source version control system. Mercurial already allows code to move easily between repositories, yet it does not function at the scale that Google would need. However, an open source version control system that can handle that scale may be in the near future.
Our class discusses the ideas of PageRank and the complexities of web networking and computer-to-computer interactions. This article touches on PageRank and also highlights the increasing importance of being able to network between code libraries. There are services such as GitHub that host open source code repositories for the public. Imagine what becomes possible as GitHub grows and tech giants also begin to share their private code with the world. We are at the forefront of an amazing culture of worldwide collaboration, and the emphasis on networks is only getting bigger.