Beyond Google, Yahoo, and Bing: Indexing the Dark Web
Ninety-six percent of the Internet cannot be found through Google or other search engines and is inaccessible to the average user. The sites that make up the so-called Dark or Deep Web are the undetectable crevices of the Internet: pages that are unlinked, private, or restricted in access. The common thread among these sites is that they are not indexed by search engines; they can still be reached, however, if the address of the site is known. This makes navigating the dark web hard. Instead of searching, users have to visit well-known resource sites (hubs) that list popular and commonly accessed sites (authorities) to learn which sites to visit. From there they may find lesser-known sites by clicking through links, but some sites will never be found unless their address is already known. Thus, the dark web can be pictured as a network with a few large connected components, many smaller connected components, and many single, unconnected nodes.
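To make the network picture concrete, here is a minimal sketch in Python that groups a toy set of sites into connected components. The site names and links are hypothetical, invented purely for illustration; they are not real dark web data, and links are treated as undirected for simplicity.

```python
from collections import defaultdict

def connected_components(nodes, links):
    """Group nodes into connected components, treating links as undirected."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # A depth-first walk from an unvisited node collects one component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    # Largest components first.
    return sorted(components, key=len, reverse=True)

# Hypothetical toy network: one hub listing three sites, one isolated pair,
# and one site that nothing links to.
nodes = ["hub", "a", "b", "c", "d", "e", "f"]
links = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("d", "e")]

components = connected_components(nodes, links)
sizes = [len(c) for c in components]
print(sizes)
```

In this toy network the hub and the sites it lists form the largest component, the pair "d" and "e" form a smaller one, and "f" is a single unconnected node that can only be reached by someone who already knows its address.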
Because the dark web is so difficult to navigate, it is an ideal medium for vice and crime. It is often associated with human trafficking, drug trading, illegal firearm distribution, hitman services, and other illegal networks.
The Defense Advanced Research Projects Agency (DARPA), a research agency of the U.S. Department of Defense, has been working on a search service to do what Google can’t: index the dark web and use this information to crack down on illegal web activity. The service, called Memex, differs from commercial search engines like Google, Yahoo, and Bing, whose results are based on advertising and on page rankings determined by their respective search algorithms. These search engines do not lack the means to index the dark web, but they have little incentive to do so, as much of the information is irrelevant to the common user and may be costly to index.
Memex will instead present information as infographics: diagrams showing how specific search terms link nodes (pages) on the Internet. Tech Times describes one way this could be used: “For instance, searching [on Google] for a name and phone number that crop up in a sex trafficking ad would not bring up a list of other places on the web where the name and number show up. Instead, Memex would create a diagram containing dots representing the web pages containing the name and number, thus drawing a bigger picture of what could possibly be a human trafficking ring operating online.”
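The idea in the quote can be sketched as a small graph-building routine: pages are the dots, and two pages are joined whenever they contain the same search term. This is only an illustration of the concept and makes no claim about how Memex is actually implemented. The function name, the page addresses, and the page text below are all hypothetical, and matching is a plain substring check.

```python
from collections import defaultdict
from itertools import combinations

def link_pages_by_shared_terms(pages, terms):
    """Return {(page_a, page_b): [shared terms]} for pages sharing any term."""
    # First record which pages mention each term.
    pages_by_term = defaultdict(list)
    for url, text in pages.items():
        for term in terms:
            if term in text:
                pages_by_term[term].append(url)

    # Then join every pair of pages that mention the same term.
    edges = defaultdict(list)
    for term, urls in pages_by_term.items():
        for a, b in combinations(sorted(urls), 2):
            edges[(a, b)].append(term)
    return dict(edges)

# Hypothetical pages and text, invented for illustration only.
pages = {
    "site-one.example/ad1": "Call Jane at 555-0100",
    "site-two.example/post": "Jane is available, 555-0100",
    "site-three.example/list": "Contact 555-0100 today",
    "site-four.example/about": "Nothing relevant here",
}

edges = link_pages_by_shared_terms(pages, ["Jane", "555-0100"])
for (a, b), shared in sorted(edges.items()):
    print(a, "<->", b, "via", shared)
```

Pages that share both the name and the number end up joined by two terms, while the page that mentions neither never appears in the graph, which is the "bigger picture" a ranked list of results would not show.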
While Memex’s achievements in web indexing are impressive, it probably will not unseat major search engines like Google anytime soon. For the common web user, Memex will not yield relevant search results, but it will have major implications for crime detection and other kinds of large-scale data analysis.
http://www.techtimes.com/articles/32601/20150214/meet-memex-darpas-dark-web-search-engine-what-can-it-do-that-google-cannot.htm
https://www.wired.com/2015/02/darpa-memex-dark-web/
http://truththeory.com/2012/06/06/what-is-the-deep-web-a-first-trip-into-the-abyss/