The Deep Web (Darknets)

Large search engines like Google and Bing, despite the enormous amount of content they seem to give access to, barely scratch the surface of the web. Some researchers estimate that the Web as most people know it (Facebook, Wikipedia, YouTube: anything we can find through Google or another search engine) may make up less than 1% of the entire web. This is difficult to measure, of course, but even as an order-of-magnitude estimate it is astonishing.

Though this may be hard to fathom (doesn’t Google know everything?), the concept behind the Deep Web is simple. Search engines discover and index web pages by following the links into and out of them, and this works well only for static pages. Now think about a page that is served dynamically, for example the result you get when you query an online database: no link points to that result, so a crawler never reaches it. Interestingly, the recent surge in web technologies like AngularJS has popularized Single Page Applications (SPAs), where new content is generated on the page dynamically; this also makes SEO harder, because the page contains less static information that web crawlers can understand. Google and other search engines don’t capture information from these dynamic pages, nor from pages behind private networks or standalone pages that link to nothing and are linked from nowhere. These are all part of the Deep Web.
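To make the static-versus-dynamic point concrete, here is a minimal sketch of the link-extraction step of a crawler that reads raw HTML and does not execute JavaScript. The two sample pages and the names `LinkExtractor` and `extract_links` are invented for illustration; real search engine crawlers are far more elaborate.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the links a non-JavaScript crawler would see in this HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical static page: the links are present in the HTML itself.
STATIC_PAGE = (
    '<html><body><a href="/about">About</a> '
    '<a href="/posts/1">Post</a></body></html>'
)

# Hypothetical single-page app shell: content is built in the browser later.
DYNAMIC_PAGE = (
    '<html><body><div id="app"></div>'
    "<script>/* links are created at run time */</script></body></html>"
)

static_links = extract_links(STATIC_PAGE)
dynamic_links = extract_links(DYNAMIC_PAGE)
print(static_links)
print(dynamic_links)
```

The static page yields two links to follow, while the app shell yields none, so a crawler of this simple kind has nowhere to go from it.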

Most of the Deep Web holds valuable information. A 2001 report estimated that 54% of it is databases. Another 13% is hidden because it lies within an intranet, for example an internal network exclusive to a large corporation or university. Finally, there’s the dark corner of the internet, sometimes called the Darknet and often associated with Tor. Its .onion sites can only be reached through the Tor network, typically with the Tor Browser, and they range from illegal drug sales, assassination contracts, and human trafficking to ordinary discussion boards.

Since the most common model of the web is a graph, it’s interesting to note that, although we often say the large majority of the web is connected, it actually isn’t (at least if the only edges you count are incoming and outgoing links). A gigantic portion of the web remains unconnected from the web that we use on a daily basis. Although it is not always used for the most ethical purposes, it is an important source of information for big data, given the vast collections of databases contained within the Deep Web.
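The graph view can be sketched in a few lines: treat pages as nodes and links as directed edges, start from a page a search engine already knows, and see what a breadth-first traversal can reach. The toy graph and every page name in it are made up for illustration, not measurements of the real web.

```python
from collections import deque


def reachable(graph, seeds):
    """Return every page reachable from the seeds by following outgoing links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical web: each key is a page, each value lists the pages it links to.
TOY_WEB = {
    "portal": ["news", "wiki"],
    "news": ["wiki"],
    "wiki": ["portal"],
    "intranet-home": ["intranet-hr"],  # private network, no links in from outside
    "intranet-hr": [],
    "db-result": [],                   # generated on demand, linked from nowhere
    "orphan": ["wiki"],                # links out, but nothing links to it
}

surface = reachable(TOY_WEB, ["portal"])
deep = set(TOY_WEB) - surface
print(sorted(surface))
print(sorted(deep))
```

In this toy graph only three of the seven pages are reachable from the seed. Note that `orphan` links into the surface web yet stays undiscovered, because a crawl follows outgoing links and nothing points to it.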

http://money.cnn.com/2014/03/10/technology/deep-web/

 
