Searching the Distributed IPFS Web
Source: https://github.com/ipfs/archives/issues/8
Most of the web today uses the Hypertext Transfer Protocol (HTTP) to send web pages from a central server to your computer, and the Domain Name System (DNS) to look up the IP address for a given domain name. For example, consider visiting google.com. When you type the URL into your browser, it first resolves the domain to an IP address by asking a centralized DNS server, and then uses HTTP to transfer the content to your computer. However, as demonstrated by the recent Distributed Denial of Service (DDoS) attack on Friday, October 21, such a centralized system is very vulnerable. The future of the web is a protocol for a distributed file system called the InterPlanetary File System (IPFS).
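To make those two centralized steps concrete, here is a minimal sketch using only the Python standard library; the hostname is just the example from above, and the exact output will depend on your network.

```python
import socket
import urllib.request

hostname = "google.com"

# Step 1: resolve the domain name to an IP address via DNS.
ip_address = socket.gethostbyname(hostname)
print(f"{hostname} resolves to {ip_address}")

# Step 2: fetch the page over HTTP from that (central) server.
with urllib.request.urlopen(f"http://{hostname}/") as response:
    page = response.read()
print(f"Fetched {len(page)} bytes over HTTP")
```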
Right now, web searching is also centralized: Google maintains huge databases that index the web. With a distributed web, however, searching should also be decentralized, so that you can find anything you want without relying on a single company. This is a far more interesting problem than the centralized search we discussed in class. With centralized search, a company like Google can build up a huge database, and then you can ask it to search that database for you. In a distributed search, there is no central database. How could a distributed search work? The designers of IPFS suggest two ways:
- Static: nodes in the IPFS network maintain indexes of the files currently stored on them, and serve these indexes just like any other page.
- Dynamic: when a client wants to search for a page, the query is sent from node to node, and each node returns the pages that match it (a toy sketch of both approaches follows below).
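Here is a toy model of the two approaches. The Node class, local_index, peers, and the content hashes are all illustrative names I made up for the sketch; they are not part of the actual IPFS API.

```python
class Node:
    def __init__(self, name, pages):
        self.name = name
        # Static approach: each node keeps an index of the files it stores
        # and can serve that index like any other page.
        self.local_index = pages          # keyword/title -> content hash
        self.peers = []

    def serve_index(self):
        """Static search: simply publish the local index."""
        return self.local_index

    def search(self, query, ttl=3, seen=None):
        """Dynamic search: answer from the local index, then forward the
        query to peers until the time-to-live runs out."""
        seen = seen if seen is not None else set()
        if self.name in seen or ttl < 0:
            return set()
        seen.add(self.name)

        hits = {h for kw, h in self.local_index.items() if query in kw}
        for peer in self.peers:
            hits |= peer.search(query, ttl - 1, seen)
        return hits


# Example: three nodes, only one of which stores a page about cows.
a = Node("a", {"cows and farming": "QmCow..."})
b = Node("b", {"search engines": "QmSearch..."})
c = Node("c", {})
c.peers = [b]
b.peers = [a]

print(c.search("cows"))   # {'QmCow...'} -- found by forwarding through b to a
```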
Missing from both of these is how the client decides which nodes to contact. If a client searching for cows asks some other nodes for pages on cows, it might happen that none of those nodes has any, because all of the pages on cows are stored elsewhere. Thus, for the search engine to work, you would need some sort of hub that could tell you which nodes have pages about cows, and perhaps which other hubs it thinks would know about cows. In this way, many such hub nodes throughout the IPFS network could collect and update this information. Then, using something similar to the hubs-and-authorities model discussed in class, client nodes could rank these hubs and decide which ones to trust when searching.
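As a rough sketch of that last step, a client could score hubs with a hubs-and-authorities style iteration: a content node is a good authority if good hubs point to it, and a hub is good if it points to good authorities. The graph below (which hubs claim to know which content nodes for "cows") is invented purely for illustration.

```python
hub_links = {
    "hub1": ["nodeA", "nodeB"],
    "hub2": ["nodeB", "nodeC"],
    "hub3": ["nodeC"],
}

hub_score = {h: 1.0 for h in hub_links}
auth_score = {n: 1.0 for links in hub_links.values() for n in links}

for _ in range(20):
    # A content node is a good authority if good hubs point to it.
    for n in auth_score:
        auth_score[n] = sum(hub_score[h] for h, links in hub_links.items() if n in links)
    # A hub is good if it points to good authorities.
    for h in hub_links:
        hub_score[h] = sum(auth_score[n] for n in hub_links[h])
    # Normalize so the scores stay bounded.
    a_norm = sum(v * v for v in auth_score.values()) ** 0.5
    h_norm = sum(v * v for v in hub_score.values()) ** 0.5
    auth_score = {n: v / a_norm for n, v in auth_score.items()}
    hub_score = {h: v / h_norm for h, v in hub_score.items()}

# A client searching for "cows" could contact the highest-scoring hub first.
print(sorted(hub_score.items(), key=lambda kv: -kv[1]))
```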