
Using nofollow links to identify unnatural linking patterns



Nofollow links are links with rel="nofollow" added to the HTML tag. This attribute tells search engines to ignore the link, so such links do not pass ranking credit to the page they point to. For years, Google simply ignored links with the nofollow attribute. Last month, Google announced that it would begin treating nofollow links as "hints" when deciding whether a link should count toward ranking. This change is officially taking effect in March 2020.
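Under the old model, a link-analysis pipeline could drop every link whose rel attribute contained nofollow. Under the hint model, those links are kept but labeled. The following is a minimal sketch of that labeling step, using only Python's standard html.parser. The sample HTML and URLs are made up, and this is not how Google's crawler actually works.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Collects <a href> targets, split by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []  # ordinary links
        self.nofollow = []  # links now treated as "hints" rather than ignored

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. rel="ugc nofollow"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


page = (
    '<p><a href="https://example.com/a">A</a> '
    '<a rel="nofollow" href="https://example.com/b">B</a> '
    '<a rel="ugc nofollow" href="https://example.com/c">C</a></p>'
)
parser = LinkClassifier()
parser.feed(page)
parser.close()
print(parser.followed)  # ['https://example.com/a']
print(parser.nofollow)  # ['https://example.com/b', 'https://example.com/c']
```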

One reason this change was introduced was to help the PageRank algorithm. "Looking at all the links we encounter can also help us better understand unnatural linking patterns," Google said in its announcement. "By shifting to a hint model, we no longer lose this important information, while still allowing site owners to indicate that some links shouldn't be given the weight of a first-party endorsement." The unnatural linking patterns mentioned in the announcement are artificial links intended to manipulate a page's ranking. By using nofollow links as hints instead of simply ignoring them, Google may be better able to identify and react to unnatural linking patterns that distort its search rankings.

Comparing how reddit finds the “best” comments with how Google ranks pages

Reddit is an anonymous social media website that ranks posts and comments in aggregate. Google crawls websites for links and then computes PageRank values to decide which websites to show first. Reddit instead relies on user upvotes and downvotes to decide which links get shown first. Although reddit gets direct feedback about which comments should be listed first, simply ranking by the number of upvotes has many flaws. Early comments tend to stay on top because they are caught in a vicious cycle of getting more and more votes. This is a problem on many websites; consider, for example, Amazon sorting its products by reviews. Should a product with one 5-star review come before a product with 200 reviews and a 4.5-star average? I'd argue that the product with more reviews should be ranked higher, and this is the idea behind reddit's system.

To prevent a heavy bias towards early comments (the ones caught in a vicious cycle of upvotes), reddit orders comments by the lower bound of a Wilson score confidence interval (computed at a fixed confidence level, such as 95%) rather than by raw vote counts. Coming back to the Amazon example, the first product, with only one perfect review, would be ranked below the second product, with 200 good reviews. In this system the confidence interval for the first product is extremely wide because its sample size is small. The second product's interval is narrow because its sample size is large. Ordering by the lower bound of the interval therefore puts the products in a sensible order.
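The following is a minimal sketch of the Wilson lower bound for a fraction of positive votes. It uses z = 1.96, which corresponds to 95% confidence; this is an assumption, and the confidence level reddit actually uses may differ. Treating each review as a single up or down vote is also a simplification: a 4.5-star average over 200 reviews is modeled here as 180 positive votes out of 200.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for the true fraction of positive votes.

    z = 1.96 gives a 95% confidence level (an assumption); a different
    confidence level only changes z.
    """
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


one_review = wilson_lower_bound(1, 1)          # one perfect rating
many_reviews = wilson_lower_bound(180, 200)    # 90% positive over 200 ratings
print(f"{one_review:.2f} {many_reviews:.2f}")  # 0.21 0.85
```

The single perfect review scores only about 0.21 because its interval is so wide. The 200-review product scores about 0.85, so it is listed first.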

I found this an interesting comparison to how search engines order results. Although reddit uses a different system to meet different needs, both try to surface the "best" links. Many other websites could benefit from implementing a similar system.

reddit’s new comment sorting system

Limitations of Web Crawlers and how they affect Search Engine Performance

In lecture and in the textbook, one of the topics that came up was how search engines rank the results of a particular query. The methods we discussed, Hubs and Authorities and PageRank, both require at least a partial understanding of the overall structure of the internet. I was curious about how a search engine obtains the graph, or network structure, of the internet, so I did a bit of digging. Most search engines are rather tight-lipped about the specifics, but Google provides a general high-level overview of how it finds webpages on the internet.

The way Google indexes websites is actually more straightforward than one might imagine. It uses a "web crawler", which visits webpages and discovers the links on them, either through a sitemap the site provides or simply by following the links it finds. This makes sense, because the internet is just a graph with hyperlinks as the edges between page nodes. The link information is also essential for search ranking algorithms such as Hubs and Authorities and PageRank, which use the links between sites to judge how relevant and authoritative they are for a query.
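The two discovery routes described above can be sketched as a breadth-first crawl. Seed URLs come either from a sitemap (in the sitemaps.org XML format) or from a known starting page, and every newly discovered link is queued. The toy site, its URLs, and the `links_on` helper are hypothetical stand-ins for real page fetching. Google's actual crawler is far more elaborate.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Extract the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def crawl(seeds, get_links, max_pages=1000):
    """Breadth-first crawl: visit the seeds, then every newly discovered link."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    frontier = deque(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical site: "/orphan" has no incoming links, only a sitemap entry.
TOY_WEB = {
    "https://example.com/": ["https://example.com/a"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
    "https://example.com/orphan": [],
}
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/orphan</loc></url>
</urlset>"""


def links_on(url):
    """Stand-in for fetching a page and extracting its links."""
    return TOY_WEB.get(url, [])


print(crawl(["https://example.com/"], links_on))  # the orphan page is never found
print(crawl(urls_from_sitemap(SITEMAP), links_on))  # the sitemap surfaces it
```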

Of course, such a method for determining the network structure can never be exhaustive; some pages are bound to slip through the cracks. I realized that because of the bow-tie structure of the web, some nodes can never be visited, depending on where the web crawler begins its search. Disconnected components with no connections to the main component will definitely not be found. Many nodes in the "IN" set will also be missed unless the crawl starts from them or a sitemap lists them. IN nodes link into the main strongly connected component, but by definition no path leads from the main strongly connected component back to them. The same applies to the tendrils that stem from nodes in the "IN" set.
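The reachability argument above can be checked on a small hypothetical bow-tie graph. The page names are invented for illustration. A crawler seeded inside the strongly connected component never reaches the IN pages, the tendril, or the disconnected island. Starting from an IN page recovers more of the graph, but still not all of it.

```python
from collections import deque


def reachable(graph, seed):
    """Return every page a crawler can reach from seed by following links."""
    seen = {seed}
    frontier = deque([seed])
    while frontier:
        page = frontier.popleft()
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return seen


# Hypothetical bow-tie web graph
BOWTIE = {
    "in1": ["scc1", "tendril1"],  # IN page with a tendril hanging off it
    "in2": ["scc1"],
    "scc1": ["scc2"],             # scc1 -> scc2 -> scc3 -> scc1 is the core
    "scc2": ["scc3", "out1"],
    "scc3": ["scc1"],
    "out1": [],
    "tendril1": [],
    "island1": [],                # disconnected component
}

missed_from_core = set(BOWTIE) - reachable(BOWTIE, "scc1")
print(sorted(missed_from_core))  # ['in1', 'in2', 'island1', 'tendril1']
missed_from_in = set(BOWTIE) - reachable(BOWTIE, "in1")
print(sorted(missed_from_in))    # ['in2', 'island1']
```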

Knowing this, how will it affect the performance of the search engine? Based on what I have learnt about search ranking algorithms, I believe the impact would not be large. The reason is that a node in the "IN" set tends to have far fewer incoming links than a node in the main strongly connected component. If Google used Hubs and Authorities, a node in the "IN" set would have few good hubs pointing to it, since no node in the main strongly connected component links to it. It would therefore not have been ranked highly anyway, even if it had been included. Similarly, under PageRank, a node in the "IN" set can only receive rank from other IN nodes (plus the random jumps). Nodes in the main strongly connected component typically have many more incoming links, so again the IN node would not be ranked highly anyway. Therefore, the web crawler's inability to find nodes in the "IN" set should not drastically affect the search results, and the search engine can still perform reasonably well.
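The PageRank half of this argument can be sketched with scaled PageRank (power iteration with a damping factor) on a small hypothetical bow-tie. The damping value of 0.85 and the rule that spreads rank from dangling pages evenly over all pages are common conventions, assumed here rather than taken from Google. The IN pages have no incoming links, so they only receive the baseline share that every page gets, and they end up at the bottom.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Scaled PageRank by power iteration.

    graph maps each node to the list of nodes it links to. Rank held by
    dangling nodes (no out-links) is spread evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        # every node gets the random-jump share plus its share of dangling rank
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


# Hypothetical bow-tie: two IN pages, a three-page SCC, and one OUT page.
BOWTIE = {
    "in1": ["scc1"],
    "in2": ["scc1"],
    "scc1": ["scc2"],
    "scc2": ["scc3", "out1"],
    "scc3": ["scc1"],
    "out1": [],
}
ranks = pagerank(BOWTIE)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```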

The Impact of Google Searches

This article gives an overview of Google's search algorithm in light of Trump's claims that the search engine is "rigged" against him. Google says it does not bias its results based on political ideology. Instead, its algorithm shows users the most relevant and authoritative results based on factors that include PageRank, location, and a user's previous activity. The Washington Post article describes how companies and media outlets can try to "game" the system, or algorithm, to promote their results. One such method is using keywords or buzzwords to compete for visibility in Google's search results. However, if Google can bias search results toward newer or fresher content, it also has the potential to censor content. Eric Schmidt, the chairman of Alphabet, publicly considered demoting content that is "hateful or extreme" and making it more difficult to find.


This article relates to our class discussion on search engine results and PageRank. PageRank is an algorithm that measures how important or authoritative a website is based on the pages that link to it, with links from important pages counting for more. In particular, the article discusses how PageRank is an important factor in Google's search algorithm for assigning authority to sources. Search results also depend on other factors, however, such as geographic location and previous search history. This goes beyond the ranking algorithm discussed in lecture and points to the sophistication of search algorithms today. The algorithm is extremely useful in determining the relevance and authoritativeness of web pages. Even so, because of PageRank and the other factors the algorithm uses, the search engine may still rank highly information that users do not consider the most relevant, particularly users who hold a certain political ideology. The top results could also turn out to be extreme or irrelevant under the algorithm's ranking. This shows that search algorithms are not perfectly personalized, despite being immensely useful and increasingly accurate.

Bidding for the Acquisition of Barney’s

This article is about the bidding currently underway for the department store Barney's, which filed for bankruptcy in August after its New York City landlord tried to double the rent. Now that Barney's has filed for bankruptcy, other companies are submitting bids in hopes of acquiring the well-known clothing department store. Barney's is looking for buyers in order to avoid liquidation, and there are currently two major competitors seeking to acquire the store. One is Authentic Brands Group LLC, a company that owns a number of major brands, including Nine West. The other high-profile potential buyer is a group of investors led by Sam Ben-Avraham. This group does not own any major brands, but many of the individual investors have experience in retail and envision a new path for the department store.

The bidding for Barney's department store is similar to the ascending-bid, or English, auctions that we studied in class. The situations are not exactly identical, but the concepts of an English auction can still be applied to what is happening with Barney's. It is most like an English auction because bidders can see the bids of other bidders. Each bidder can stay in the auction and keep raising their bid until the price reaches their true value, at which point they drop out. Because the dominant strategy is to never bid past one's true value, the winning bidder only needs to bid just above the second-highest bid, and so still makes a profit. Because the auction is public, bidders should not make their initial bid equal to their true value. The true values of Authentic Brands Group LLC and Ben-Avraham's group are unknown, but we can assume that both are following this strategy and placing initial bids below their true value. This assumption is based on each bid as a whole. The bids include dollar amounts as well as explanations of what each bidder will transform Barney's into. Authentic Brands Group LLC wants to license the department store's brand to Saks, while Ben-Avraham's group wants to turn Barney's into a single-location retailer that features newer designers. By also presenting a vision for what the retailer will become, each bidder can offer less than their true monetary value. Each bidder's plans are considered part of the total bid. The monetary offer might therefore be lower than what the bidder would actually pay, but the bidder still provides other value that matters in the acquisition of Barney's.

After making initial bids and seeing the other bidder's price, Authentic Brands Group LLC and Ben-Avraham's group can each submit a new bid with a higher monetary value. Alternatively, they can leave their current bid as it stands and hope that the non-monetary value they provide outweighs a higher monetary bid from the other group.
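The drop-out logic of an English auction can be simulated in a few lines. The price rises in fixed steps, each bidder stays in while the price is below their private value, and the last bidder standing wins at roughly the second-highest value. The bidder names, values, and increment are hypothetical and do not represent the real Barney's bids. The simulation also ignores the non-monetary parts of the offers.

```python
def english_auction(values, increment=1):
    """Ascending-price auction where each bidder stays in until the price reaches their value.

    values maps bidder name -> private true value. Returns (winner, final_price).
    """
    price = 0
    active = set(values)
    while len(active) > 1:
        price += increment
        still_in = {b for b in active if values[b] > price}
        if not still_in:
            # everyone left dropped out at this price; break the tie by name
            return min(active), price
        active = still_in
    return active.pop(), price


# Hypothetical values, not the real Barney's bids.
winner, price = english_auction({"Bidder A": 270, "Bidder B": 250}, increment=5)
print(winner, price, "profit:", 270 - price)  # Bidder A 250 profit: 20
```

With a coarse increment, the winner can end up paying up to one step above the runner-up's value. For example, if Bidder B valued the store at 252, the auction would stop at 255.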

The Game of War – Modeling the Clash Between India and Pakistan as a Classic Game Theory Problem


This article provides a brief overview of the turmoil between India and Pakistan over the past several years, and goes on to analyze their current diplomatic situation with a game-theoretic approach. The author begins by describing the situation and listing a few of the more recent altercations between the two countries. The article also provides background on each nation's current goals and relevant policies, and on what they mean for how the situation will progress. A key element is India's No-First-Use Policy, which states that India will use its nuclear weapons only in retaliation, not as a first strike. This puts India in a vulnerable position and gives Pakistan knowledge of its opponent's strategy before any moves are made.


The current situation between the two countries can be modeled as a "prisoner's dilemma" game in which each country has the same two options: launch an attack against the other player, or launch no attack. The Nash equilibrium is for both players to attack. Attacking is each player's best response no matter what the other does, because it damages the other player, which is each country's goal. However, both players would clearly be better off if neither attacked and thus neither were damaged. India's no-first-use policy, together with Pakistan's knowledge of it, is what encourages this better outcome. Pakistan's best strategy is not to attack, as long as India stays true to this policy. Country leaders and military officials are therefore doing everything in their power to keep this policy in place for as long as possible, helping to preserve peace in the region.
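The prisoner's dilemma structure described above can be checked mechanically. The code tests every strategy pair and keeps those where neither player can do better by switching alone. The payoff numbers are hypothetical: they only need the usual prisoner's dilemma ordering, where attacking alone is best, mutual restraint is second best, mutual attack is third, and being attacked alone is worst. They are not taken from the article.

```python
from itertools import product

STRATEGIES = ("attack", "no attack")
# Hypothetical payoffs as (row player, column player); higher is better.
PAYOFFS = {
    ("attack", "attack"): (1, 1),
    ("attack", "no attack"): (4, 0),
    ("no attack", "attack"): (0, 4),
    ("no attack", "no attack"): (3, 3),
}


def pure_nash_equilibria(payoffs, strategies):
    """Strategy pairs where neither player gains by deviating unilaterally."""
    equilibria = []
    for row, col in product(strategies, repeat=2):
        row_payoff, col_payoff = payoffs[(row, col)]
        row_best = all(payoffs[(alt, col)][0] <= row_payoff for alt in strategies)
        col_best = all(payoffs[(row, alt)][1] <= col_payoff for alt in strategies)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, STRATEGIES))  # [('attack', 'attack')]
```

Under these payoffs, mutual attack is the only equilibrium, even though both players would get 3 instead of 1 if neither attacked.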

Why aren’t homes sold in Second Price Auctions?

Second-price, sealed-bid auctions (Vickrey auctions) are a common auction format. Sites like eBay use a similar mechanism to decide the price at which goods are sold, aiming for good outcomes for both sellers and buyers. But are these auctions used in areas where they seem perfectly reasonable, such as real estate? In a blog post, Brad Templeton, Chairman Emeritus of the Electronic Frontier Foundation, shares his thoughts on why second-price auctions are not used in the sale of homes, and I summarize his ideas below.

According to Brad, the driving force behind irrational behavior among home buyers is that a large proportion of the public does not understand how second-price auctions work. Sellers know this and can take advantage of buyers by prompting them to bid more than the property is worth to them, increasing the seller's profit. One of the main reasons is that buying a home happens rarely in a person's lifetime, perhaps once or twice. Buyers therefore try to purchase the home at whatever cost without necessarily considering their true value of the asset; all they want is to win. As a result, the public tends to misunderstand second-price auctions and the mechanisms behind them. In particular, people do not appreciate the importance of working out their true value of the asset and being happy with the result of the auction whether or not they win. However, this is a matter of contention, because losing a home that sells for, say, $1.002 million when one's bid was $1 million may feel different from losing an eBay item that sells for $30 when one's bid was $25.

The question remains: would a second-price auction be optimal? As we discussed in class multiple times, the answer is yes, provided the auction really is a second-price, sealed-bid auction. The dominant strategy in a second-price, sealed-bid auction is always to bid one's true value of the item. If one's bid is the highest, one wins the asset and pays the second-highest bid, which is at most one's value. If one does not win, it is because winning would have required paying at least one's value, so the payoff would have been zero or negative; losing costs nothing. The outcome therefore depends on whether buyers know their actual value of the asset. Unfortunately, for very expensive assets, this does not seem to be the case!
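The dominance argument above can be checked by brute force. For a buyer with a fixed value, the code tries a grid of alternative bids against a grid of rival bids and confirms that no bid ever beats bidding truthfully. The $1 million value and the grids are hypothetical, and treating ties as losses is a simplifying assumption.

```python
def second_price_payoff(bid, value, highest_rival_bid):
    """Payoff in a sealed-bid second-price auction (ties count as a loss, an assumption)."""
    if bid > highest_rival_bid:
        return value - highest_rival_bid  # winner pays the second-highest bid
    return 0


def truthful_never_worse(value, candidate_bids, rival_bids):
    """Check that bidding one's true value does at least as well as any other bid."""
    for rival in rival_bids:
        truthful = second_price_payoff(value, value, rival)
        if any(second_price_payoff(b, value, rival) > truthful for b in candidate_bids):
            return False
    return True


value = 1_000_000  # hypothetical buyer's true value of the home
grid = range(0, 2_000_001, 10_000)
print(truthful_never_worse(value, grid, grid))  # True
# Overbidding to "win" against a $1.002 million rival means overpaying:
print(second_price_payoff(1_050_000, value, 1_002_000))  # -2000
print(second_price_payoff(value, value, 1_002_000))      # 0
```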



How Facebook Advertises to Your Hobbies and Interests


Everyone who uses Facebook has wondered at some point how the company shows them ads so eerily similar to their interests. Are these ads a coincidence? Or does Facebook somehow know your hobbies, interests, and behavior? It turns out that whenever you use or interact with anything on Facebook, you leave a trail of "digital bread crumbs" that the company can collect and analyze. Using this data, Facebook can target ads to your interests, your characteristics, and even your behavior, such as physical in-store visits and activity on other websites. Facebook also uses something called "look-alike matching": if someone provides their email address to an advertiser, Facebook can show that advertiser's ads to other people it believes have similar interests.


As we learned in class, search engines such as Google use auctions to determine which advertiser gets which ad slot. Facebook similarly auctions off its ad slots but also weighs each bid by the ad's relevance, so the more relevant an ad is, the less the advertiser has to pay. Moreover, Facebook does not compare an ad's relevance only with other ads; it compares it with all of the content on its platform, which lets advertisements blend in with organic content. These ads are normally marked with a "Sponsored" tag. If a user likes an ad, however, content from the advertiser can then show up in their feed without the tag, acting as free advertising. As these ads are shared more and more, Facebook's algorithm sees them as more relevant, which makes them cheaper to post. In the future, ads may become so frequent that we can no longer tell the difference between organic content and advertisements.
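The relevance weighting described above can be sketched as a quality-weighted generalized second-price auction, the version of search-ad auctions usually presented in class. Ads are ranked by bid multiplied by relevance, and each winner pays the smallest bid that would keep its position. Facebook's real scoring formula is more complex and is not reproduced here. The ads, bids, and relevance scores are hypothetical.

```python
def quality_weighted_gsp(bids, quality, slots):
    """Rank ads by bid * quality; each winner pays the lowest bid that keeps its slot."""
    ranked = sorted(bids, key=lambda ad: bids[ad] * quality[ad], reverse=True)
    results = []
    for pos, ad in enumerate(ranked[:slots]):
        if pos + 1 < len(ranked):
            nxt = ranked[pos + 1]
            # price per click: the next ad's score divided by this ad's quality
            price = bids[nxt] * quality[nxt] / quality[ad]
        else:
            price = 0.0  # no competitor below; a real system would apply a reserve price
        results.append((ad, price))
    return results


bids = {"A": 2.0, "B": 3.0, "C": 1.0}     # hypothetical bids per click
quality = {"A": 0.9, "B": 0.3, "C": 0.5}  # hypothetical relevance scores
for ad, price in quality_weighted_gsp(bids, quality, slots=2):
    print(f"{ad} pays {price:.2f}")
# A pays 1.00
# B pays 1.67
```

Ad A bids less than ad B, but its higher relevance wins it the top slot, and it pays less per click than B does.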

Modeling atomic interactions with graphs

Throughout the course we have discussed graphs in great detail, particularly their connections to social networks. Graph theory has also found applications in modeling the behavior of physical materials. This article discusses research at Florida State University that uses graph theory to learn more about the composition of different materials. The electrons and ions in a material affect how its atoms interact with one another. The researchers modeled these interactions as a graph, with the atoms as nodes and the forces between them as weighted, directed edges. They then applied spectral sparsification, a technique that removes edges from a graph while keeping all of its nodes and approximately preserving its spectral properties (the eigenvalues of its Laplacian). This sparsified the graph while retaining the important information, reducing the computational cost of analyzing the graph and allowing simulations to run much faster.
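To make "removing edges while keeping the important information" concrete, the following sketches one well-known spectral sparsification method: Spielman and Srivastava's sampling by effective resistance. The article does not say which variant the Florida State researchers used, so this is an illustration of the technique rather than their method. It also treats interactions as undirected weighted edges, which is what this sparsifier assumes, whereas the article describes directed edges. The five-atom example is hypothetical.

```python
import random


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - W as a dense list of lists."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        lap[u][u] += w
        lap[v][v] += w
        lap[u][v] -= w
        lap[v][u] -= w
    return lap


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def effective_resistances(n, edges):
    """Effective resistance of each edge, grounding the last node (graph must be connected)."""
    ground = n - 1
    reduced = [row[:ground] for row in laplacian(n, edges)[:ground]]
    result = []
    for u, v, _ in edges:
        rhs = [0.0] * ground  # inject current +1 at u and -1 at v
        if u != ground:
            rhs[u] += 1.0
        if v != ground:
            rhs[v] -= 1.0
        potential = solve(reduced, rhs) + [0.0]  # grounded node has potential 0
        result.append(potential[u] - potential[v])
    return result


def spectral_sparsify(n, edges, samples, seed=0):
    """Sample edges with probability proportional to weight * effective resistance."""
    rng = random.Random(seed)
    resistances = effective_resistances(n, edges)
    importance = [w * r for (_, _, w), r in zip(edges, resistances)]
    total = sum(importance)  # equals n - 1 for a connected graph
    probs = [x / total for x in importance]
    new_weight = {}
    for i in rng.choices(range(len(edges)), weights=probs, k=samples):
        # reweight so each edge's expected new weight equals its original weight
        new_weight[i] = new_weight.get(i, 0.0) + edges[i][2] / (samples * probs[i])
    return [(edges[i][0], edges[i][1], w) for i, w in sorted(new_weight.items())]


# Hypothetical example: 5 "atoms", every pair interacting with unit strength.
atoms = 5
bonds = [(u, v, 1.0) for u in range(atoms) for v in range(u + 1, atoms)]
sparse = spectral_sparsify(atoms, bonds, samples=6)
print(len(bonds), "edges ->", len(sparse), "edges")
```

Edges with high effective resistance are the ones the graph's structure depends on most. Sampling by that score keeps them with high probability and lets redundant edges drop out.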

I thought that this research showed an interesting application of graph theory. In our course, we have used graphs to model connections between people or things, or between different groups, but using graphs to model the forces between atoms is quite different. According to the article, the research was mainly a proof of concept, but it will be intriguing to see where it goes. Perhaps this application of network theory could be taken further in the field, giving us more insight into what makes certain materials efficient or into how materials transport energy.


Monetary Compensation of YouTubers

YouTube was launched in 2005 as a platform for anyone in the United States to share videos. Thanks to the wide reach of the internet, the website grew quickly and was bought by Google for about $1.65 billion. At that time, the site only reached the United States and was relatively small compared to its reach today. In 2010, new types of ads appeared, which allowed people who had previously posted on YouTube for fun to earn a living. How YouTubers earn money relates to our new class topic, the search industry. Learning how Google makes its money from ads sparked my interest in how YouTubers make so much money. The article shows that although there are many restrictions on what YouTubers can share, there is a large market for advertisements shown to the public. Targeting specific ads to certain demographics was also touched on in class.

People who make YouTube videos as their full-time job are now referred to as YouTubers. YouTubers mainly earn a living through ads, but sponsored activities also help them increase their income. Explicit sponsorships and affiliate links are the most obvious to viewers. After agreeing to an explicit sponsorship, YouTubers create videos in which the product or brand being advertised is explained or shown to viewers. The YouTuber is then paid per view of that video, at the rate specified in their contract. YouTubers also often show a specific product in a video and offer their viewers a small discount on it. These offers are typically shared through affiliate links, which credit the YouTuber for the resulting sales. Brands often target YouTubers whose audiences fit the demographic of their product. Another way YouTubers increase their profit is free product sampling: brands send YouTubers free samples in the hope that they will review the product positively in front of their audience and increase its exposure.

Over the years, YouTube has tightened its advertising regulations. One of these regulations is that videos must go through screening before being posted publicly. YouTubers are required to disclose when a video is sponsored by a brand, and they often state whether the items they are using are sponsored or not. YouTube also requires monetized videos to exclude explicit content; videos that break this rule can earn reduced revenue or no revenue at all.


October 2019