


The PageRank Algorithm in Biomedical Literature Ranking

Medline is the database of the U.S. National Library of Medicine. The importance of an article in the database is typically measured by its number of inbound links, that is, the number of citations it has received. Because this count ignores differences in the quality of those citations, a research paper by Elliot J. Yates and Louise C. Dixon proposes using the PageRank algorithm to rank the importance of biomedical literature.

The original algorithm used by Medline calculates importance by counting the citations that a piece of literature has received, so a citation from a journal with a small distribution counts exactly as much as a citation from a reputable review journal. This creates a problem in the rankings: because the quality of the citing journals is not accounted for, the resulting ranking may not be the most accurate one. At the same time, judging the quality of a journal is a subjective task. To reduce that subjectivity, the PageRank algorithm could supplement the current citation count to produce a better ranking.

The PageRank algorithm works by determining the quality of a review article from the number of important pieces of literature that it is linked to. The importance of those pieces of literature is, in turn, determined by the number of important review articles that endorse them. This creates a cycle: the more important a piece of literature is, the more it raises the quality of the review articles that cite it, and higher-quality review articles in turn raise the importance of the literature they cite. The paper also touches upon the advantages and feasibility of using the PageRank algorithm to rank biomedical literature. For example, with the increasing adoption of on-demand cloud computing infrastructure, data extraction can scale, allowing the algorithm to be calculated with little effort on commodity cluster software.
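To make the contrast with plain citation counting concrete, here is a minimal Python sketch of PageRank on a toy citation graph. It is only an illustration under my own assumptions, not the implementation from the paper: the article names, the damping factor of 0.85, and the fixed number of iterations are all made up for the example.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """citations maps each article to the list of articles it cites.

    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every article that cites or is cited.
    articles = set(citations)
    for cited in citations.values():
        articles.update(cited)
    articles = sorted(articles)
    n = len(articles)

    # Start with the same score for every article.
    rank = {a: 1.0 / n for a in articles}
    for _ in range(iterations):
        new_rank = {a: (1.0 - damping) / n for a in articles}
        for a in articles:
            cited = citations.get(a, [])
            if cited:
                # An article splits its score among the articles it cites.
                share = damping * rank[a] / len(cited)
                for c in cited:
                    new_rank[c] += share
            else:
                # An article that cites nothing spreads its score evenly.
                share = damping * rank[a] / n
                for c in articles:
                    new_rank[c] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: paper_X and paper_Z each receive one citation,
# but paper_X is cited by a review that is itself cited twice.
toy_citations = {
    "paper_1": ["review_A"],
    "paper_2": ["review_A"],
    "review_A": ["paper_X"],
    "note_B": ["paper_Z"],
}

citation_counts = {"paper_X": 0, "paper_Z": 0}
for cited in toy_citations.values():
    for c in cited:
        if c in citation_counts:
            citation_counts[c] += 1

scores = pagerank(toy_citations)
print(citation_counts)
print(scores["paper_X"] > scores["paper_Z"])
```

In this toy graph both papers have the same citation count, yet paper_X ends up with the higher PageRank score because the article citing it is itself well cited. That is the kind of difference in citation quality that a raw count cannot see.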

The technical discussion in the paper fits what we have discussed in the lectures. In INFO 2040, we talked about how this kind of link analysis works. Strictly speaking, the procedure below is the hubs-and-authorities method, a close relative of PageRank that is built on the same idea of repeated improvement. A node is either a hub (here, a review journal) or an authority (a piece of biomedical literature). Every hub starts with a hub score of 1. For each authority, we sum the scores of the hubs that link to it and assign that sum as its authority score. Then, to recalculate the score of each hub, we sum the authority scores of the authorities that the hub links to. We repeat these two steps k times, for a small k, to obtain a final score indicating the quality of each authority.
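The hub and authority update steps described above can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: the function name, the toy review and paper names, and the choice of k = 2 are hypothetical, and the scores are left unnormalized so that the arithmetic is easy to follow by hand.

```python
def hubs_and_authorities(links, k=2):
    """links maps each hub to the list of authorities it links to."""
    # Every hub starts with a hub score of 1.
    hub = {h: 1 for h in links}
    authority_names = sorted({a for targets in links.values() for a in targets})
    auth = {a: 0 for a in authority_names}

    for _ in range(k):
        # Authority update: sum the scores of the hubs linking to each authority.
        auth = {
            a: sum(hub[h] for h, targets in links.items() if a in targets)
            for a in authority_names
        }
        # Hub update: sum the authority scores of everything each hub links to.
        hub = {h: sum(auth[a] for a in links[h]) for h in links}
    return hub, auth


# Hypothetical toy example: two review journals linking to two papers.
toy_links = {
    "review_A": ["paper_X", "paper_Y"],
    "review_B": ["paper_X"],
}

hub_scores, authority_scores = hubs_and_authorities(toy_links, k=2)
print(authority_scores)  # {'paper_X': 5, 'paper_Y': 3}
print(hub_scores)        # {'review_A': 8, 'review_B': 5}
```

After the first round paper_X has authority score 2 and paper_Y has 1, which gives review_A a hub score of 3 and review_B a score of 2. The second round then produces the values printed above. Because the raw scores keep growing with each round, they are usually normalized before being compared, a step left out here for simplicity.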

As the paper was published in 2015, I did additional research on which other fields the PageRank algorithm can be used in. In his article, Amrani Amine explains that the algorithm, originally designed for Google search ranking, has since been adopted in other fields, such as ranking users on social media.

Overall, the steps of the algorithm presented in class are consistent with the explanation provided in this paper. I find the paper interesting because it places the PageRank algorithm in a larger context, allowing us to understand how the algorithm can be applied in different fields of expertise. It is also interesting to see connections between lecture material and real-life examples, which further helps with understanding the material.

 

Sources:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674919/

https://towardsdatascience.com/pagerank-algorithm-fully-explained-dc794184b4af

 
