Using PageRank to Detect Anomalies and Fraud in Healthcare
https://hortonworks.com/blog/using-pagerank-detect-anomalies-fraud-healthcare/
In the United States, an estimated 3-10% of total health care spending is reportedly lost to healthcare fraud each year. This equates to about $70B to $260B. Europe has a similar problem, with about $30B to $100B lost each year. With numbers this high, it is important to find ways to limit healthcare fraud, and this is where PageRank comes into play. In our class, we describe PageRank as “a kind of ‘fluid’ that circulates through the network, passing from node to node across edges, and pooling at the nodes that are the most important.” The algorithm was put in place at Google to estimate the probability that a user surfing the web will end up at a particular web page. The higher a page's PageRank score, the more likely it is that the user will visit that page in the network.
In an effort to find anomalies and healthcare fraud, the article cites a variant of PageRank called Personalized PageRank. In this algorithm, the PageRank scores are computed with respect to a specific “topic of interest,” represented by an already selected group of nodes called source vertices. With this type of PageRank in place, the random surfer no longer restarts from an arbitrary page: every random jump lands on one of the source vertices, so a node's score reflects how closely it is connected to that set. The article uses this type of PageRank in three steps to find anomalies in healthcare transaction records.
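To make the difference from ordinary PageRank concrete, here is one common textbook way to write the Personalized PageRank update. The symbols are assumptions introduced for illustration, not notation from the article: r_v is the score of node v, d is the damping factor (often set to 0.85), S is the set of source vertices, and s_v is the restart probability of node v.

```latex
r_v = (1 - d)\, s_v + d \sum_{u \to v} \frac{r_u}{\operatorname{outdeg}(u)},
\qquad
s_v =
\begin{cases}
  \dfrac{1}{|S|} & \text{if } v \in S, \\[4pt]
  0              & \text{otherwise.}
\end{cases}
```

Ordinary PageRank is the special case where s_v = 1/n for every one of the n nodes, so the only change is where the random jumps are allowed to land.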
First, they compute the similarity between providers based on the procedure codes they bill. Any two providers in the same specialty are expected to have similar codes. An undirected graph is generated in which the providers are the nodes, and an edge connects two providers if they are “similar” to each other based on those codes. Second, the system loops through all the specialties; for each one, the “source vertices” are all the medical providers with that specialty. Here the Personalized PageRank algorithm comes into effect: it runs over the undirected graph and produces a score for every provider. The anomalous providers are the ones that receive a high score from the algorithm even though they are not in that specialty. Third, the group interprets the results to identify which medical providers could be involved in healthcare fraud.
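The three steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the article's implementation: the Jaccard similarity measure, the 0.35 edge threshold, the 0.15 score cutoff, and all provider names and procedure codes are made up for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets of procedure codes (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_graph(codes, threshold):
    """Step 1: undirected graph with an edge between 'similar' providers."""
    graph = {p: set() for p in codes}
    for p, q in combinations(codes, 2):
        if jaccard(codes[p], codes[q]) >= threshold:
            graph[p].add(q)
            graph[q].add(p)
    return graph

def personalized_pagerank(graph, sources, damping=0.85, iterations=100):
    """Power iteration where every random jump lands on a source vertex."""
    sources = set(sources)
    restart = {v: (1.0 / len(sources) if v in sources else 0.0) for v in graph}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) * restart[v] for v in graph}
        for u, neighbors in graph.items():
            if neighbors:
                share = damping * rank[u] / len(neighbors)
                for v in neighbors:
                    new_rank[v] += share
            else:
                # Isolated provider: hand its score back to the source set.
                for s in sources:
                    new_rank[s] += damping * rank[u] / len(sources)
        rank = new_rank
    return rank

def find_anomalies(graph, specialty, min_score):
    """Step 2: per specialty, flag high-scoring providers from outside it."""
    flagged = []
    for spec in sorted(set(specialty.values())):
        sources = [p for p in graph if specialty[p] == spec]
        scores = personalized_pagerank(graph, sources)
        for p in sorted(graph):
            if specialty[p] != spec and scores[p] >= min_score:
                flagged.append((spec, p, scores[p]))
    return flagged

# Made-up toy data: "ent_x" is registered as an Otolaryngologist
# but bills mostly the same codes as the plastic surgeons.
codes = {
    "ps1": {"PS1", "PS2", "PS3"},
    "ps2": {"PS1", "PS2", "PS4"},
    "ps3": {"PS2", "PS3", "PS4"},
    "ent1": {"ENT1", "ENT2", "ENT3"},
    "ent2": {"ENT1", "ENT2", "ENT4"},
    "ent_x": {"PS1", "PS2", "PS3", "ENT1"},
}
specialty = {
    "ps1": "Plastic Surgery", "ps2": "Plastic Surgery", "ps3": "Plastic Surgery",
    "ent1": "Otolaryngology", "ent2": "Otolaryngology", "ent_x": "Otolaryngology",
}

graph = build_graph(codes, threshold=0.35)
flagged = find_anomalies(graph, specialty, min_score=0.15)
for spec, provider, score in flagged:
    print(f"{provider} scores {score:.3f} for {spec} but is a {specialty[provider]}")
```

On this toy data the only provider flagged is ent_x, under the Plastic Surgery specialty. The third step, deciding whether a flagged provider is actually committing fraud, still needs a person to review the billing records.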
In this example, for the Plastic Surgery specialty, there was a cluster of 1228 medical providers. After running the PageRank algorithm on this group, they found an anomalous Otolaryngologist, with the procedures billed (below).
These are clearly plastic surgery procedures, but in this case they were performed by an Otolaryngologist. This could be a case of healthcare fraud by this particular provider.

