


Creating biological networks using natural language processing

Background

Machine learning is rapidly gaining popularity in medicine. The idea of personalized care is changing the way people are diagnosed: two people with the same visible symptoms may not require the same treatment. Finding the right treatment, however, requires a lot of data analysis. When so much biological information is collected about one person, newer approaches are needed to make the data useful. This post looks at how biological networks can be created by mining PubMed abstracts. These networks are used to determine the relationships between biological entities.

How it works

Researchers at the University of Tennessee have created a program, Chilibot (chip literature robot), that constructs a relationship network among genes, proteins, drugs and biological concepts by analyzing records in the PubMed database. These relationships can be either inhibitory or stimulatory and are encoded into a network map. The process looks like this:

1. Queryable terms include gene symbols and free-form search terms
2. Chilibot then retrieves results containing these search terms and their synonyms
3. The texts containing the results are parsed into units of one sentence
4. A set of rules is used to classify these sentences into one of these categories:
– stimulatory (interaction present)
– inhibitory (interaction present)
– neutral (interaction present)
– parallel (no interaction present)
5. Retrieved relationships are then visualized. Nodes are used to represent query terms and edges are used to represent relationships
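Steps 3 and 4 above can be sketched in a few lines of Python. This is only a rough illustration, not Chilibot's actual rule set: the cue-word lists, the function names `split_sentences` and `classify`, and the example text (with placeholder names DrugX and GeneA) are all made up for this post, and a real system would use far richer linguistic rules.

```python
import re

# Hypothetical cue words (an assumption); Chilibot's real rules are not reproduced here.
STIMULATORY = {"activates", "increases", "induces", "enhances", "upregulates"}
INHIBITORY = {"inhibits", "decreases", "blocks", "suppresses", "downregulates"}
NEUTRAL = {"binds", "interacts", "regulates", "modulates"}


def split_sentences(text):
    """Naively split text at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence, term_a, term_b):
    """Return a category for the sentence, or None if either term is missing."""
    lowered = sentence.lower()
    if term_a.lower() not in lowered or term_b.lower() not in lowered:
        return None
    words = set(re.findall(r"[a-z]+", lowered))
    if words & STIMULATORY:
        return "stimulatory"
    if words & INHIBITORY:
        return "inhibitory"
    if words & NEUTRAL:
        return "neutral"
    # Both terms appear together, but no interaction cue word was found.
    return "parallel"


# Invented example text with placeholder entity names.
abstract = (
    "DrugX increases GeneA expression in cultured cells. "
    "DrugX and GeneA were both measured after treatment."
)
results = [(s, classify(s, "DrugX", "GeneA")) for s in split_sentences(abstract)]
for sentence, category in results:
    print(category, "-", sentence)
```

Checking the rules in a fixed order (stimulatory, then inhibitory, then neutral) is a simplification chosen for brevity; a sentence with several cue words would need a more careful tie-breaking rule.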

The network

[Figure: network map generated by Chilibot]

This is a biological network generated by Chilibot. In this example, the algorithm queried the PubMed database to determine the effect of cocaine on a set of genes. The search terms were “plasticity” and “cocaine.” Edges connecting the nodes represent the relationships between them; each edge is gray, green or red to indicate the nature of the relationship. The nodes are genes and search terms, color coded to distinguish the two.
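To make the node-and-edge structure concrete, here is a minimal sketch of how classified sentences could be collapsed into a colored network. Several things here are assumptions rather than Chilibot's documented behavior: the color mapping in `EDGE_COLORS` (the colors come from the figure, but which relationship each one encodes is a guess), the majority-vote rule for picking one category per pair, and the placeholder gene names GeneA and GeneB.

```python
from collections import Counter

# Assumed mapping: the colors are from the figure, the pairing with categories is a guess.
EDGE_COLORS = {"stimulatory": "green", "inhibitory": "red", "neutral": "gray"}


def build_network(sentence_labels):
    """Collapse per-sentence labels into one colored edge per pair of terms.

    sentence_labels: iterable of (term_a, term_b, category) tuples.
    Returns (nodes, edges), where edges maps a sorted term pair to its attributes.
    """
    votes = {}
    for a, b, category in sentence_labels:
        pair = tuple(sorted((a, b)))  # undirected edge: order of terms is ignored
        votes.setdefault(pair, Counter())[category] += 1

    nodes = sorted({term for pair in votes for term in pair})
    edges = {}
    for pair, counts in votes.items():
        # Majority vote across sentences (an assumption made for this sketch).
        category, _ = counts.most_common(1)[0]
        edges[pair] = {
            "category": category,
            "color": EDGE_COLORS.get(category, "gray"),
            "sentences": sum(counts.values()),
        }
    return nodes, edges


# Invented labels for illustration only.
labels = [
    ("cocaine", "GeneA", "stimulatory"),
    ("GeneA", "cocaine", "stimulatory"),
    ("cocaine", "GeneA", "inhibitory"),
    ("cocaine", "GeneB", "inhibitory"),
]
nodes, edges = build_network(labels)
for pair, attrs in edges.items():
    print(pair, attrs)
```

Keeping the sentence count on each edge preserves how much evidence supports a relationship, which a plain color alone would hide.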

Conclusion

This algorithm uses a consolidated archive of scholarly medical articles to extract information that would take a long time to gather manually. It may not be the most accurate way to deduce relationships, but it is unbiased in how it goes about its job. At the very least, the connections it finds while generating a network can provide inspiration for new research. Creating and visualizing such a network can be very useful to the field of medicine.

In CS 2850 – Networks, we learn the basic idea behind a graph of nodes and edges. The positive or negative relationships between individual nodes can indicate something about the overall system the graph describes. In this particular example, we learn what does and does not have an impact on certain biological entities.

Citations

1. Topinka CM, Shyu C-R. Predicting Cancer Interaction Networks Using Text-Mining and Structure Understanding. AMIA Annual Symposium Proceedings. 2006;2006:1123.
2. Chen H, Sharp BM. Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics. 2004;5:147. doi:10.1186/1471-2105-5-147.
