This past semester, Karan Kurani and Jason Marcell were presented with the 2011 EMC Big Data Award at Cornell’s BOOM 2011.
As the first step of our Social Network Discovery project, we wanted to pull down a bunch of data from several sources so that we could massage and filter it before it went on its way to the next step in the process. We chose to store our data in a MySQL database and [...]
We have used two approaches for applying the LDA model to our dataset. LingPipe: LingPipe is a Java library that provides many of the functions required in NLP. We used LingPipe in the following manner. We used symmetric KL divergence to calculate the similarity of a paper with respect to each of the seed [...]
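To make the similarity step concrete, here is a minimal sketch of symmetric KL divergence between per-paper topic distributions. It is not our LingPipe code: it is plain Python, and the function names, the `eps` floor, and the choice to sum the two directions (some definitions halve the sum) are assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is a hypothetical floor that avoids division by zero when q has
    a zero entry; it is an assumption of this sketch.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with pi == 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q):
    """Symmetric KL divergence, taken here as KL(p||q) + KL(q||p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def rank_seeds(paper_topics, seed_topics):
    """Return (seed_id, divergence) pairs, most similar seed first.

    seed_topics maps a seed identifier to its topic distribution.
    A lower divergence means a more similar paper.
    """
    scores = {sid: symmetric_kl(paper_topics, dist)
              for sid, dist in seed_topics.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Calling `rank_seeds` with one paper's topic distribution and a dictionary of seed distributions returns the seeds ordered from most to least similar, since a smaller divergence means the two topic mixtures are closer.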
I am using the write-up done by Kiyan from here. The Problem: We possess a set of seed papers written on topics in computational sustainability, and a corpus of papers in Computer Science (derived from the DBLP database) that contains the seed papers as a subset. The problem is to develop a similarity measure between papers that [...]