
Using Hibernate for the Social Network Discovery Project

As the first step of our Social Network Discovery project, we wanted to pull down a bunch of data from some data sources so that we could massage and filter it before it went on its way to the next step in the process. We chose to store our data in a MySQL database, and our first source of data was a very large XML file containing metadata about thousands of DBLP entries (books, journal articles, etc.) and their citations.

Essentially, step one of the whole process was to read a big XML file into a MySQL database, preserving the structure and order inherent in the XML. The first challenge was the SAX vs. DOM parser problem: a DOM parser reads the whole document into memory before you can work with it, whereas a SAX parser streams through the file and fires events as it encounters each element. The DOM parser simply would not do, because the dataset was so large that it quickly ran out of memory.
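To make the streaming idea concrete, here is a minimal sketch of a SAX handler built on the JDK's own javax.xml.parsers and org.xml.sax classes. It is not the project's actual parser: the class name TitleCollector and the sample XML are made up for illustration, and the element names (article, book, title) are assumed to resemble what DBLP entries look like. The point is only that the handler sees one element at a time and never holds the whole document in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element it streams past.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder text; // non-null only while we are inside a <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so append rather than assign.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString().trim());
            text = null;
        }
    }

    // Parses an XML string and returns the titles in document order.
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TitleCollector handler = new TitleCollector();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data, shaped loosely like DBLP entries.
        String xml = "<dblp><article key=\"a1\"><title>First</title></article>"
                   + "<book key=\"b1\"><title>Second</title></book></dblp>";
        System.out.println(parse(xml));
    }
}
```

For a real multi-gigabyte file you would hand the parser a FileInputStream instead of a string, and write each entry to the database in endElement rather than collecting everything in a list.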

The next problem is: how do you get this stuff into a database? We’re working in Java, so one solution is to just start banging together SQL insert statements in code and issue them to the database programmatically using Java’s SQL libraries (JDBC). This gets the job done, but it’s not the prettiest thing to read or maintain. A proper Object Relational Mapping (ORM) tool is the appropriate tool for this job. One of the more popular ORMs for Java is Hibernate, which is what we used.
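For a sense of what the hand-rolled approach looks like, here is a rough sketch using the standard java.sql API. The table name, column names, and the PublicationJdbcWriter class are all hypothetical; they are not taken from the project. Every model object needs a method like this, and every new column means touching both the SQL string and the setter calls.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper showing one hand-written insert for one kind of object.
class PublicationJdbcWriter {
    // Assumed table and column names, for illustration only.
    static final String INSERT_SQL =
        "INSERT INTO publication (dblp_key, title, pub_year) VALUES (?, ?, ?)";

    // Writes a single row; the caller owns the connection and its transaction.
    static void insert(Connection conn, String key, String title, int year) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            // Parameter indexes are 1-based and must line up with the ?s above.
            stmt.setString(1, key);
            stmt.setString(2, title);
            stmt.setInt(3, year);
            stmt.executeUpdate();
        }
    }
}
```

This is fine for one table, but the bookkeeping multiplies quickly once entries, authors, and citations all reference each other.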

In order to use Hibernate, you simply add the appropriate JAR files to your project, annotate (or provide mappings for) your model objects, and then add a little bit of configuration to let Hibernate know about your database and a few other settings. After that, Hibernate provides a nice API for basic CRUD operations: create, read, update, destroy. For simple stuff this is arguably overkill, but the real payoff comes when your object model gets more complicated: many-to-many relationships, lazy loading, cascaded operations. Even for simple models, a nice feature is that Hibernate can generate your whole database schema, tables and all, from scratch if you so choose. Careful with that feature, though: you could lose valuable data when it recreates all of your tables!
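The "little bit of configuration" mostly boils down to a handful of key/value settings. As a sketch, here they are collected in a plain java.util.Properties object so the example runs without Hibernate on the classpath. The property keys are standard Hibernate setting names; the HibernateSettings class, the forMySql method, and the JDBC URL and credentials are placeholders I made up, not the project's real configuration.

```java
import java.util.Properties;

// Hypothetical helper that gathers the settings Hibernate needs to reach a MySQL database.
class HibernateSettings {
    static Properties forMySql(String jdbcUrl, String user, String password, boolean recreateSchema) {
        Properties props = new Properties();
        props.setProperty("hibernate.connection.url", jdbcUrl);
        props.setProperty("hibernate.connection.username", user);
        props.setProperty("hibernate.connection.password", password);
        props.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect");
        // "create" drops and rebuilds the tables on startup, destroying existing rows;
        // "validate" only checks that the existing schema matches the mappings.
        props.setProperty("hibernate.hbm2ddl.auto", recreateSchema ? "create" : "validate");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder URL and credentials, for illustration only.
        Properties props = forMySql("jdbc:mysql://localhost/example", "user", "secret", false);
        props.list(System.out);
    }
}
```

In a real project these settings would typically live in a hibernate.cfg.xml or hibernate.properties file, or be passed to Hibernate's Configuration object before building a SessionFactory. Defaulting the schema setting to the non-destructive value is a deliberate choice here, given how easy it is to wipe a populated database by accident.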

Admittedly, I had previous experience with NHibernate, Hibernate’s sister project on the .NET side of things, so much of this was already familiar to me. But I suffered through a bit of frustration with the initial setup, as things are ever-so-slightly different in Java land. Getting past the learning curve and the initial setup is well worth it, though. The return on investment comes with every added object or new object relationship.

If you wish to follow the progress of the Social Network Discovery project, the code is publicly viewable on GitHub here.

