Report from the 8th International Digital Curation Conference (IDCC)
This is probably the single most important general meeting for those working in the area of data curation, and this year’s event drew more participants than ever before. The themes for the meeting were infrastructure, intelligence, and innovation. Here are some of the highlights for me:
Herbert Van de Sompel’s talk, “The Web as infrastructure for scholarly research and communication,” was easily the most interesting of the talks. In a retrospective of about 10 years of research activity that includes the development of OAI-PMH, OAI-ORE, ResourceSync and Memento, he traced a fundamental change in thinking about the web, our understanding of what constitutes core infrastructure on the web, and how to make the best possible use of it. This is a fascinating talk and I recommend giving it a view/listen (a video and a slides-only version are available).
Ewan Birney of the European Molecular Biology Laboratory explained why molecular biology archives have a tradition of openness (it’s the way the science gets done), and how improvements in sequencing technologies have resulted in a shift from sequencing being the bottleneck to analysis being the bottleneck. That’s not news for anyone familiar with the area, but the best part came when he gave us a sneak preview of the use of DNA as a potential storage medium for digital information. We were asked not to tweet or blog that news at the time, but since then, the findings have been published in Nature.
Throughout the conference there was a lot of discussion of the relationship between data and publications, and of emerging forms of publication such as the data paper (a brief paper that serves primarily to describe a data set, not so much to share analyses and conclusions based upon the data). I heard several people characterize papers as “the best metadata there is” for a data set. While I don’t disagree that papers are great for conveying deep contextual information about the data (detailed methods, the purpose behind its collection, and what it all means), I didn’t hear anyone raise the point that papers make lousy reading for computers. That is why I was so interested to hear my colleague Karen Baker mention a development at ZooKeys, an open access journal in systematics. The Global Biodiversity Information Facility (GBIF) and the publishers of ZooKeys developed a workflow to automatically create a data paper from a standards-based metadata document. This seems like a great approach, as using machine-readable metadata to automatically generate a human-readable “data paper” gives you both products for the same effort.
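To make the idea concrete, here is a minimal sketch of turning a machine-readable metadata record into human-readable text. It is not the GBIF/ZooKeys workflow: the XML element names, the sample record, and the `render_data_paper` function are all invented for illustration and do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata record. The element names and the
# content are invented for this sketch; a real workflow would read a
# document that conforms to an actual metadata standard.
METADATA_XML = """
<dataset>
  <title>Ant occurrences in coastal dunes</title>
  <creator>A. Researcher</creator>
  <creator>B. Collaborator</creator>
  <abstract>Occurrence records collected by pitfall trapping.</abstract>
  <coverage>Coastal dunes, 2010-2012</coverage>
  <methods>Pitfall traps were emptied weekly.</methods>
</dataset>
"""

# (heading in the rendered paper, element name in the metadata record)
SECTIONS = [
    ("Abstract", "abstract"),
    ("Coverage", "coverage"),
    ("Methods", "methods"),
]


def render_data_paper(xml_text: str) -> str:
    """Render a plain-text 'data paper' from a metadata record."""
    root = ET.fromstring(xml_text.strip())

    title = root.findtext("title", default="Untitled data set").strip()
    authors = [el.text.strip() for el in root.findall("creator") if el.text]

    # Title line, author line, then a blank line before the sections.
    lines = [title, ", ".join(authors), ""]

    # Emit a section only when the metadata actually contains that element.
    for heading, tag in SECTIONS:
        text = root.findtext(tag)
        if text:
            lines += [heading, text.strip(), ""]

    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_data_paper(METADATA_XML))
```

The point of the sketch is that the metadata record stays the single source: the same file can be indexed by machines and rendered for people, so neither product has to be written by hand separately.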
There were also some moments of tension and frustration throughout the conference. I suspect these arose because of disciplinary differences that become apparent at general meetings such as this one. A discussion of what makes a data scientist was a good example: many individuals felt that the question was not particularly controversial within their own disciplines. Across disciplines, about the best you can do is to say that data scientists typically have a combination of domain, analytical, and IT expertise that enables them to participate fully in the research process. This is quite different from the roles most libraries and librarians are embracing, fancy job titles notwithstanding. Cliff Lynch made a good point on workforce issues in this area: most data scientists, however you define them, have been through a succession of careers by the time they reach that position, and this is not particularly sustainable or scalable.
And finally, a few odds and ends from talks, posters, and demos:
- A demo from Open Exeter on using SWORD and Globus to move large data sets into DSpace.
- Trisha Cruse presented on the suite of curation services offered by the California Digital Library. Of particular interest is the work they’ve done on modeling the costs of curation.
- Digital Science’s Kaitlin Thaney described some of the very popular tools they’ve launched (FigShare, AltMetric, LabGuru, SureChem, and others). Again, not news, but learning that Macmillan Publishers is behind Digital Science (coupled with the recent news that Nature is pulling the plug on Connotea) has me wondering about the long-term fate of free services supported by commercial entities.
The full program, with links to presentations, is available here: http://www.dcc.ac.uk/events/idcc13