Google Books and HathiTrust Update – August 2012 : DSPS Press Digital Scholarship and Preservation Services

Google Books and HathiTrust Update – August 2012

Since October 2008, we have been collaborating with Google to digitize material from CUL’s collections, both public domain and in-copyright. With our shipment of July 31, 2012, we have sent over 454,000 items to Google for digitization! What follows is an update on both the current standing of the Google digitization effort and the state of our holdings in HathiTrust, the repository where we ultimately store our Google-digitized images.

Google Digitization

We have organized the project into phases, each covering specific subject-area collections. Four phases have been identified so far (item counts for each phase are approximate):

Phase 1: 250,000 items – Mann Library, Entomology Library, Lee Library at the Geneva Experiment Station, Bailey Hortorium Library – completed
Phase 2: 100,000 items – Engineering Library, Mathematics Library, Edna McConnell Clark Library (Physical Sciences), Flower-Sprecher Veterinary Library – completed
Phase 3: 82,000 items – Martin P. Catherwood Library of Industrial and Labor Relations, Nestlé Hotel Library, Johnson Graduate School of Management Library – completed
Phase 4: 106,000 items currently anticipated – Olin and Uris Libraries – underway, with 23,000 already sent

In the fall of 2011, Google decided to refocus its candidate lists, restricting them to items in the public domain. Our shipment quotas and frequency were also reduced substantially as Google rebalanced the project among its partners. In addition, Google has entirely re-engineered the candidate list process to reduce duplication of candidates from multiple library partners.

Google Book Search

The digitized books quickly move into Google Books. To date, about 410,000 volumes from Cornell’s print holdings have been added to the Google Book Search index. The level of access there is based in part on the publication date of the title. If a book is under copyright protection, Google provides limited or snippet views of the material. If a book is in the public domain, it can be viewed in full or downloaded.

Google Book Settlement

The case brought against Google, Inc. by The Authors Guild, et al. continues to wend its way through the legal system. You may remember that the settlement agreement was rejected. In the past year, there has been no breakthrough in presenting a redraft acceptable to Judge Chin. The entire process has been long and complex. If you are interested in learning more about the current status of the Google Book Settlement, ALA’s site is a good source of information: http://wo.ala.org/gbs/. Throughout, however, Google has remained committed to the project, and we continue to digitize.

HathiTrust

Cornell has been depositing the digital books created through our collaboration with Google into HathiTrust (http://www.hathitrust.org/). Members of the Cornell community can log into HathiTrust with their Cornell NetID and password. The repository grows every day; at present, HathiTrust contains over 10.5 million items, 30% of which are in the public domain. Of these, Cornell has currently deposited over 403,000 items.

All images Cornell has digitized with Google are ingested into HathiTrust. Whenever Google updates an item, whether by correcting its metadata or by improving image quality, that item is automatically re-ingested into HathiTrust. In this way, the HathiTrust repository benefits from the same continuous improvement as the images in Google Books. Initially, the viewability of any item in HathiTrust mirrors its viewability in Google Books, but by systematically addressing rights management issues, HathiTrust has begun to open up viewability of many items. HathiTrust also checks the quality of each item's scanned images and the accuracy of its metadata, collaborating with Cornell and Google to improve quality and correct metadata where needed.

Logging into HathiTrust allows members of the Cornell community to take advantage of benefits for member institutions, such as the ability to download PDFs of full-view items, organize personal collections, and use any new services offered in the future.

Bibliographic Access to Digitized Books

The Library shares catalog metadata with Google, HathiTrust, and OCLC at the point of shipment. This metadata allows OCLC to create catalog records in WorldCat for our digitized titles. In addition, CUL-IT has implemented an API that points Voyager catalog users to digitized versions of our print holdings in Google Books. HathiTrust also offers an API, which we plan to integrate into the next edition of our library catalog, based on Blacklight.
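For readers curious about what such a catalog lookup can look like, here is a minimal sketch in Python. It is not CUL-IT's implementation, which this update does not describe. It assumes the "brief" endpoint of the public HathiTrust Bibliographic API and the `items`, `itemURL`, and `usRightsString` fields of its JSON response; the function names and the sample response (including its identifiers) are hypothetical and exist only for illustration.

```python
import json
from typing import List
from urllib.parse import quote

# Assumption: base URL of the public HathiTrust Bibliographic API ("brief" form).
HATHI_BIB_API = "https://catalog.hathitrust.org/api/volumes/brief"

# Identifier types this sketch accepts (assumed to match the API's path segments).
ID_TYPES = {"oclc", "lccn", "isbn", "issn", "htid", "recordnumber"}


def hathitrust_lookup_url(id_type: str, value: str) -> str:
    """Build the lookup URL for one identifier, e.g. an OCLC number."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    return f"{HATHI_BIB_API}/{id_type}/{quote(value, safe='')}.json"


def full_view_links(response_text: str) -> List[str]:
    """Return the URLs of items the response marks as 'Full view'."""
    data = json.loads(response_text)
    return [
        item["itemURL"]
        for item in data.get("items", [])
        if item.get("usRightsString") == "Full view"
    ]


# Hypothetical response, trimmed to the fields used above. The identifiers
# are made up and do not refer to real volumes.
SAMPLE_RESPONSE = json.dumps({
    "records": {},
    "items": [
        {"htid": "example.0001",
         "itemURL": "https://example.org/item/0001",
         "usRightsString": "Full view"},
        {"htid": "example.0002",
         "itemURL": "https://example.org/item/0002",
         "usRightsString": "Limited (search-only)"},
    ],
})

if __name__ == "__main__":
    print(hathitrust_lookup_url("oclc", "12345"))
    print(full_view_links(SAMPLE_RESPONSE))
```

A catalog page could call `hathitrust_lookup_url` with an identifier from the print record, fetch the result, and show a link only when `full_view_links` returns something. Keeping URL construction and response parsing separate from the network call makes both easy to test offline.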

Acknowledgments

The Google Digitization Project and the HathiTrust Ingest would not be possible without the work and skills of many people. Our materials preparation team includes Cammie Wyckoff, Jacob Barnard-Blitz, Seth Barradas, LuAnn Beebe, John Howard, Saw Htoo, Steven Hughes, Nate Miner, Paw Pha, and Michele Payne. Past members of this team include Liz Kluz, Rick Lader, Tom D’Onofrio, and Rich Paige. They have all done an outstanding job, and it is their hard work that continues to bring success to this important digitization effort. We rely on the vital and supportive expertise of Gary Branch, Pete Hoyt, and Lydia Pettis. Kornelia Tancheva and Fred Muratori currently provide guidance for the Phase 4 effort. Jim LeBlanc and Barbara Eden have been sources of sound guidance throughout the project. Joy Paulson served as Project Manager for Phase 1. Oya Rieger continues to oversee the overall initiative. In addition, we have had many partners in the various libraries where we have worked: library staff who have engaged in the project alongside us. As the coordinator of Google and HathiTrust initiatives, I want to take this opportunity to thank everyone involved in this collaborative effort.
