


Google Books Digitization – wrap-up report

This is a follow-up to Oya Rieger’s announcement to CU-LIB on 2/18/2015.

In early January 2015, the last book from Cornell’s final shipment to Google for digitization was reshelved, concluding a long and successful collaboration between Cornell and Google to digitize well over 500,000 books. Digitization activity spanned about seven years, from October 2008 through December 2014. Although our digitization with Google has ceased for the foreseeable future, our partnership continues at a lower level of effort: Michelle Paolillo will continue to participate in the Google Quality Working Group (more about this further on) and to coordinate improvements to Google-created images. Michelle will also continue her work with HathiTrust, the repository where we ultimately store our Google-digitized images, to ensure that as many books as feasible are ingested into that repository.

Google Digitization Overview

Google Digitization was organized into four phases. Each phase included material from both library units and those libraries’ holdings in the Annex (numbers for each phase are approximate):

  • Phase 1: 244,000 items – Albert Mann Library, Entomology Library, Lee Library at the Geneva Experiment Station, Bailey Hortorium
  • Phase 2: 99,000 items – Engineering Library, Mathematics Library, Edna McConnell Clark Library (Physical Sciences), Flower-Sprecher Veterinary Library
  • Phase 3: 82,000 items – Martin P. Catherwood Library of Industrial and Labor Relations, Nestlé Hotel Library, Johnson Graduate School of Management Library
  • Phase 4: 133,000 items – John M. Olin Library and Uris Library (humanities and social sciences), and Carl A. Kroch Library (Division of Asia Collections)

In total, about 558,000 items were sent for digitization.

Google Books

Books that are digitized are added to Google Books. Be aware that not every book sent can be digitized; various factors (size, condition, publishers’ stipulations, etc.) can prevent it. Cornell’s overall yield has been high (93%), adding about 519,000 books to Google Books over the course of the project. The level of access there is based in part on the publication date of the title. If a book is under copyright protection, Google provides limited or snippet views of the material. If a book is in the public domain, Google allows viewing in full.

HathiTrust

Cornell has been depositing the digital books created through our collaboration with Google into HathiTrust. Even though digitization has concluded, the number of items Cornell has deposited into HathiTrust through the Google partnership is somewhat fluid. This is because ingest into HathiTrust is gated on various quality metrics for individual books. The thresholds that drive this gating can and do change over time, and digital books can also be reanalyzed and improved in quality. Often these changes in quality and gating allow ingest of books that were previously ineligible. The Google Quality Working Group, a group of Google partners focused on improving the quality of the books created in the Google Library Partnerships, has had success in working through ingest-related issues in the past year. Michelle will continue both to participate in this group and to work directly with HathiTrust to maximize CUL’s deposits. The HathiTrust repository grows daily; at present, it contains over 13 million items, 37% of which are in the public domain. To date, Cornell has deposited almost 516,000 items into HathiTrust.

Initially, the viewability of any item in HathiTrust mirrors its viewability in Google Books, but by systematically addressing issues in rights management, HathiTrust has begun to open up viewability of many items. Logging into HathiTrust allows members of the Cornell community to take advantage of benefits for member institutions, such as the ability to download PDFs of full-view items and organize personal collections, as well as any new services offered in the future.

Acknowledgments

The Google Digitization Project has been conducted under the sponsorship of Oya Rieger. Of course, the project would not have been possible without the work and skills of many people; rather than repeat individual names, I will draw attention to the last two paragraphs of Oya’s announcement, which are dedicated to this purpose. Truthfully, everywhere we prepared shipments, CUL staff showed exemplary hospitality as our preparation teams and equipment occupied your library spaces. I am deeply indebted to the open hearts, able assistance, and helpful advice I experienced as this project moved about campus. Many thanks for a fruitful collaboration!
