Updating Our Images: Page Insertion and Correction

A new workflow has made it possible for us to address quality issues in our Google-digitized books. As reported a few years ago, Cornell digitizes books by the thousands with our partners at Google, and the resulting digital books are deposited into the HathiTrust Digital Library. Google has many methods to maintain and even improve the quality of the individual page scans it makes, but occasionally something goes awry. The vast majority of the time, errors are detected and corrected before the book shows up in Google Books and before it is released for ingest into HathiTrust, but some are missed. (The article by Kenneth Goldsmith, “The Art of Google Books Scans,” might serve as a sampler of the things that can go amiss with scanned images: everything from images taken while pages are still moving to the capture of the hands that are, quite literally, in the process of making our Google-digitized books.) Processes to correct our Google-digitized pages have long been cumbersome, requiring extensive decoding and analysis on the part of staff at Cornell, and so were considered not worth the disproportionate resources they required. The effective result is that over the past three years I have been collecting reports from HathiTrust of images in need of correction, and could do little more than apologetically acknowledge them.

However, recent changes at Google have tipped the balance of resources required to engage the image correction process. Google now provides a web form that makes it easier for library partners to request corrections. More importantly, Google now has staff who perform much of the interpretation required to appropriately name the pages for insertion, appreciably lowering the barrier to our participation in the process. We still had to work out on our end how to engage local staff expertise at the Digital Management Group (DMG) for scanning while keeping our internal process as simple as possible. There is also still plenty for me to manage: coordinating across three systems (Voyager, Google, and HathiTrust) to make sure we all correct the right pages from the right book. But as we practice in our initial tentative experiments (we have had five to date, and all of them have been successful), we are learning how to cross-reference our communications with each other to make this easier. It is important to note that the success of the process is due to this large cooperative effort, which includes staff at Google, HathiTrust, and Cornell. (Special thanks go to Danielle Mericle, who is contributing DMG resources to this effort, and Bronwyn Mohlke, who is contributing her scanning expertise.) Together, we have begun to chip away at the backlog of HathiTrust tickets reporting images for correction, improving those pages and closing those tickets one by one.

If you notice pages in HathiTrust that need improvement, please use the feedback link in the footer of the page in the HathiTrust interface. This automatically opens a form that asks for information helpful in resolving the problem. Submitting the form opens a tracking ticket, and often HathiTrust can resolve these issues with Google directly. When necessary, HathiTrust staff will escalate to the appropriate library partner for the correction process.
