Cornell University

Adding HathiTrust Emergency Access Links

As part of our response to the library’s closure during the COVID-19 crisis, we added links to in-copyright materials in HathiTrust for which we own the print copy. HathiTrust put together a circulation model that allows only as many simultaneous readers of a book as the number of print copies their institution owns and currently cannot access.

To make this happen, we sent HathiTrust a full report of our print holdings, and they generated an overlap report from it. We could have done this matching ourselves using OCLC identifiers and the hathifiles database, but it is important to use HathiTrust’s overlap report: that report determines which digital materials are unlocked on hathitrust.org, so working from the same list ensures that we actually have access to everything we link to.

The report itself is a tab-delimited file:

oclc local_id item_type access rights
17304701 10 mono deny ic
1872403 100 multi allow pdus
46181 1000 mono allow pd
35193779 10000 mono
7257650 100000 mono deny ic

I loaded this file into a MySQL table in the same database where we had already loaded a copy of the hathifiles database:

+----------+---------+-------+--------+--------+
| oclc_id  | bib_id  | type  | access | rights |
+----------+---------+-------+--------+--------+
| 17304701 | 10      | mono  | deny   | ic     |
| 1872403  | 100     | multi | allow  | pdus   |
| 46181    | 1000    | mono  | allow  | pd     |
| 7257650  | 100000  | mono  | deny   | ic     |
| 1688787  | 1000001 | mono  | allow  | pd     |

Note that I didn’t load the records where the access and rights columns were unpopulated, as these are not part of the overlap. I could have skipped the “allow” records as well: these are either public domain or public domain in the United States, so we would have access to them regardless of the emergency. The records that matter here are the “deny” records.
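The filtering step described above could look something like the following. This is only a minimal sketch in Python, not the loader that was actually used: the function name overlap_rows is made up for illustration, and it assumes the report has the header row shown in the sample.

```python
import csv
import io


def overlap_rows(report_file):
    """Yield (oclc, local_id, item_type, access, rights) tuples.

    Rows where both access and rights are blank are skipped, since
    those records are not part of the overlap.
    """
    reader = csv.DictReader(report_file, delimiter="\t")
    for row in reader:
        # Short rows leave missing columns as None, so normalize to "".
        access = (row.get("access") or "").strip()
        rights = (row.get("rights") or "").strip()
        if not access and not rights:
            continue
        yield (row["oclc"], row["local_id"], row["item_type"], access, rights)


# A few rows copied from the sample report, tab-delimited.
sample = (
    "oclc\tlocal_id\titem_type\taccess\trights\n"
    "17304701\t10\tmono\tdeny\tic\n"
    "1872403\t100\tmulti\tallow\tpdus\n"
    "35193779\t10000\tmono\t\t\n"
)
rows = list(overlap_rows(io.StringIO(sample)))
```

The resulting tuples line up with the oclc_id, bib_id, type, access and rights columns of the table above, so they could be passed to a parameterized INSERT.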

At this point, we needed the hathifiles database. Access links on the HathiTrust site use volume identifiers for volume-level links and HathiTrust record identifiers for title-level links, so we need to use the OCLC identifiers to look those up. Here’s an example record from that database:

Volume_Identifier: uc1.b4590365
Access: deny
Rights: ic
UofM_Record_Number: 000605831
Enum_Chrono:
Source: UC
Source_Inst_Record_Number: GLAD50493685-B
OCLC_Numbers: 17304701
ISBNs:
ISSNs:
LCCNs: 53025148
Title: Gerona; la Catedral y el Museo Diocesano;

There are a lot more fields, but the only fields of interest here are Volume_Identifier, UofM_Record_Number (sometimes called HathiTrust Record Number or Record ID), and OCLC_Numbers. Because OCLC_Numbers can hold more than one value, we copied this data out into a volume_to_oclc table containing only Volume_Identifier and OCLC_Number, with an index on OCLC_Number for quick volume lookup by OCLC identifier.
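Splitting OCLC_Numbers into one row per identifier might be sketched as follows. This assumes that multiple OCLC numbers in a hathifiles record are separated by commas, which should be checked against the hathifiles documentation; the function name and the second sample volume identifier are hypothetical.

```python
def volume_oclc_pairs(records):
    """Yield one (Volume_Identifier, OCLC_Number) pair per OCLC number.

    `records` is an iterable of (volume_id, oclc_numbers) tuples, where
    oclc_numbers is the raw OCLC_Numbers field. The comma separator is
    an assumption about the hathifiles format.
    """
    for volume_id, oclc_numbers in records:
        for number in oclc_numbers.split(","):
            number = number.strip()
            if number:  # skip empty fields and stray separators
                yield (volume_id, number)


records = [
    ("uc1.b4590365", "17304701"),        # from the example record above
    ("example.vol2", "1872403,46181"),   # hypothetical multi-valued record
    ("example.vol3", ""),                # hypothetical record with no OCLC number
]
pairs = list(volume_oclc_pairs(records))
```

Each pair becomes one row of volume_to_oclc, so a volume with two OCLC numbers can be found by either one.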

Once all this is loaded, a simple query can pull up eligible Emergency Access volumes for any given local record identifier.

SELECT volume_to_oclc.Volume_Identifier, hathifiles.UofM_Record_Number
  FROM overlap
  JOIN volume_to_oclc
    ON overlap.oclc_id = volume_to_oclc.OCLC_Number
  JOIN hathifiles
    ON volume_to_oclc.Volume_Identifier = hathifiles.Volume_Identifier
 WHERE overlap.bib_id = ?
   AND overlap.access = 'deny';
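To try the query without a MySQL server, here is a rough sketch that runs it against an in-memory SQLite database standing in for MySQL. The table and column names follow the post, but the column types, the index name and the function name emergency_volumes are assumptions, and only the three hathifiles columns used here are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE overlap (oclc_id TEXT, bib_id TEXT, type TEXT, access TEXT, rights TEXT);
    CREATE TABLE volume_to_oclc (Volume_Identifier TEXT, OCLC_Number TEXT);
    CREATE INDEX idx_volume_to_oclc_number ON volume_to_oclc (OCLC_Number);
    CREATE TABLE hathifiles (Volume_Identifier TEXT, UofM_Record_Number TEXT, OCLC_Numbers TEXT);

    -- Rows taken from the samples in this post.
    INSERT INTO overlap VALUES ('17304701', '10', 'mono', 'deny', 'ic');
    INSERT INTO overlap VALUES ('46181', '1000', 'mono', 'allow', 'pd');
    INSERT INTO volume_to_oclc VALUES ('uc1.b4590365', '17304701');
    INSERT INTO hathifiles VALUES ('uc1.b4590365', '000605831', '17304701');
""")

QUERY = """
SELECT volume_to_oclc.Volume_Identifier, hathifiles.UofM_Record_Number
  FROM overlap, volume_to_oclc, hathifiles
 WHERE overlap.bib_id = ?
   AND overlap.access = 'deny'
   AND overlap.oclc_id = volume_to_oclc.OCLC_Number
   AND volume_to_oclc.Volume_Identifier = hathifiles.Volume_Identifier
"""


def emergency_volumes(connection, bib_id):
    """Return (Volume_Identifier, UofM_Record_Number) rows for one local record."""
    return connection.execute(QUERY, (bib_id,)).fetchall()


deny_hits = emergency_volumes(conn, "10")     # in-copyright record in the overlap
allow_hits = emergency_volumes(conn, "1000")  # public domain record, not selected
```

Only the “deny” record produces a row; the “allow” record returns nothing because the query restricts itself to materials that need emergency access.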

After grouping the results by record number, we chose to provide volume-level links for any record number with only one volume, and record-level links for any with multiple volumes. Unlike our earlier links to HathiTrust materials in the public domain, these links needed to prompt patrons to log in before they reached their destination, so that nobody would land on a restricted page without realizing they had to log in first.
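The grouping rule above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the code we ran: the function name choose_links, the ("volume" / "record") labels and the example.* identifiers are all hypothetical.

```python
from collections import defaultdict


def choose_links(rows):
    """Decide on a link type for each HathiTrust record number.

    `rows` is a list of (volume_id, record_id) tuples as returned by the
    query. A record with a single volume gets a volume-level link; a
    record with several volumes gets one record-level link.
    """
    volumes_by_record = defaultdict(list)
    for volume_id, record_id in rows:
        # Ignore repeats, e.g. a volume matched through two OCLC numbers.
        if volume_id not in volumes_by_record[record_id]:
            volumes_by_record[record_id].append(volume_id)

    links = []
    for record_id, volume_ids in volumes_by_record.items():
        if len(volume_ids) == 1:
            links.append(("volume", volume_ids[0]))
        else:
            links.append(("record", record_id))
    return links


rows = [
    ("uc1.b4590365", "000605831"),   # single-volume record from the example
    ("example.v1", "000000001"),     # hypothetical multi-volume record
    ("example.v2", "000000001"),
]
links = choose_links(rows)
```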

HathiTrust provided some documentation on how to generate volume-level and title-level access links that also prompt for login:

http://hdl.handle.net/2027/$VOLUME_ID?urlappend=%3Bsignon=swle:$ENTITY_ID
https://catalog.hathitrust.org/Record/$RECORD_ID?signon=swle:$ENTITY_ID

Note that both of these links require a Shibboleth entity ID, which identifies the institution’s Shibboleth identity provider (IdP). Cornell’s is “https://shibidp.cit.cornell.edu/idp/shibboleth”, and it tells the Shibboleth instance running on HathiTrust to load the Cornell login screen. Because both the $VOLUME_ID in the volume link and the $RECORD_ID in the title link come from the hathifiles and not from the overlap report, the hathifiles table is a must regardless of which link type is used.
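Filling in the two link patterns is plain string substitution. The sketch below uses Python’s string.Template, which happens to share the $NAME placeholder syntax of the patterns above; the function names are made up for illustration. It inserts the entity ID exactly as given, since the patterns do not say whether it should be URL-encoded.

```python
from string import Template

# The two patterns, copied as-is from the HathiTrust documentation quoted above.
VOLUME_LINK = Template(
    "http://hdl.handle.net/2027/$VOLUME_ID?urlappend=%3Bsignon=swle:$ENTITY_ID"
)
RECORD_LINK = Template(
    "https://catalog.hathitrust.org/Record/$RECORD_ID?signon=swle:$ENTITY_ID"
)

CORNELL_ENTITY_ID = "https://shibidp.cit.cornell.edu/idp/shibboleth"


def volume_link(volume_id, entity_id=CORNELL_ENTITY_ID):
    """Build a volume-level link that prompts for institutional login."""
    return VOLUME_LINK.substitute(VOLUME_ID=volume_id, ENTITY_ID=entity_id)


def record_link(record_id, entity_id=CORNELL_ENTITY_ID):
    """Build a title-level link that prompts for institutional login."""
    return RECORD_LINK.substitute(RECORD_ID=record_id, ENTITY_ID=entity_id)


example_volume_url = volume_link("uc1.b4590365")
example_record_url = record_link("000605831")
```

Another institution would pass its own entity ID as the second argument.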

Finally, we attached some access caveats and a link to user documentation for patrons who might like to know more about the temporary access program and its limitations.

Example record: https://catalog.library.cornell.edu/catalog/6478772