
Developments at the HathiTrust Research Center (HTRC)

The HathiTrust Research Center has announced an important development toward making the full HathiTrust corpus available to scholars who use computational methods (“text mining”) in their research. The press release also includes a timeline of future steps that will open the entire HathiTrust corpus, regardless of copyright, to scholars using computational methods.

What is HathiTrust, and what is the HathiTrust Research Center? HathiTrust is a partnership of academic and research institutions that are together building a digital library (currently over 14 million volumes) of materials digitized from libraries around the world. Books that are not subject to copyright limitations can also be read online. Items subject to viewing restrictions are still indexed in full: even though scholars cannot read them online, they can search the text within the covers and retrieve both the titles of books that contain a given term and the page numbers where that term appears. This full-text indexing can be leveraged for broader computational analysis (often called “text mining”) by scholars who find these methods useful. The HathiTrust Research Center (HTRC) is a joint effort of Indiana University and the University of Illinois, which partner with HathiTrust to provide the software, infrastructure and computational scale for this scholarly method, using the HathiTrust Digital Library as the primary source of text to be analyzed.
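Searching restricted volumes for titles and page numbers without displaying their text is the kind of lookup an inverted index supports. The sketch below is a minimal, hypothetical Python illustration of that idea only; it is not HathiTrust’s actual indexing software, and the tokenizing rule, the `build_index` and `search` functions, and the sample corpus are all assumptions made for the example.

```python
import string

# Characters stripped from token edges (assumption: a real indexer tokenizes far more carefully).
PUNCTUATION = string.punctuation + "“”‘’"


def build_index(volumes: dict[str, list[str]]) -> dict[str, dict[str, set[int]]]:
    """Map each term to the volumes containing it and the page numbers where it occurs."""
    index: dict[str, dict[str, set[int]]] = {}
    for title, pages in volumes.items():
        for page_no, text in enumerate(pages, start=1):
            for token in text.lower().split():
                term = token.strip(PUNCTUATION)
                if term:
                    index.setdefault(term, {}).setdefault(title, set()).add(page_no)
    return index


def search(index: dict[str, dict[str, set[int]]], term: str) -> dict[str, list[int]]:
    """Return {title: sorted page numbers} for a term, without exposing any page text."""
    hits = index.get(term.lower(), {})
    return {title: sorted(pages) for title, pages in hits.items()}


# Hypothetical two-volume corpus; the page text stays inside the index.
corpus = {
    "Bastide Towns": ["The town plan.", "A planned town in France"],
    "Persuasive Maps": ["Maps send a message"],
}
idx = build_index(corpus)
print(search(idx, "town"))  # {'Bastide Towns': [1, 2]}
```

A reader of the search results learns where a term occurs but never sees the surrounding text, which is the distinction between searching and reading that the paragraph above describes.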

What exactly is the recent development, and who benefits immediately? The HTRC has made its analytics portal available to anyone who creates an account. In the portal, scholars can create their own collections and run computational algorithms against them. The portal also offers the HTRC Bookworm (open-source software tied to a segment of the corpus), the Data Capsule (a virtual machine environment in which scholars can load their own tools), and several data sets. With all of these tools, scholars have so far been limited to the public-domain segment of the corpus. Beginning this summer, however, projects funded through successful Advanced Collaborative Support (ACS) proposals will pilot access to the full corpus, regardless of copyright status. In real numbers, this means that instead of about 5 million books, ACS scholars can work with all 14+ million volumes currently in HathiTrust.

What is the timeline for future steps? The details are in the announcement, but briefly, the plans for the year ahead are:

  • Immediately: Advanced Collaborative Support (ACS) grant awardees will have access to the full corpus.
  • Fall 2016: A new features data set, derived from the full collection at both volume level and page level, will be released.
  • Early 2017: The full HathiTrust corpus is anticipated to become available for general use through the Data Capsule.

Please feel free to contact me with questions or concerns, or for guidance on getting started with the HTRC.


Recently Launched: John Reps Bastides Collection

Cornell University Library (CUL) and Digital Scholarship & Preservation Services (DSPS) are pleased to announce the launch of the John Reps Bastides Collection. This collection presents six decades’ worth of photos taken by Cornell Professor Emeritus John Reps over a series of visits to the bastides, unusual planned towns in southwestern France dating back to the thirteenth century.


Reps first visited the area in 1951 and returned several times over the next 60 years to photograph the towns. These trips generated thousands of photos, from which Reps hand-picked the images in this collection. In addition to an unprecedented trove of images documenting the region and the towns, the site includes writings and contextual material written and assembled specifically for it.



Each of the more than 2,500 photos has been painstakingly plotted on an interactive map that lets viewers navigate the towns, compare images of the same locations over time, and open Google Street View for the location depicted in each photo.

This collection continues Cornell’s efforts to make Reps’ extraordinary collections available online. That effort began in 2013 with the site Urban Explorer: The John Reps Travel Photographs, which documents planning practices and responses to urban issues in fifteen countries.

The Bastides Collection offers a rich exploration of an interesting chapter in the history of town planning, and CUL is pleased to make it openly available.

Many thanks go to John Reps for sharing this wonderful collection, and to CUL’s project team: Manolo Bevia, Jenn Colt, Eirva Diamessis, Rhea Garen, Hannah Marshall, Danielle Mericle (formerly DSPS), Jim Reidy, Marsha Taichman and Melissa Wallace.



– Hannah Marshall and Melissa Wallace

AV Streaming Group Recommendations


As some may recall, an AV Streaming Policy Group was formed in fall 2014 to coordinate the development of streaming workflows and policies using existing Cornell services, including eCommons, MediaSpace, and Blackboard. I’d like to extend warm thanks to those involved in the group: Danielle Mericle (Chair, formerly DSPS), Peter Hirtle (formerly DSPS, Copyright), Hannah Marshall (LTS, Metadata), David Ruddy (DSPS), Marsha Taichman (Fine Arts), Mike Tolomeo (Academic Technologies, CIT), Melissa Wallace (DSPS), Jesse Koennecke (LTS) and Wendy Wilcox (Access Services). The group accomplished a great deal, which I hope to review here, along with sharing our final recommendations draft:

As an overview: Kaltura is our AV streaming engine. It simply holds AV content and basic metadata, including caption files (if available), and streams video to the desired locations. Think of it as a garden hose: it simply delivers water (or video) to your garden, your cup (or eCommons), and so on. It is a robust streaming mechanism that supports a wide variety of codecs (software for encoding and decoding digital data streams) and compression types, playing the appropriate version according to the strength of the user’s internet connection and the chosen interface.
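Playing a different version of the same video depending on the viewer’s connection is generally called adaptive bitrate streaming. Kaltura’s own selection logic is not described here, so the following Python sketch is only a simplified, hypothetical illustration of the idea: the rendition ladder, the bitrates, the `pick_rendition` function and the 0.8 safety factor are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    label: str
    bitrate_kbps: int


# Hypothetical rendition ladder; actual Kaltura versions depend on account configuration.
LADDER = (
    Rendition("240p", 400),
    Rendition("360p", 800),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
)


def pick_rendition(bandwidth_kbps: float, ladder=LADDER, safety: float = 0.8) -> Rendition:
    """Choose the highest bitrate that fits within a fraction of the measured bandwidth."""
    budget = bandwidth_kbps * safety  # leave headroom for network fluctuation
    fitting = [r for r in ladder if r.bitrate_kbps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_kbps)
    # Nothing fits: fall back to the smallest rendition rather than refusing to play.
    return min(ladder, key=lambda r: r.bitrate_kbps)


print(pick_rendition(3500).label)  # 720p
```

A real player re-measures throughput continuously and switches versions mid-stream; this sketch shows only a single selection step.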

With the updated look and feel of eCommons, we hope to make better use of the platform as a home for content that requires permanent access. For audiovisual content, we recommend eCommons as a common CUL repository that handles a wide variety of standard file types for download and, now, for streaming. Along with a publication-quality file for download, a streaming player (Kaltura) must be embedded for AV content to be viewable within the eCommons interface. Currently, library staff provide this embedding; this may be addressed in development as the group moves forward. Note: for digitized video content, 10-bit master files are not stored in eCommons, as they are too large for practical download.

CUL MediaSpace is a “YouTube”-like interface that provides short- to medium-term access to a range of AV content related to Cornell University Library. It is an out-of-the-box user interface for Kaltura. It is the place for “stand-alone,” ephemeral content and for licensed content that needs narrower access limitations than Cornell-only. One advantage of the MediaSpace environment is that it allows access limitations at a granular level, down to a single IP address. One disadvantage is that content is available only to those with a Cornell netID or guest ID. It is good for restricted-access AV content, content under development, or content with a limited lifespan. If long-term access is desired, use eCommons.
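Restricting access “down to a single IP address” amounts to checking each viewer’s address against an allow-list of networks, where a single address is simply a network of size one (/32 in IPv4). The Python sketch below illustrates that check with the standard `ipaddress` module; it is not MediaSpace’s configuration mechanism, and the allowed networks (documentation ranges from RFC 5737) and the `is_allowed` function are hypothetical.

```python
import ipaddress

# Hypothetical allow-list using documentation address ranges (RFC 5737):
# a whole /24 network plus one single host (/32).
ALLOWED_NETWORKS = (
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
)


def is_allowed(client_ip: str, allowed=ALLOWED_NETWORKS) -> bool:
    """Return True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when IP versions differ (e.g. an IPv6 client vs. an IPv4 rule).
    return any(addr in net for net in allowed)


print(is_allowed("198.51.100.7"))  # True
```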

CUL Digital Collections Portal is our Hydra-based platform for delivering a wide range of content originating from CUL’s collections. It is library-managed and library-published, currently with a focus on digital collections. This environment suits thematically cohesive collections of AV and other content that require or benefit from substantial explanatory material (context, history, etc.) and that would benefit from cross-searching with other thematic collections in the portal. Search facets are customized to better fit CUL’s growing digital collections. Development of the portal is ongoing, and CUL is now an official partner in the Hydra community.

The group also developed new workflows during the course of its work. CUL’s Course Reserve staff can now embed course AV materials into Blackboard at the request of faculty. To do this, staff go into the course in Blackboard and add a link from Kaltura that streams the requested item into the Course Materials section of the Blackboard class. This restricts access to the course rather than opening the material to a larger audience. A big thank you to Mike Tolomeo and Wendy Wilcox for working through this with me.

Finally, navigating the access provisions of audiovisual content can be tedious, due to complex (and in some cases, antiquated) copyright law. The library holds many different classes of AV material, from temporarily licensed materials to collections with donor restrictions. Peter Hirtle, Danielle Mericle and Amy Dygert created a matrix (found on the final recommendations page in Confluence) to help with this.

Any feedback is welcome as we move forward. I plan to periodically review and revise these policy recommendations with a small subset of the original group from DSPS, including Gail Steinhart (Scholarly Communication), Melissa Wallace (Web Design), Amy Dygert (Copyright), Karl Fitzke (Audiovisual Specialist) and Dianne Dietrich (Digital Curation).

Tre Berney

(on behalf of the AV Streaming Group)

What is ORCID and why should we care?


ORCID iDs are unique identifiers for researchers. They provide a simple and standardized way to unambiguously link authors to their publications (and potentially other entities such as organizations), and are increasingly required by publishers and funders.

Let’s start with a couple of definitions:

  • ORCID stands for Open Researcher and Contributor ID. ORCID is an open, non-profit entity that provides a registry of unique identifiers for researchers, and the means to link the products of research to their creators.
  • An ORCID identifier, or ORCID iD, refers to the unique identifier itself, a 16-digit number, and is associated with an individual person.
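Part of what makes an ORCID iD reliable is its final character, a check digit computed from the first 15 digits with the ISO/IEC 7064 MOD 11-2 algorithm (which is why the last character is occasionally an “X” rather than a digit). As an illustration, here is a short Python sketch that checks an iD’s format and check digit; the function names are illustrative, and the check only confirms that an iD is well formed, not that it is registered with ORCID.

```python
import re

# Four hyphen-separated blocks of four characters; only the last may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO/IEC 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-2441-1651"))  # True
```

The iD used in the example is the one in Gail Steinhart’s signature at the end of this article; a single mistyped digit in an iD will usually make the check fail.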

ORCID was born of the need to uniquely and authoritatively identify people, to link people to the works they produce, and potentially to other entities such as the organizations for which they work. People with even marginally common names will appreciate the problem, as will those who have changed their names for any reason, have been inconsistent in how they record their names as authors, have multiple family names, or who have published in multiple (and disparate) disciplines. In each of these situations, it can be difficult to tell which person is truly associated with an article, book, or other work, simply based on the author’s name.

If ORCID iDs are widely adopted, there is significant potential to use them to streamline and simplify functions such as faculty reporting, and to leverage the information available for assessment purposes such as understanding how successful Cornell graduates are in their academic careers once they leave Cornell. It’s this potential that has led us to develop a plan to promote adoption and use of ORCID iDs at Cornell.

But before we get too far along with that plan, it’s critical to understand what’s needed to get the most out of the ORCID infrastructure. Not only must individuals register for an ORCID iD, they should also add information to their ORCID record, authorize Cornell to associate their ORCID iD with their Cornell netID, and use their ORCID iD in workflows whenever it’s possible to do so. Possible Cornell applications for ORCID include using it to streamline faculty reporting processes, uniquely identifying authors in our digital repositories, and integrating it into research information systems.

Individuals can also choose to authorize other services and organizations besides Cornell (such as CrossRef, Scopus, and others) to push or pull information to/from their ORCID profile, which can greatly simplify the process of keeping their profile information current. Researchers “own” and control their ORCID iDs, and can decide whether and with whom they share information, and whether any of their information is displayed publicly.

Even before we started formulating a plan, individuals at Cornell were signing up for ORCID iDs. As of fall 2015, there were nearly 2,000 ORCID iDs with email addresses suggesting a possible Cornell affiliation, yet only two to three dozen of these had authorized Cornell to associate their ORCID iD with their netID. This is very impressive adoption for a campus with no organized outreach effort, but we’ll be unable to leverage those ORCID iDs until those iDs are connected to Cornell. This points to the very real need to ensure that adopters complete all the steps needed in order to maximize the potential benefits of ORCID at Cornell. While some institutions have opted to assign ORCID iDs to everyone, the ORCID organization no longer recommends this, and there are no plans to adopt this approach at Cornell. That means we have some work to do.

Here’s a very high level view of the plan (available in full here):

  • Fine tune the application that allows individuals to obtain an ORCID iD and connect it to Cornell (or authorize that connection for an existing ORCID iD).
  • Work with liaisons to prepare them to work directly with faculty to obtain an ORCID iD, authorize Cornell (and optionally other parties), and add information to their profiles.
  • Promote ORCID to administrators and faculty via a communication campaign and direct, in-person outreach by liaisons to faculty to walk them through the steps of obtaining the ORCID iD, populating their profiles, and getting connected to Cornell.
  • Integrate ORCID iDs with Scholars@Cornell.
  • Investigate and possibly implement support for ORCID iDs in CUL’s institutional repositories.
  • Communicate with other Cornell stakeholders about the potential for integration into campus systems.
  • Plan to sustain support for ORCID at Cornell going forward.

Sandy Payette, Gail Steinhart and Simeon Warner form a steering group to move this plan forward. Oya Rieger and Dean Krafft are the Library Executive Group sponsors. We’ll no doubt get help from many more people along the way, and CUL liaisons will be critical to the success of this effort.

In the meantime, if you find yourself discussing ORCID with faculty or staff and want some suggestions as to how to explain its value and potential, here are a couple of talking points. Registering for and using your ORCID iD will help you:

  • Get credit (for your work)
  • Get ready (for increasingly required use of your ORCID iD in processes such as manuscript submission, grant applications and reporting)
  • Get connected (to Cornell) by visiting

We’ll provide periodic updates to CUL staff as we proceed. Please don’t hesitate to contact any team member, or, with questions or comments.

Best, 0000-0002-2441-1651 (Gail Steinhart)

Digital Image Collections Consolidation in Shared Shelf

For almost a decade, we have been supporting and maintaining the LUNA Insight and ARTstor visual image delivery platforms for various types of visual resources. LUNA was added first, as an asset manager for images, and many of our legacy collections were housed there until very recently. The LUNA architecture was administered in-house. ARTstor was added to the CUL repository landscape somewhat later and, by contrast, is a hosted platform. The strength of ARTstor is its flexibility to meet users’ specific needs and the ease with which we can build new digital collections.

Since 2009, Cornell University Library has partnered with ARTstor in the development of Shared Shelf. Our involvement has shown us that the newer system is more effective in meeting CUL’s increasing needs in building and supporting visual image collections, as well as in supporting their integration into teaching and learning. As the number of collections in ARTstor and Shared Shelf grew rapidly, we reached a tipping point that motivated us to plan the migration of our legacy collections from LUNA to ARTstor, consolidating the collections and streamlining our collective effort by maintaining fewer platforms.

The migration process was exactly what we anticipated: complex and challenging. Most collections were migrated to ARTstor and Shared Shelf. The Herbert F. Johnson Museum of Art collection was migrated to eMuseum, a powerful web publishing toolkit that integrates with The Museum System (TMS), the collection management software used by the Johnson Museum.

In most cases there were myriad details to work through and many stakeholders to coordinate. However, the migration has also been rewarding: it gave us the opportunity to normalize collections for preservation purposes by organizing and archiving master files in a more meaningful way.

Last week, we concluded the migration and decommissioned LUNA. We are grateful to the Visual Resources Working Group for providing input and guidance during this project, and we would especially like to thank Jason Kovari, Danielle Mericle, Liz Muller, Hannah Marshall and Rhea Garen for their key roles in this effort. If you have any questions, please send an email to



ARTstor provides access to our restricted digital image collections. These collections are restricted to Cornell University faculty, staff, and students for educational purposes (instruction, study, research, and scholarship) only. Here are the instructions on how to remotely access collections off the campus.



Most of our digital image collections are available via Shared Shelf Commons, a free, open-access library of images from academic and cultural institutions.


Recently Launched: CUL Digital Collections Portal

The Digital Collections Portal Team, consisting of staff from Digital Scholarship & Preservation Services (DSPS), CUL Information Technology (CULIT) and Library Technical Services (LTS), recently released the beta version of the new Digital Collections portal. The portal provides access to several digital collections, including those being migrated from DLXS to Hydra, as well as selected Cornell collections from Shared Shelf Commons. Features include faceted searching and browsing, a IIIF viewer for image zoom, a map interface for discovery of items with geolocation data, and image downloads.

The following collections are included in this beta release:


Alfredo Montalvo Bolivian Pamphlets Collection
A collection of 715 digitized pamphlets documenting a century of Bolivian literate culture, beginning in 1848. They show a nation’s struggle to establish viable institutions, to develop its economy, to educate its children and the back and forth of political argument.







Beyond the Taj: Architectural Images and Landscape Experience in South India
A collection of materials on South Asian architecture assembled over a 22-year period by Professor Robert D. “Scotty” MacDougall (1940-1987), an architect and an anthropologist. The core of this collection consists of approximately 3,000 photographs depicting significant works of architecture through time and across regional traditions throughout continental India.



Huntington Free Library Native American Collection
One of the largest collections of books and manuscripts of its kind, the Huntington collection contains extensive materials documenting the history, culture, languages, and arts of the native tribes of both North and South America. Contemporary politics and human rights issues are also important components of the collection.



John Reps Collection – Bastides
In 1951, Cornell Professor Emeritus John Reps began to explore and photograph these towns, newly founded in the 13th century. This collection of images, recording what he saw then and on five later visits, documents the appearance of these unusual examples of medieval urban design.




New York State Aerial Photographs
This collection presents a series of historical aerial photographs of the state of New York. It was produced under a Cornell University Library Faculty Grant to Eugenia M. Barnaba, Program Leader, Resource Inventory Cornell Institute for Resource Information Sciences.





Persuasive Maps: PJ Mode Collection
This is a collection of “persuasive” cartography: maps intended primarily to influence the opinion of the viewer — to send a message — rather than to communicate geographic information. The collection reflects a variety of persuasive tools: allegorical, satirical and pictorial mapping; selective inclusion or exclusion; unusual projections, graphics and text; and intentional deception. Maps in the collection address a wide range of messages: religious, political, military, commercial, moral and social.



Ragamala Paintings
Cornell’s Rāgamālā collection consists of some 4000 photographs Klaus Ebeling took between 1967 and 1972 as he visited museums and private collections all over the world working on Ragamala Painting. Fifty years later the slides were gifted to Cornell, thanks to musicologist Joep Bor. The Ebeling collection is among the world’s great assemblages of images in this genre. There have been numerous subsequent studies of regional traditions of rāgamālā painting — Ebeling’s collection includes them all.



This month the team is undertaking an upgrade from Fedora 3 to Fedora 4. While this will result in a brief pause in collection ingest, it will also provide a forward-looking infrastructure for all of our collections in Hydra.

Thanks and appreciation go out to the Digital Collections Portal Team:

  • John Cline (CUL-IT)
  • Jennifer Colt, co-lead (DSPS)
  • Christina Harlow (LTS)
  • George Kozak (CUL-IT)
  • Mary Beth Martini-Lyons (DSPS)
  • Michelle Paolillo (DSPS)
  • Jim Reidy (CUL-IT)
  • Adam Smith, co-lead (CUL-IT)
  • Melissa Wallace (DSPS)

Special thanks also go to Steven Folsom, Hannah Marshall and Danielle Mericle for their past and ongoing support.

We look forward to growing the number of collections accessible through the Digital Collections portal, and invite you to send any feedback or questions to Jenn Colt ( and Adam Smith (

arXiv Annual Update

arXiv started 2015 with an important milestone: the addition of the one-millionth paper at the end of December 2014 (press release & video). Since its inception in 1991 with a focus on the high energy physics community, arXiv has significantly expanded both its subject coverage and its user base. During 2015, the repository received 105,000 new submissions and saw over 139 million downloads. arXiv is international in scope, with submissions and readership from around the world and collaborations with U.S. and foreign professional societies and other international organizations.

arXiv’s funding and governance are based on a membership program that engages the libraries and research laboratories worldwide that represent the repository’s heaviest institutional users. We are pleased to report that we currently have 188 members representing 23 countries. arXiv’s sustainability plan is founded on a business model for generating revenue. Cornell University Library (CUL), the Simons Foundation, and a global collective of institutional members support arXiv financially. The financial model for 2013-2017 entails three sources of revenue:

  • CUL provides a cash subsidy of $75,000 per year in support of arXiv’s operational costs. In addition, CUL makes an in-kind contribution of all indirect costs, which currently represents 37% of total operating expenses.
  • The Simons Foundation contributes $50,000 per year (rising to $100,000 starting in 2016) in recognition of CUL’s stewardship of arXiv. In addition, the Foundation matches $300,000 per year of the funds generated through arXiv membership fees.
  • Each member institution pledges a five-year funding commitment to support arXiv. Based on institutional usage ranking, annual fees are set in four tiers ranging from $1,500 to $3,000.

In 2015, Cornell raised approximately $372,000 through membership fees from 188 institutions, and total revenue (including direct contributions from CUL and the Simons Foundation) was approximately $815,511. We are grateful for the Simons Foundation’s support. The gift has encouraged long-term community support by lowering arXiv membership fees and making participation affordable to a broader range of institutions. This model aims to ensure that the ultimate responsibility for sustaining arXiv remains with the research communities and institutions that benefit from the service most directly.

Since we started the arXiv sustainability initiative in 2010, an integral part of our work has been assessing the services, technologies, standards, and policies that constitute arXiv. Here are some of our key accomplishments from 2015 to illustrate the range of issues we have been trying to tackle. Please see the 2015 Roadmap for a fuller account of our work.

  • Evaluated the arXiv administration processes in light of evolving moderation tools and staffing needs and created and posted a new position (arXiv Operations Manager) to ensure a more productive administrative staffing configuration.
  • Reviewed the current arXiv endorsement procedures and policies across all subject categories, seeking greater uniformity and transparency.
  • Proposed and refined a new appeal process to work toward uniform policies across all subject categories.
  • Continued improving tools and interfaces to allow moderators to interact more directly and efficiently with the arXiv system and administrators based on input from the Scientific Advisory Board and moderators (to be continued in 2016).
  • Initiated a process to update, reorganize, and better document the TeX system, a central component of our article processing; this project will continue in 2016.
  • Added ORCID author identifier support for better interoperability with other repositories implementing authority control and also as a route toward providing institutional statistics for member organizations.
  • Began to review and refine the “stock” messages used by arXiv administrators when communicating with submitters and other arXiv users to improve their usefulness.
  • Developed a set of questions for assessing and accepting new subject domains to arXiv.
  • Piloted an online donation button to experiment with ways to expand arXiv’s revenue sources (generated $16,000 in one week).
  • Investigated interoperability requirements to enable communication/exchange between arXiv and institutional repositories.
  • Maintained a worldwide network of arXiv moderators: over 150 subject experts who verify that submissions are topical and of interest to the scientific community, follow accepted standards of scholarly communication, and are classified in the appropriate subject categories.
  • Held discussions with NSF program managers to better understand how arXiv’s ongoing operations and new initiatives might best fit into NSF programs.
  • Held an annual meeting for the Scientific Advisory Board (SAB) and Member Advisory Board (MAB) to discuss IT development priorities, financial state, moderation tools and policies, and fundraising strategies.

From the users’ perspective, arXiv continues to be a successful, prominent subject repository serving the needs of many scientists around the world. Under the hood, however, the service is facing significant pressures. The conclusion of the recent SAB and MAB annual meetings was that, in addition to the current business model with its focus on maintenance, the arXiv team needs to embark on a significant fundraising effort, pursuing grants and collaborations. We first need to create a compelling and coherent vision so that we can persuasively articulate fundraising goals beyond the current sustainability plan, which aims to support baseline operations. We’d like to use the approaching 25th anniversary of arXiv as an important milestone to engage in a series of vision-setting exercises. The 2016 roadmap includes our goals within the scope of the current business model. In addition, we have developed an initial arXiv review strategy to be refined and implemented during 2016.

Cornell University Library, arXiv Team

Chris Myers (Scientific Director), Oya Y. Rieger (Program Director), David Ruddy (User Support Lead), Simeon Warner (IT Lead)

Contact email:

If you are interested in getting updates from the arXiv team and have not yet signed up for the mailing list, send an email message to: Leave the subject line blank and the body of the message should be a single word: join

Invitation to Apply for Digital Scholarship Fellowships

We are pleased to invite applications for the Digital Scholarship Fellowship position. Hosted by the DSPS unit since 2012, the fellowship program aims to provide opportunities for CUL staff to expand their skills and experience in developing, delivering, and assessing digital scholarship services. It supports the CUL objectives of “empowering staff to explore gaps in their areas of expertise” and “promoting flexible staffing among the units.” The application deadline is February 29, 2016, for fellowship terms starting between March and October 2016.

DSPS Fellowship Ideas

Here are some examples of fellowship projects to consider:

  • Recruit new content to eCommons, which could serve an important function in preserving and providing access to materials produced by Cornell’s many centers and institutes. Work in this area could include developing and documenting best practices and workflows for collecting and managing these materials, identifying candidate centers and institutes, and reaching out to work directly with centers and institutes (with CUL liaisons, when appropriate) to establish and build their collections in eCommons.
  • Investigate implementation strategies for ORCID (Open Researcher and Contributor ID). ORCID identifiers uniquely identify scholars so that they can be unambiguously associated with their works, and they are becoming increasingly important in ensuring the interoperability of digital scholarly systems. Research into how other institutions are approaching the assignment of ORCID iDs and integrating them into their information systems and workflows would be very helpful. Work could then extend to identifying approaches to implementing ORCID iDs at Cornell (assignment, stakeholder identification, systems integration).
  • Use the Trustworthy Repositories Audit & Certification Checklist (TRAC) to evaluate the current status of eCommons policies, workflows and documentation. Help draft additional policies and documents as needed, and recommend improvements to current practices. The intent would not be full certification of the repository, but rather to use relevant parts of an existing tool for a local repository audit for internal purposes.
  • Design and conduct a comprehensive survey of CUL digital assets, characterizing them in terms of origin (born-digital or digitized analog), aggregate size, content type, security class (whether they contain sensitive information; rights and rights-clearance information), and stakeholder requirements for access, discovery, etc. The intent would be to triage these assets toward appropriate preservation solutions based on the significant properties of the materials involved, surface gaps in our fabric of repositories, and support any appropriate recommendations.
  • Sharpen your user experience (UX) assessment skills by contributing to the evaluation of CUL's digital collections and repositories (e.g., eCommons, visual resources, etc.), reviewing practical aspects such as utility, ease of use, and efficiency. How well do these services and systems meet the actual needs of our faculty and students? How do they fit into the daily workflows of research and teaching?
  • Contribute to the management and dissemination of Cornell theses and dissertations. First, with oversight from the Thesis/Dissertation Advisory Group (TDAG) and in collaboration with the Graduate School, develop educational material and outreach strategies to help graduate students understand and make choices about access embargoes, plans for future publications based on their thesis or dissertation, open access options, and copyright management. Second, with oversight from the TDAG, conduct a census of graduate programs at Cornell that are not administered by the Graduate School, and develop strategies and recommendations for collecting theses or other projects produced as a graduation requirement.
  • Assess the needs of CUL selectors, Library administration, and other stakeholders in CUL collection development and management for collections metrics and analytics. Collections data analysis can help to improve the quality of CUL’s collection and the alignment of collecting activity with the needs and strategic directions of the University; it can identify cost savings and inform decisions about the allocation of library resources. This project would entail determining who in the Library needs collections data to answer which questions and identifying potential sources of relevant cost, usage, and demographic data to address high-priority needs. The project would also produce recommendations for the useful analysis of collections data to support routine collection development decisions, periodic reports, internal and interinstitutional collaborations, special projects, etc.
  • Develop and manage the process of reconciling the Internet Archive deposit of assets into HathiTrust. The initial pipeline for moving assets from the Internet Archive to HathiTrust has been set up, and about 57K items have been ingested; however, as is common with large-scale deposits, a proportion of items need remediation before they can be deposited. We aim to analyze the ~20K stragglers for commonalities, classify them to determine the best way to remove impediments to ingest, and facilitate the remediation so that they can successfully join the deposit in HathiTrust. This project is especially suited to those who have analytical and technical skills with MS Access and MS Excel, enjoy managing projects, and enjoy working at a relatively large scale.

These are just some examples to illustrate the nature of fellowship projects. Other ideas related to the DSPS programs and goals are welcome. Information about the DSPS program is available at

Digital Scholarship Fellows, 2012-2015

During the last six years, DSPS has been very fortunate to host seven excellent fellows, all highly motivated, creative, and resourceful. We are grateful for their contributions and hope that they found the experience useful and gratifying. They are available to talk with interested parties about their fellowship experiences.

Below are brief descriptions of their fellowship projects, along with their titles and affiliations during their fellowships:

Jim DelRosso, Hospitality, Labor, and Management Library

Jim's fellowship focused on digital repositories. His primary goal was to work with DSPS and stakeholders around CUL to craft a digital repository policy that addresses questions of software, workflow, collection development, and sustainability, while fulfilling the need for both straightforward access to and robust preservation of the items stored in CUL's digital repositories. As a component of his fellowship, he helped create an agenda for the newly established Repository Executive Group and became its first chair. Jim's DSPS fellowship was for one year at 0.25 FTE.

Dianne Dietrich, Physical Sciences Library, EMPSL 

Dianne joined the team of our NEH-funded project on Preservation and Access for Digital Art Objects as the lead Digital Forensic Analyst. This project was a collection-wide investigation of preservation and emulation strategies for complex born-digital media. Dianne led the project's technical team and helped develop preservation workflows intended to serve as a baseline for CUL digital forensics services in the years to come. As part of her fellowship, she has represented the project at national forums and conferences. Dianne's fellowship was for two years at 0.5 FTE; she then continued as CUL's digital forensics specialist at 0.20 FTE.

Erin Eldermire, Research and Assessment Unit

Erin's goals for the DSPS fellowship were to contribute to the development of the library website, to explore assessment-related issues for CUL's digital collections, and to learn from the members of the DSPS Unit in preparation for her future career as a librarian. In her DSPS Press blog, she shared her thoughts on how the Library can let users work with a simple, Google-style search box while still allowing them to dive into our vast collections. Erin's fellowship was for six months at 10 hours/week.

Steven Folsom, Library Technical Services

Steven's primary goal was to identify and develop strategies for improving discovery of and access to CUL's content archived in HathiTrust, including outreach and community building (e.g., communication with HathiTrust and data aggregators about interoperability opportunities). He also contributed to CUL's efforts to migrate the DLXS-based image databases to other systems such as HathiTrust, Hydra, and SharedShelf. His goals also included taking part in CUL's efforts to assess Omeka, Spotlight, and Drupal as platforms for creating web-based exhibits and rich-media collections. Steven's DSPS fellowship was for one year at 0.25 FTE.

Noah Hamm, Mann Library

During his fellowship, Noah was interested in exploring how GIS and visualization techniques and tools are being used to support humanities research and teaching, in collaboration with library staff interested in digital humanities programs. He was also involved in a campus-wide group surveying AV preservation needs across Cornell, conducting stakeholder interviews and gathering data about the condition and value of digital content. His fellowship term was six months at 12 hours/week.

Hannah Marshall, Library Technical Services

Hannah Marshall has assumed a five-month, 0.25 FTE term to coordinate Digital Consulting and Production Services (DCAPS) during the DSPS reorganization transition. She currently coordinates the DCAPS operation and works closely with DCAPS team members to facilitate communication. She has been instrumental in coordinating the outreach process for the Arts and Sciences Grants Program. She also networks with stakeholders such as library subject specialists to ensure that there is sufficient user input to support development efforts.

Gail Steinhart, Mann Library

Gail was the first DSPS fellow. Over the course of her one-year fellowship (2012-2013, 0.5 FTE), she chaired a newly formed group addressing the management of Cornell's electronic theses and dissertations (ETDs), including facilitating discussions with the Graduate School. These discussions led to a revised set of embargo options that will be implemented when upgrades are made to the online tool graduate students use to submit their ETDs. She reviewed and reported on the results of a pilot project examining the use of Johns Hopkins' Data Conservancy to host data sets associated with papers uploaded to arXiv, led the production of a white paper examining current approaches to digital repositories within CUL, and contributed to other DSPS efforts such as educating librarians on current issues in scholarly communication (with particular emphasis on research data management and sharing). Finally, she led the development of a collaborative grant proposal to the Institute of Museum and Library Services, with the University of Wisconsin-Madison, Columbia University, and Cal Poly, to develop and share a set of best practices for collecting, documenting, and disseminating the research data of faculty nearing retirement.

For More Information About the Program:

  • Interested CUL staff members are encouraged to discuss the fellowship position with their supervisors first.
  • If you have questions regarding HR arrangements and funding, please contact Lyndsi Prignon at <>.
  • Issues related to the program areas, potential projects, and the scope of the fellowship should be addressed to Oya Rieger <>.
  • Oya Rieger and Lyndsi Prignon will be glad to talk with interested staff and their supervisors about logistical details, such as making back-up arrangements and ways to accommodate candidates' existing responsibilities and goals.

Application Information:

  • We will have 2-3 positions open to CUL staff, each with a term of 6-12 months at part-time capacity (0.25 FTE).
  • Although no specific skills are required, candidates should be familiar with recent trends and practices in at least one of the digital scholarship program areas (e.g., repositories, publishing, research data, digital collections, digital preservation, preservation policies, etc.).
  • To apply, send a copy of your CV to with a cover letter describing the program areas that interest you and your expectations for the fellowship.
  • The applications will be reviewed by a small committee with input from the candidate’s supervisor.
  • The application deadline is February 29, 2016, for fellowship terms starting between March and October 2016.

Oya Y. Rieger, January 2016

Announcing the Preserving and Emulating Digital Art Objects White Paper

We are pleased to announce that the Preserving and Emulating Digital Art Objects White Paper, describing the project's findings, discoveries, and challenges, is now available. The ultimate goal of the project team has been to create a preservation and access practice grounded in a thorough and practical understanding of the characteristics of digital objects and their access requirements, seen from the perspectives of collection curators and users alike. Equally important has been the establishment of service frameworks and policies that are sustainable, realistic, and cost-efficient. Throughout the project, one of our guiding principles has been to move the experience gained through research into practice. Although the initiative focused on new media art, we hope that our methodologies and findings will inform other types of complex born-digital collections as well.

Throughout our project, a recurring theme in our findings involved the difficulty of capturing sufficient information about a digital art object to enable an authentic user experience. We have concluded that the key to digital media preservation is variability, not fixity. The trick is finding ways to capture the experience, or a modest proxy of it, so that future generations will get a glimpse of how early digital artworks were created, experienced, and interpreted. Much of the cultural meaning of new media works derives from users' spontaneous and contextual interactions with the art objects. Providing appropriate cultural and historical contexts for understanding and interpreting new media art is part of each institution's individual mission, but it is also a matter of collective importance, given the rarity of such collections, the numerous challenges of establishing preservation protocols, and the overall scarcity of resources.

During the last few years, we have witnessed several trends and advancements, for instance, the increasing prominence of video and web art. The organizational, technological, and financial challenges associated with preserving and providing access to web-based resources are quite astounding. So as we develop strategies for CD-ROM-based content, we are mindful that more significant challenges lie ahead. As described in the white paper, although emulation was not included in the original project plan, it emerged as a viable strategy. Institutional experiences and perspectives on emulation will not scale unless there are communities of emulation involving archivists and curators. We need to explore how cultural institutions can interface with groups involved in emulation, an area currently driven by games communities and hobbyists.

We would like to emphasize that, as artists gain increasing access to ubiquitous tools and methodologies for creating complex art exhibits and objects, we should expect to see an increasing flow of such creative works to archives, museums, and libraries. It is nearly impossible to preserve these works through generations of technology and context changes. Therefore, diligent curation practices will be more essential than ever in order to identify unique or exemplary works, project future use scenarios, assess obsolescence and loss risks, and implement cost-efficient strategies. We would also like to emphasize that access is the keystone of preservation. The preservation of digital art objects needs to be conceptualized, motivated, informed, and energized by present and future use.

PAFDAO Project Team

Note: An interview with the PAFDAO team about their efforts is available on The Signal, the Library of Congress’s digital preservation blog. The post is titled Authenticity Amidst Change: The Preservation and Access Framework for Digital Art Objects.

Persuasive Cartography: The PJ Mode Collection

Launched in spring 2015, Persuasive Cartography: The PJ Mode Collection is a collection of "persuasive maps": maps intended to influence beliefs or communicate opinions, rather than convey solely geographic information. The 310 maps were donated to the Division of Rare and Manuscript Collections by alum PJ Mode ’60, and were digitized by Digital Consulting and Production Services (DCAPS) and made available online through the Persuasive Cartography website.

This impressive collection has already received a variety of press coverage, including articles in Ezra Update and National Geographic. In the coming months, DCAPS plans to add items to the collection and to make it available within a new Digital Collections website (December 2015), which will allow for cross-collection searching and greater discovery. Please check out the collection and let us know what you think.

Ohara, Europe and Asia Octopus Map, 1904

Keppler, the American Pope, 1894

Rose, Angling in Troubled Waters, 1899