
Digital Curation Services at Code4Lib NYS 2016

The Code4Lib NYS 2016 Unconference was held at Cornell’s Mann Library, Thursday & Friday, August 4-5, 2016. The Digital Curation Services team (Mira Basara, Dianne Dietrich, and Michelle Paolillo) attended both days, and found many opportunities to sharpen our skills in the service of digital curation, as well as opportunities to network with colleagues beyond Cornell. The three of us had different interests in attending the unconference, as noted below. Our varied perspectives reflect the nature of digital preservation itself: how it is integrated with the many other activities of the digital life cycle, and the broad range of skills that come into play in the service of long-term assurance for our digital assets.

Mira: I was very impressed with the number and quality of workshops and sessions offered at Code4Lib. I attended two mornings of a workshop called “Command Line Interface Basics” led by Francis Kayiwa (Virginia Tech). The workshop covered the user and programming interface of the UNIX operating system. Even though I have been using the UNIX shell for years, my knowledge was spotty, and this workshop really filled in gaps, such as different ways to edit or filter files, and communication and file archiving. Another interesting session was Introduction to Hydra, which provided great insight into the ActiveFedora-based model. Having a basic understanding of the model will give me better insight into the overall system, and will help me if my work leads me to use or administer Hydra-based systems.

Dianne:  When I saw a hands-on Fedora 4 workshop advertised on the Code4Lib schedule, I knew I had to be there. I’m the sort of person who learns best by diving right into using a particular system.  The Fedora repository software has always seemed a bit mysterious to me; I’ve pulled content from a Fedora 3 repository through my past work with electronic theses and dissertations, but beyond that, I wasn’t really sure of its inner workings.  Our instructors, Esmé Cowles (Princeton University) and Andrew Woods (Duraspace), were great — and in addition to providing a high-level overview of Fedora, they provided an introduction to the world of resource modeling, access control, and Apache Camel integration. I was really impressed by how accessible they made the content, and how often they checked in with us to make sure that nobody was lost or too far behind. While I might not be working with Fedora repositories directly, these workshops provided some invaluable context that I can use to understand our own repository infrastructure.

Michelle: I was happy to spend much of the conference in the Write-the-Docs inspired workshop sharpening my documentation skills. Christina Harlow (Cornell) and Gillian Byrne (Ryerson) led us in a well-designed experience through both morning sessions. Together we explored possible structures of documentation, and defined the elements that must be included in complete documentation. Then we shifted to an open critique of examples. We shared documentation we had all run across, explaining what we liked or did not like, and possible ways to improve it. After this we had the opportunity to work on improving our own documentation. I worked on making some templates in the CULAR wiki, so that when we add new collections, the template already has prompts for the information that should be captured. We are encouraged to share our documentation and templates back to the Write-the-Docs community. Overall, I feel this session has helped me streamline the process of documentation and clarify my focus as I write.


…some brainstorming from the Write-the-Docs workshop…

The keynotes themselves were powerful reminders that our efforts exist in a context of humanity and ethics. Patricia Hswe’s address served as a reminder that sound relationships among project and service teams are just as important to the project or service success as the technology components (and fundamentally so). She reminded us that the emotional labor of listening to and acknowledging the needs of stakeholders is important for building successful systems; the degree to which we can become comfortable with uncertainty is the same degree to which we can influence the outcomes of our projects for the better. Tara Robertson challenged us to consider a broader range of ethical questions when making content available. We are all familiar with the notion that the ease of making content available digitally may deceive us as to the legality of doing so. But even when legal right is assured, harm can come to communities if we insist on our right to provide unfettered access, especially to minority communities that may be missing from the conversation that informs any such decision. Through several examples, she challenged us to transform our profession: to ask not only whether we have the legal right to provide access, but also to engage with the ethical question of whom we might harm by exercising that right. These two keynotes were reminders of the reach of our impact as we preserve content and make it available; our mindfulness in this realm can help steer us towards better waters as we navigate the digital age.

We are indebted to Christina Harlow for a well-organized, engaging conference that was so conveniently located.  She will say that she had a lot of volunteer help; doubtless this is true, but it is also true that the success of this conference is largely due to her initiative and follow-through.  This conference provided us with a conveniently located context to sharpen our skills and add tools to our “digital curation toolbox”.  Many thanks!

Findings of the 2CUL Study on Developing e-Journal Preservation Strategies

Faculty and students increasingly depend on commercially produced, born-digital content that is purchased or licensed. According to a recent Ithaka S+R study on information usage practices and perceptions, almost half of the respondents strongly agreed that they would be happy to see hard copy collections of journals discarded and replaced entirely by electronic collections (see Figure below). This strong usage trend raises questions about the future security of e-journals, and about whether and how they are archived to ensure enduring access for future users. Evidence indicates that the extent of e-journal preservation has not kept pace with the growth of electronic publication. Studies comparing the e-journal holdings of major research libraries with the titles currently preserved by the key preservation agencies have consistently found that only 25-30%, at most, of the titles with ISSNs currently collected have been preserved.


Percent of respondents who strongly agreed with the statement: “Assuming that electronic collections of journals are proven to work well, I would be happy to see hard copy collections discarded and replaced entirely by electronic collections.” Source: Christine Wolff, Alisa B. Rod, Roger C. Schonfeld. Ithaka S+R US Faculty Survey 2015. April 4, 2016.

With funding from the Mellon Foundation, Columbia and Cornell Universities (2CUL) conducted a 2-year project during 2014-2015 to evaluate strategies for expanding e-journal preservation. The project team included Shannon Regan, Joyce McDonough, and Bob Wolven (co-PI) from Columbia University Libraries and Oya Y. Rieger (co-PI) from Cornell University Library. It was a follow-up study to expand on the results of a Phase 1 2CUL project that looked into a number of pragmatic issues involved in deploying LOCKSS and Portico at Cornell and Columbia. The key research questions of the Mellon-funded study included: What is not being preserved? Why is it not being preserved? How do we get it preserved? The purpose of this blog post is to share some of the key recommendations and highlight challenges faced during the study. The report that describes the methodology and findings of the study is available on the project wiki.

Recommendations for Further Action

Major Publishers: As libraries and licensing agencies negotiate new licenses or renew existing licenses, publishers should be asked to specify any licensed content excluded from the license’s provisions for archiving.  We need to engage CRL to explore how the recently revised model license can be further enhanced by broadening the archival information section.

Ensuring Continuity: The preservation status of e-journal titles may change as titles move from one publisher to another. The Enhanced Transfer Alerting Service maintained by the UKSG provides information that could be effectively used to monitor such changes.

Open Access E-Journals:  Freely accessible e-journals comprise the largest, most diverse, and in all likelihood most problematic category for preservation. Columbia and Cornell will work with members of the Ivy Plus group of libraries to assess the feasibility and cost of implementing a Private LOCKSS Network to preserve the pilot collection developed in Archive-It.

Technical Development:  As digital formats become more complex and new research methods emerge (e.g., text mining), just-in-case dark archiving solutions will be harder to justify from cost-effectiveness and return-on-investment perspectives.  It will be beneficial for the stakeholders to reconsider the current assumptions that underlie significant initiatives such as CLOCKSS, LOCKSS and Portico.

Information Exchange:  At present, up-to-date information about preservation status is not included in the systems and knowledge-bases libraries use to manage e-journal content (although the Keepers Registry has significantly enhanced the ability to query the preservation status of individual titles).  This inhibits libraries’ ability to consider preservation as a factor in collection development and collection management.

Setting Priorities: One barrier to effective action has been the sheer number of e-journal titles that are not preserved.  More discussion among libraries is needed to build consensus around priorities for action on titles provided through aggregators and on freely-accessible e-journals.

University/Library Publishers: University libraries engaged in publishing should develop a consistent approach to preservation, including open declaration of their archiving policies and practice.  This work should help to inform, and be informed by, CRL’s exploration of a “TRAC light” certification.


The project team’s most significant impediment was simply the time required to explain the purpose of the project, including libraries’ expectations and needs regarding preservation of e-journals, to many parties with diverse backgrounds and perspectives. Publishers, editors, and aggregators each had different degrees of awareness of issues, but also different understanding of the meaning of terms such as “preservation” and “archiving.” Adding to this challenge was the fact that preservation is not the highest priority for most of the parties we worked with.

Perhaps the most surprising challenge was the degree of questioning we encountered within the library community itself regarding the importance of taking action to preserve e-journals.  This was expressed as a combination of (in our view, misplaced) confidence that publishers and aggregators can be relied on to archive their own content, plus doubts about the technical and economic reliability of existing third-party preservation agencies. The reluctance from librarians to aggressively pursue e-journal preservation may be influenced by confusion as to where the responsibility for preservation lies: with publishers, third party agencies, or libraries.

Individual libraries, despite their concern for preservation, often lack effective means for taking action. Selection and acquisition processes may not involve any direct interaction with the publisher; many titles are acquired as parts of large packages, with no comprehensive provision for preservation. While stewardship of print journals was recognized as a core function of libraries, today commercial publishers provide access to digital content and manage content. Preservation, formerly a distributed activity for printed material controlled at the local level, has come to rely on centralized infrastructures and action in the case of digital material, without clearly defined roles for those staff charged with responsibility for preserving library collections. Some libraries have sought to include provisions for archiving in their e-journal licenses, either through direct deposit of content with the library or, more often, through third-party agencies.

E-journal archiving responsibility is distributed and elusive. Therefore, libraries, archiving organizations, publishers, and societies need to collaborate in developing and promoting best practices such as model license agreements and practical steps leading to the deposit of e-journal content with recognized preservation agencies.

Oya Y. Rieger, July 2016

Assessing and Promoting Digital Collections as part of a DSPS Fellowship

At Cornell University Library, we are rich in digital collections. We have good workflows for identifying still and moving images that need to be digitized. We ensure that the groups of materials are discrete and that they are accompanied by sufficient metadata, then we digitize the materials, catalog them, and frequently build websites to promote the resulting images.

In Digital Consulting & Production Services (DCAPS), we know how to effectively create image collections within the library. What we understand less well is how the digital collections in our Digital Collections Portal are being used. Answering that question calls for quantitative data (such as statistics on site use from Google Analytics and Piwik in combination with pop-up questionnaires) and qualitative data (such as focus groups and personal interviews), in combination with usability testing. Assessing how our digital collections are being used is one aspect of my DSPS Fellowship. From talking with members of the web team, it seems that assessment currently happens on an ad hoc basis, and it would be good to have a systematic method for gauging the use of our collections on a regular basis.


The Digital Collections Portal homepage


Another component of this fellowship is examining how we promote our digital collections. We have incredible assets in the library that are of no use if patrons are unaware of them. It will be beneficial to have a general digital project promotion workflow. For instance, an excellent initiative has been entering information about collections into Wikipedia. If we can make a point of doing this for every new collection, we will have a wider user base for our images. We also highlight images from our digital collections on Instagram. As the second part of my fellowship, I intend to create a checklist for spreading the word about new collections, both when they launch and on an ongoing basis.

DCAPS on Instagram



Assessment and promotion are important phases of the digital curation lifecycle. They are not specifically part of anyone’s role in DSPS, yet they are crucial to the services we provide. Given that DSPS recently created a digital curation unit, it is apparent that better serving the entire project lifecycle is a priority, and this fellowship will aid in improving these workflows.


From left to right: Marsha Taichman, DSPS Fellow, Jenn Colt and Melissa Wallace, DSPS Web Design and Development Team Members.

Developments at the HathiTrust Research Center (HTRC)

The HathiTrust Research Center has announced an important development towards the availability of the full HathiTrust corpus to scholars who use computational methods (“text-mining”) in their research. The press release also includes a timeline for future steps that will make the entire corpus of HathiTrust, regardless of copyright, open for scholars using computational methods.

What is HathiTrust and the HathiTrust Research Center? HathiTrust is a partnership of academic and research institutions that continuously build a digital library together (currently over 14 million volumes), digitized from libraries around the world. If a book is not subject to limitations of copyright, it can be read online as well. Items that are subject to viewing restrictions are indexed in full, so that even though scholars cannot read them online, they can search the text within the covers and retrieve both the titles of books that contain a given term and the page numbers where that term is found within each text. This full-text indexing can be leveraged for broader computational analysis (often called “text mining”) by scholars who find these methods useful. The HathiTrust Research Center (HTRC) is a joint effort of Indiana University and the University of Illinois, who partner with HathiTrust to provide the software, infrastructure and computational scale for this scholarly method, using the HathiTrust Digital Library as the primary source of text to be analyzed.
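To make the idea of page-level full-text indexing concrete, here is a minimal sketch in Python. It is only an illustration of the general technique (an inverted index from terms to titles and page numbers), not a description of how HathiTrust actually builds or stores its index. The function names and the tiny sample "volumes" are hypothetical.

```python
import re
from collections import defaultdict


def build_page_index(volumes):
    """Build an inverted index: term -> {title: [page numbers]}.

    `volumes` maps a title to a list of page texts (page 1 first).
    This structure is an assumption made for the example.
    """
    index = defaultdict(lambda: defaultdict(list))
    for title, pages in volumes.items():
        for page_number, text in enumerate(pages, start=1):
            # Record each distinct lowercase word once per page.
            for term in set(re.findall(r"[a-z]+", text.lower())):
                index[term][title].append(page_number)
    return index


def search(index, term):
    """Return {title: [page numbers]} for one term, without exposing page text."""
    hits = index.get(term.lower(), {})
    return {title: list(pages) for title, pages in hits.items()}


# Hypothetical sample data, for illustration only.
SAMPLE_VOLUMES = {
    "Book A": ["a history of planned towns", "towns and their markets"],
    "Book B": ["notes on river trade"],
}
SAMPLE_INDEX = build_page_index(SAMPLE_VOLUMES)
print(search(SAMPLE_INDEX, "towns"))  # {'Book A': [1, 2]}
```

The point of the sketch is that a search can report where a term occurs (title and page) while the restricted page text itself is never returned to the reader.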

What exactly is the recent development, and who sees immediate benefit?  The HTRC has provided the analytics portal to anyone who makes an account.  In the portal, scholars can create a collection of their own, and run computational algorithms against that collection.  The portal also offers the HTRC bookworm (open source software tied to a segment of the corpus), the data capsule (a virtual machine environment suitable for a scholar to load their own tools) and several data sets.  With all of these tools, scholars have been limited to the segment of the corpus in the public domain.  But beginning this summer, successfully funded Advanced Collaborative Support (ACS) proposals will pilot access to the full corpus (regardless of copyright status) in their projects.  In real numbers, that means that instead of about 5 million books, the ACS scholars can work with the full 14+ million currently found in HathiTrust.

What is the timeline for future steps? The details are in the announcement, but briefly, the plans for the year ahead are:

  • Immediately: Advanced Collaborative Support (ACS) awardees will have access to the full corpus.
  • Fall 2016: A new features data set, derived from the full collection at both volume level and page level, will be released.
  • Early 2017: Availability of full HathiTrust corpus through data capsule anticipated for general use.

Please consider me available for questions, concerns and guidance on getting started with the HTRC.


Recently Launched: John Reps Bastides Collection

Cornell University Library (CUL) and Digital Scholarship & Preservation Services (DSPS) are pleased to announce the launch of the John Reps Bastides Collection. This collection presents six decades’ worth of photos taken by Cornell Professor Emeritus John Reps over a series of visits to the bastides, unusual planned towns in southwestern France that date back to the thirteenth century.


Reps first visited the area in 1951 and returned several times over the next 60 years to photo-document the towns. These trips generated thousands of photos, from which the images in this collection were hand-picked by Reps. In addition to an unprecedented trove of images documenting the region and the towns, the site also includes writings and contextual material written and assembled specially for the site.



Each of the more than 2,500 photos has been painstakingly plotted on an interactive map by which the viewer can navigate the towns, compare images of the same locations over time and access a Google Street View for the location depicted in each photo.

This collection continues Cornell’s online efforts to make available Reps’ extraordinary collections. This effort began in 2013 with the site, Urban Explorer: The John Reps Travel Photographs, which documents planning practices and responses to urban issues from fifteen countries.

The Bastides Collection offers a rich exploration of an interesting chapter in the history of town planning, and CUL is pleased to make it openly available.

Many thanks go to John Reps for sharing this wonderful collection, and to CUL’s project team: Manolo Bevia, Jenn Colt, Eirva Diamessis, Rhea Garen, Hannah Marshall, Danielle Mericle (formerly DSPS), Jim Reidy, Marsha Taichman and Melissa Wallace.



– Hannah Marshall and Melissa Wallace

AV Streaming Group Recommendations


As some may recall, an AV Streaming Policy Group was formed in the fall of 2014 to coordinate the development of streaming workflows and policies utilizing existing Cornell services, including eCommons, MediaSpace, and Blackboard. I’d like to extend warm thanks to those involved in the group: Danielle Mericle (Chair, formerly DSPS), Peter Hirtle (formerly DSPS, Copyright), Hannah Marshall (LTS, Metadata), David Ruddy (DSPS), Marsha Taichman (Fine Arts), Mike Tolomeo (Academic Technologies, CIT), Melissa Wallace (DSPS), Jesse Koennecke (LTS) and Wendy Wilcox (Access Services). The group accomplished a lot, which I hope to review, along with sharing our final recommendations draft here:

As a bit of an overview, Kaltura is our AV streaming engine. It simply holds AV content and basic metadata, including caption files (if available), and then streams video to desired locations. Think of it as a water hose: it simply delivers water (or video) into your garden, your cup (or eCommons), and so on. It is a robust streaming mechanism that produces versions of each file in a wide variety of codecs (software for encoding and decoding digital data streams) and compression types, playing the appropriate version according to the strength of the user’s internet connection and chosen interface.
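For readers curious about what "playing the appropriate version" can mean in practice, here is a minimal Python sketch of choosing among several encoded versions of a video based on measured bandwidth. It is a generic illustration, not Kaltura's actual selection logic or API. The `Rendition` class, the bitrate ladder, and the safety factor are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One encoded version of a video (hypothetical structure)."""
    label: str
    bitrate_kbps: int


# Hypothetical set of versions; real labels and bitrates will differ.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1000),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]


def pick_rendition(renditions, measured_kbps, safety_factor=0.8):
    """Pick the highest-bitrate version that fits within the bandwidth budget.

    Only a fraction of the measured bandwidth (safety_factor) is budgeted,
    to leave headroom. If nothing fits, fall back to the lowest bitrate.
    """
    if not renditions:
        raise ValueError("at least one rendition is required")
    budget = measured_kbps * safety_factor
    ordered = sorted(renditions, key=lambda r: r.bitrate_kbps)
    affordable = [r for r in ordered if r.bitrate_kbps <= budget]
    return affordable[-1] if affordable else ordered[0]


print(pick_rendition(LADDER, 3500).label)  # 720p
```

A viewer on a slow connection would get the smallest version rather than a stalled player, which is the behavior the hose analogy is getting at.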

With the updated look and feel of eCommons, we hope to better utilize the platform as a place for content where permanent access is desired. With regard to audiovisual content, we recommend eCommons as a common CUL repository that handles a wide variety of standard file types for download and now, streaming. Along with a publication quality file for download, the addition of a streaming player (Kaltura) is required for AV content to be viewable within the eCommons interface. Currently this embedding is something provided by library staff. This may be addressed in development as the group moves forward. Note: For digitized video content, 10-bit master files are not stored within eCommons, as they are too large for normal download capability.

CUL MediaSpace is a “YouTube”-like interface that provides short- to medium-term access to a range of AV content related to Cornell University Library. It is an out-of-the-box user interface for Kaltura. This is a place for “stand-alone,” ephemeral content and licensed content needing narrower access limitations than Cornell-only. An advantage of the MediaSpace environment is that it does allow access limitations on a granular level, down to a single IP address. One disadvantage is that content is only available to those with a Cornell net ID or guest ID. It is good for restricted-access AV content, content under development, or content with a limited life-span. If long-term access is desired, use eCommons.

CUL Digital Collections Portal is our Hydra-based platform for delivering a wide range of content originating from CUL’s collections. It is library-managed and published, currently with a focus on digital collections. This environment is good for thematically cohesive collections of AV and other content that require or benefit from substantial surrounding explanatory material to provide context, history, etc., and which would benefit from cross-searching other thematic collections within the portal. Search faceting is customized in order to better fit CUL’s growing digital collections. Development of this portal is ongoing and CUL is now an official partner in the Hydra community.

Workflow development occurred during the course of this group’s work. CUL’s Course Reserve staff can now embed course AV materials into Blackboard at the request of faculty. This is done by going into a course in Blackboard and adding a link from Kaltura that allows the requested item to stream into the Course Materials section of the Blackboard class. This helps us limit access to members of the course rather than opening it to a larger audience. A big thank you to Mike Tolomeo and Wendy Wilcox for working through this with me.

Finally, navigating the access provisions of audiovisual content can be tedious, due to complex (and in some cases, antiquated) copyright law. The library holds many different classes of AV material, from temporarily licensed materials to collections with donor restrictions. Peter Hirtle, Danielle Mericle and Amy Dygert created a matrix (found on the final recommendations page in Confluence) to help with this.

Any feedback is welcome as we move forward. I plan to periodically review and revise these policy recommendations with a small subset of the original group from DSPS, including Gail Steinhart (Scholarly Communication), Melissa Wallace (Web Design), Amy Dygert (Copyright), Karl Fitzke (Audiovisual Specialist) and Dianne Dietrich (Digital Curation).

Tre Berney

(on behalf of the AV Streaming Group)

What is ORCID and why should we care?


ORCID iDs are unique identifiers for researchers. They provide a simple and standardized way to unambiguously link authors to their publications (and potentially other entities such as organizations), and are increasingly required by publishers and funders.

Let’s start with a couple of definitions:

  • ORCID stands for Open Researcher and Contributor ID. ORCID is an open, non-profit entity that provides a registry of unique identifiers for researchers, and the means to link the products of research to their creators.
  • An ORCID identifier, or ORCID iD, refers to the unique identifier itself, a 16-digit number, and is associated with an individual person.
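As a small illustration of the identifier's structure: this post does not describe it, but ORCID's public documentation states that the final character of an iD is a check character computed with the ISO 7064 MOD 11-2 algorithm, and that it can be the letter X as well as a digit. The Python sketch below validates the format and the check character. The function names are made up for this example, and it checks only the form of an iD, not whether the iD is actually registered.

```python
import re

# Four hyphen-separated groups of four; the last character may be a digit or X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


# The iD that appears in this post's sign-off passes the check.
print(is_valid_orcid("0000-0002-2441-1651"))  # True
```

A check like this catches most typing errors before an iD is stored in a local system, which matters if iDs are going to be entered by hand in reporting or repository workflows.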

ORCID was born of the need to uniquely and authoritatively identify people, to link people to the works they produce, and potentially to other entities such as the organizations for which they work. People with even marginally common names will appreciate the problem, as will those who have changed their names for any reason, have been inconsistent in how they record their names as authors, have multiple family names, or who have published in multiple (and disparate) disciplines. In each of these situations, it can be difficult to tell which person is truly associated with an article, book, or other work, simply based on the author’s name.

If ORCID iDs are widely adopted, there is significant potential to use them to streamline and simplify functions such as faculty reporting, and to leverage the information available for assessment purposes such as understanding how successful Cornell graduates are in their academic careers once they leave Cornell. It’s this potential that has led us to develop a plan to promote adoption and use of ORCID iDs at Cornell.

But before we get too far along with that plan, it’s critical to understand what’s needed to get the most out of the ORCID infrastructure. Not only must individuals register for an ORCID iD, they should also add information to their ORCID record, authorize Cornell to associate their ORCID iD with their Cornell netID, and use their ORCID iD in workflows whenever it’s possible to do so. Possible Cornell applications for ORCID include using it to streamline faculty reporting processes, uniquely identifying authors in our digital repositories, and integrating it into research information systems.

Individuals can also choose to authorize other services and organizations besides Cornell (such as CrossRef, Scopus, and others) to push or pull information to/from their ORCID profile, which can greatly simplify the process of keeping their profile information current. Researchers “own” and control their ORCID iDs, and can decide whether and with whom they share information, and whether any of their information is displayed publicly.

Even before we started formulating a plan, individuals at Cornell were signing up for ORCID iDs. As of fall 2015, there were nearly 2,000 ORCID iDs with email addresses suggesting a possible Cornell affiliation, yet only two to three dozen of these had authorized Cornell to associate their ORCID iD with their netID. This is very impressive adoption for a campus with no organized outreach effort, but we’ll be unable to leverage those ORCID iDs until those iDs are connected to Cornell. This points to the very real need to ensure that adopters complete all the steps needed in order to maximize the potential benefits of ORCID at Cornell. While some institutions have opted to assign ORCID iDs to everyone, the ORCID organization no longer recommends this, and there are no plans to adopt this approach at Cornell. That means we have some work to do.

Here’s a very high level view of the plan (available in full here):

  • Fine tune the application that allows individuals to obtain an ORCID iD and connect it to Cornell (or authorize that connection for an existing ORCID iD).
  • Work with liaisons to prepare them to work directly with faculty to obtain an ORCID iD, authorize Cornell (and optionally other parties), and add information to their profiles.
  • Promote ORCID to administrators and faculty via a communication campaign and direct, in-person outreach by liaisons to faculty to walk them through the steps of obtaining the ORCID iD, populating their profiles, and getting connected to Cornell.
  • Integrate ORCID iDs with Scholars@Cornell.
  • Investigate and possibly implement support for ORCID iDs in CUL’s institutional repositories.
  • Communicate with other Cornell stakeholders about the potential for integration into campus systems.
  • Plan to sustain support for ORCID at Cornell going forward.

Sandy Payette, Gail Steinhart and Simeon Warner form a steering group to move this plan forward. Oya Rieger and Dean Krafft are the Library Executive Group sponsors. We’ll no doubt get help from many more people along the way, and CUL liaisons will be critical to the success of this effort.

In the meantime, if you find yourself discussing ORCID with faculty or staff and want some suggestions as to how to explain its value and potential, here are a few talking points. Registering for and using your ORCID iD will help you:

  • Get credit (for your work)
  • Get ready (for increasingly required use of your ORCID iD in processes such as manuscript submission, grant applications and reporting)
  • Get connected (to Cornell) by visiting

We’ll provide periodic updates to CUL staff as we proceed. Please don’t hesitate to contact any team member, or, with questions or comments.

Best, 0000-0002-2441-1651 (Gail Steinhart)

Digital Image Collections Consolidation in Shared Shelf

For almost a decade, we have been supporting and maintaining the LUNA Insight and ARTstor visual image delivery platforms for various types of visual resources. LUNA was added earlier as an asset manager for images, and many of our legacy collections were until very recently housed there. The LUNA architecture was administered in-house. ARTstor was added somewhat later to the CUL repository landscape, and by contrast is a hosted platform. The strength of ARTstor is its flexibility to meet users’ specific needs, and the ease with which we can build new digital collections.

Since 2009, Cornell University Library has been in partnership with ARTstor in the development of Shared Shelf. Our involvement has shown us that the newer system is more effective in meeting CUL’s increasing needs in building and supporting visual image collections, as well as supporting their integration in teaching and learning. As the number of collections in ARTstor and Shared Shelf rapidly grew, we reached a tipping point that motivated us to plan for the migration of our legacy collections from LUNA to ARTstor, consolidating the collections and streamlining our collective effort toward maintaining fewer platforms.

The migration process was exactly what we anticipated it to be – complex and challenging.  Most collections were migrated to ARTstor and Shared Shelf. The Herbert F. Johnson Museum of Art collection was migrated to eMuseum.  eMuseum is a powerful web publishing toolkit that integrates with The Museum System (TMS), the collection management software that the HFJ Museum uses.

There were myriad details to work through and, in most cases, many stakeholders to coordinate. However, the migration has also been rewarding: it gave us the opportunity to normalize collections for preservation purposes by organizing and archiving master files in a more meaningful way.

Last week, we concluded the migration and decommissioned LUNA.  We are grateful to the Visual Resources Working Group for providing input and guidance during this project.  We would especially like to thank Jason Kovari, Danielle Mericle, Liz Muller, Hannah Marshall, and Rhea Garen for their key roles in this effort.  If you have any questions, please send an email to


ARTstor provides access to our restricted digital image collections. These collections are restricted to Cornell University faculty, staff, and students for educational purposes (instruction, study, research, and scholarship) only. Here are the instructions for accessing the collections remotely from off campus.



Most of our digital image collections are available via Shared Shelf Commons, a free, open-access library of images from academic and cultural institutions.


Recently Launched: CUL Digital Collections Portal

The Digital Collections Portal Team, consisting of staff from Digital Scholarship & Preservation Services (DSPS), CUL Information Technology (CULIT) and Library Technical Services (LTS), recently released the beta version of the new Digital Collections portal. The portal provides access to several digital collections, including those being migrated from DLXS to Hydra, as well as selected Cornell collections from Shared Shelf Commons. Features include faceted searching and browsing, a IIIF viewer for image zoom, a map interface for discovery of items with geolocation data, and image downloads.

The following collections are included in this beta release:


Alfredo Montalvo Bolivian Pamphlets Collection
A collection of 715 digitized pamphlets documenting a century of Bolivian literate culture, beginning in 1848. They show a nation’s struggle to establish viable institutions, to develop its economy, and to educate its children, as well as the back and forth of political argument.







Beyond the Taj: Architectural Images and Landscape Experience in South India
A collection of materials on South Asian architecture assembled over a 22-year period by Professor Robert D. “Scotty” MacDougall (1940-1987), an architect and an anthropologist. The core of this collection consists of approximately 3,000 photographs depicting significant works of architecture through time and across regional traditions throughout continental India.



Huntington Free Library Native American Collection
One of the largest collections of books and manuscripts of its kind, the Huntington collection contains extensive materials documenting the history, culture, languages, and arts of the native tribes of both North and South America. Contemporary politics and human rights issues are also important components of the collection.



John Reps Collection – Bastides
Cornell Professor Emeritus John Reps began to explore and photograph these newly founded towns of the 13th century in 1951. This collection of images, recording what he saw then and on five later visits, documents the appearance of these unusual examples of medieval urban design.




New York State Aerial Photographs
This collection presents a series of historical aerial photographs of the state of New York. It was produced under a Cornell University Library Faculty Grant to Eugenia M. Barnaba, Program Leader, Resource Inventory, Cornell Institute for Resource Information Sciences.





Persuasive Maps: PJ Mode Collection
This is a collection of “persuasive” cartography: maps intended primarily to influence the opinion of the viewer — to send a message — rather than to communicate geographic information. The collection reflects a variety of persuasive tools: allegorical, satirical and pictorial mapping; selective inclusion or exclusion; unusual projections, graphics and text; and intentional deception. Maps in the collection address a wide range of messages: religious, political, military, commercial, moral and social.



Ragamala Paintings
Cornell’s Rāgamālā collection consists of some 4000 photographs Klaus Ebeling took between 1967 and 1972 as he visited museums and private collections all over the world working on Ragamala Painting. Fifty years later the slides were gifted to Cornell, thanks to musicologist Joep Bor. The Ebeling collection is among the world’s great assemblages of images in this genre. There have been numerous subsequent studies of regional traditions of rāgamālā painting — Ebeling’s collection includes them all.



This month the team is undertaking an upgrade from Fedora 3 to Fedora 4. While this will result in a brief pause in collection ingest, it will also provide a forward-looking infrastructure for all of our collections in Hydra.

Thanks and appreciation go out to the Digital Collections Portal Team:

  • John Cline (CUL-IT)
  • Jennifer Colt, co-lead (DSPS)
  • Christina Harlow (LTS)
  • George Kozak (CUL-IT)
  • Mary Beth Martini-Lyons (DSPS)
  • Michelle Paolillo (DSPS)
  • Jim Reidy (CUL-IT)
  • Adam Smith, co-lead (CUL-IT)
  • Melissa Wallace (DSPS)

Special thanks also go to Steven Folsom, Hannah Marshall and Danielle Mericle for their past and ongoing support.

We look forward to growing the number of collections accessible through the Digital Collections portal, and invite you to send any feedback or questions to Jenn Colt and Adam Smith.

arXiv Annual Update

arXiv started 2015 with an important milestone as we added the one-millionth paper at the end of December 2014 (press release & video). Since its inception in 1991 with a focus on the high energy physics community, arXiv has significantly expanded both its subject coverage and user base. During 2015, the repository saw 105,000 new submissions and over 139 million downloads from all over the world. arXiv has international scope, with submissions and readership from around the world, and collaborations with U.S. and foreign professional societies and other international organizations.

arXiv’s funding and governance are based on a membership program that engages libraries and research laboratories worldwide that represent the repository’s heaviest institutional users. We are pleased to report that we currently have 188 members representing 23 countries. arXiv’s sustainability plan is founded on and presents a business model for generating revenues. Cornell University Library (CUL), the Simons Foundation, and a global collective of institutional members support arXiv financially. The financial model for 2013-2017 entails three sources of revenue:

  • CUL provides a cash subsidy of $75,000 per year in support of arXiv’s operational costs. In addition, CUL makes an in-kind contribution of all indirect costs, which currently represents 37% of total operating expenses.
  • The Simons Foundation contributes $50,000 per year (rising to $100,000 starting in 2016) in recognition of CUL’s stewardship of arXiv. In addition, the Foundation matches up to $300,000 per year of the funds generated through arXiv membership fees.
  • Each member institution pledges a five-year funding commitment to support arXiv. Based on institutional usage ranking, the annual fees are set in four tiers from $1,500 to $3,000.

In 2015, Cornell raised approximately $372,000 through membership fees from 188 institutions, and the total revenue (including CUL and Simons Foundation direct contributions) was approximately $815,511. We are grateful for the Simons Foundation’s support. The gift has encouraged long-term community support by lowering arXiv membership fees and making participation affordable to a broader range of institutions. This model aims to ensure that the ultimate responsibility for sustaining arXiv remains with the research communities and institutions that benefit from the service most directly.

Since we started the arXiv sustainability initiative in 2010, an integral part of our work has been assessing the services, technologies, standards, and policies that constitute arXiv. Here are some of our key accomplishments from 2015 to illustrate the range of issues we have been trying to tackle. Please see the 2015 Roadmap for a fuller account of our work.

  • Evaluated the arXiv administration processes in light of evolving moderation tools and staffing needs and created and posted a new position (arXiv Operations Manager) to ensure a more productive administrative staffing configuration.
  • Reviewed the current arXiv endorsement procedures and policies across all subject categories to seek greater uniformity and transparency.
  • Proposed and modified a new appeal process to work toward uniform policies across all subject categories.
  • Continued improving tools and interfaces to allow moderators to interact more directly and efficiently with the arXiv system and administrators based on input from the Scientific Advisory Board and moderators (to be continued in 2016).
  • Initiated a process to update, reorganize, and better document the TeX system, which is a central component of our article processing (to be continued in 2016).
  • Added ORCID author identifier support for better interoperability with other repositories implementing authority control and also as a route toward providing institutional statistics for member organizations.
  • Began to review and refine the “stock” messages used by arXiv administrators when communicating with submitters and other arXiv users to improve their usefulness.
  • Developed a set of questions for assessing and accepting new subject domains to arXiv.
  • Piloted an online donation button to experiment with ways to expand arXiv’s revenue sources (generated $16,000 in one week).
  • Investigated interoperability requirements to enable communication/exchange between arXiv and institutional repositories.
  • Maintained a worldwide network of arXiv moderators: over 150 subject experts who verify that submissions are topical and of interest to the scientific community, follow accepted standards of scholarly communication, and are classified in the appropriate subject categories.
  • Held discussions with NSF program managers to better understand how arXiv’s ongoing operations and new initiatives might best fit into NSF programs.
  • Held an annual meeting for the Scientific Advisory Board (SAB) and Member Advisory Board (MAB) to discuss IT development priorities, financial state, moderation tools and policies, and fundraising strategies.

From the users’ perspective, arXiv continues to be a successful, prominent subject repository system serving the needs of many scientists around the world. However, under the hood, the service is facing significant pressures. The conclusion of the recent SAB and MAB annual meetings was that, in addition to the current business model with a focus on maintenance, the arXiv team needs to embark on a significant fundraising effort, pursuing grants and collaborations. We first need to create a compelling and coherent vision so that we can persuasively articulate our fundraising goals beyond the current sustainability plan, which aims to support the baseline operation. We’d like to use the approaching 25th anniversary of arXiv as an occasion to engage in a series of vision-setting exercises. The 2016 roadmap includes our goals within the scope of the current business model. In addition, we have developed an initial arXiv review strategy to be refined and implemented during 2016.

Cornell University Library, arXiv Team

Chris Myers (Scientific Director), Oya Y. Rieger (Program Director), David Ruddy (User Support Lead), Simeon Warner (IT Lead)

Contact email:

If you are interested in getting updates from the arXiv team and have not yet signed up for the mailing list, send an email message to: Leave the subject line blank and the body of the message should be a single word: join
