The Digital Collections Portal Team, consisting of staff from Digital Scholarship & Preservation Services (DSPS), CUL Information Technology (CULIT) and Library Technical Services (LTS), recently released the beta version of the new Digital Collections portal. The portal provides access to several digital collections, including those being migrated from DLXS to Hydra, as well as selected Cornell collections from Shared Shelf Commons. Features include faceted searching and browsing, a IIIF viewer for image zoom, a map interface for discovery of items with geolocation data, and image downloads.
The following collections are included in this beta release:
Alfredo Montalvo Bolivian Pamphlets Collection
A collection of 715 digitized pamphlets documenting a century of Bolivian literate culture, beginning in 1848. They show a nation’s struggle to establish viable institutions, develop its economy, and educate its children, as well as the back and forth of political argument.
Beyond the Taj: Architectural Images and Landscape Experience in South India
A collection of materials on South Asian architecture assembled over a 22-year period by Professor Robert D. “Scotty” MacDougall (1940-1987), an architect and an anthropologist. The core of this collection consists of approximately 3,000 photographs depicting significant works of architecture through time and across regional traditions throughout continental India.
Huntington Free Library Native American Collection
One of the largest collections of books and manuscripts of its kind, the Huntington collection contains extensive materials documenting the history, culture, languages, and arts of the native tribes of both North and South America. Contemporary politics and human rights issues are also important components of the collection.
John Reps Collection – Bastides
Cornell Professor Emeritus John Reps began to explore and photograph the bastides, newly founded towns of the 13th century, in 1951. This collection of images recording what he saw then and on five later visits documents the appearance of these unusual examples of medieval urban design.
New York State Aerial Photographs
This collection presents a series of historical aerial photographs of the state of New York. It was produced under a Cornell University Library Faculty Grant to Eugenia M. Barnaba, Program Leader, Resource Inventory, Cornell Institute for Resource Information Sciences.
Persuasive Maps: PJ Mode Collection
This is a collection of “persuasive” cartography: maps intended primarily to influence the opinion of the viewer — to send a message — rather than to communicate geographic information. The collection reflects a variety of persuasive tools: allegorical, satirical and pictorial mapping; selective inclusion or exclusion; unusual projections, graphics and text; and intentional deception. Maps in the collection address a wide range of messages: religious, political, military, commercial, moral and social.
Cornell’s Rāgamālā collection consists of some 4000 photographs Klaus Ebeling took between 1967 and 1972 as he visited museums and private collections all over the world working on Ragamala Painting. Fifty years later the slides were gifted to Cornell, thanks to musicologist Joep Bor. The Ebeling collection is among the world’s great assemblages of images in this genre. There have been numerous subsequent studies of regional traditions of rāgamālā painting — Ebeling’s collection includes them all.
This month the team is undertaking an upgrade from Fedora 3 to Fedora 4. While this will result in a brief pause in collection ingest, it will also provide a forward-looking infrastructure for all of our collections in Hydra.
Thanks and appreciation go out to the Digital Collections Portal Team:
- John Cline (CUL-IT)
- Jennifer Colt, co-lead (DSPS)
- Christina Harlow (LTS)
- George Kozak (CUL-IT)
- Mary Beth Martini-Lyons (DSPS)
- Michelle Paolillo (DSPS)
- Jim Reidy (CUL-IT)
- Adam Smith, co-lead (CUL-IT)
- Melissa Wallace (DSPS)
Special thanks also go to Steven Folsom, Hannah Marshall and Danielle Mericle for their past and ongoing support.
We look forward to growing the number of collections accessible through the Digital Collections portal, and invite you to send any feedback or questions to Jenn Colt (email@example.com) and Adam Smith (firstname.lastname@example.org).
arXiv started 2015 with an important milestone as we added the one-millionth paper at the end of December 2014 (press release & video). Since its inception in 1991 with a focus on the high energy physics community, arXiv has significantly expanded both its subject coverage and user base. During 2015, the repository saw 105,000 new submissions and over 139 million downloads from all over the world. arXiv has international scope, with submissions and readership from around the world, and collaborations with U.S. and foreign professional societies and other international organizations.
arXiv’s funding and governance are based on a membership program that engages libraries and research laboratories worldwide that represent the repository’s heaviest institutional users. We are pleased to report that we currently have 188 members representing 23 countries. arXiv’s sustainability plan is founded on a business model for generating revenue. Cornell University Library (CUL), the Simons Foundation, and a global collective of institutional members support arXiv financially. The financial model for 2013-2017 entails three sources of revenue:
- CUL provides a cash subsidy of $75,000 per year in support of arXiv’s operational costs. In addition, CUL makes an in-kind contribution of all indirect costs, which currently represents 37% of total operating expenses.
- The Simons Foundation contributes $50,000 per year (rising to $100,000 starting in 2016) in recognition of CUL’s stewardship of arXiv. In addition, the Foundation matches $300,000 per year of the funds generated through arXiv membership fees.
- Each member institution pledges a five-year funding commitment to support arXiv. Based on institutional usage ranking, the annual fees are set in four tiers ranging from $1,500 to $3,000.
In 2015, Cornell raised approximately $372,000 through membership fees from 188 institutions, and the total revenue (including CUL and Simons Foundation direct contributions) was approximately $815,511. We are grateful for the Simons Foundation’s support. The gift has encouraged long-term community support by lowering arXiv membership fees and making participation affordable to a broader range of institutions. This model aims to ensure that the ultimate responsibility for sustaining arXiv remains with the research communities and institutions that benefit from the service most directly.
Since we started the arXiv sustainability initiative in 2010, an integral part of our work has been assessing the services, technologies, standards, and policies that constitute arXiv. Here are some of our key accomplishments from 2015 to illustrate the range of issues we have been trying to tackle. Please see the 2015 Roadmap for a fuller account of our work.
- Evaluated the arXiv administration processes in light of evolving moderation tools and staffing needs and created and posted a new position (arXiv Operations Manager) to ensure a more productive administrative staffing configuration.
- Reviewed the current arXiv endorsement procedures and policies across all subject categories to seek greater uniformity and transparency.
- Proposed and modified a new appeal process to work toward uniform policies across all subject categories.
- Continued improving tools and interfaces to allow moderators to interact more directly and efficiently with the arXiv system and administrators based on input from the Scientific Advisory Board and moderators (to be continued in 2016).
- Initiated a process to update, reorganize, and better document the TeX system, which is a central component of our article processing; this project will continue in 2016.
- Added ORCID author identifier support for better interoperability with other repositories implementing authority control and also as a route toward providing institutional statistics for member organizations.
- Began to review and refine the “stock” messages used by arXiv administrators when communicating with submitters and other arXiv users to improve their usefulness.
- Developed a set of questions for assessing and accepting new subject domains to arXiv.
- Piloted an online donation button to experiment with ways to expand arXiv’s revenue sources (generated $16,000 in one week).
- Investigated interoperability requirements to enable communication/exchange between arXiv and institutional repositories.
- Maintained a worldwide network of arXiv moderators: over 150 subject experts who verify that submissions are topical and of interest to the scientific community, follow accepted standards of scholarly communication, and are classified in the appropriate subject categories.
- Held discussions with NSF program managers to better understand how arXiv’s ongoing operations and new initiatives might best fit into NSF programs.
- Held an annual meeting for the Scientific Advisory Board (SAB) and Member Advisory Board (MAB) to discuss IT development priorities, financial state, moderation tools and policies, and fundraising strategies.
From the users’ perspective, arXiv continues to be a successful, prominent subject repository system serving the needs of many scientists around the world. However, under the hood, the service is facing significant pressures. The conclusion of the recent SAB and MAB annual meetings was that, in addition to the current business model with a focus on maintenance, the arXiv team needs to embark on a significant fundraising effort, pursuing grants and collaborations. We first need to create a compelling and coherent vision so that we can persuasively articulate our fundraising goals beyond the current sustainability plan, which aims to support the baseline operation. We’d like to use the approaching 25th anniversary of arXiv as an important milestone to engage us in a series of vision-setting exercises. The 2016 roadmap includes our goals within the scope of the current business model. In addition, we have developed an initial arXiv review strategy to be refined and implemented during 2016.
Cornell University Library, arXiv Team
Chris Myers (Scientific Director), Oya Y. Rieger (Program Director), David Ruddy (User Support Lead), Simeon Warner (IT Lead)
Contact email: support@arXiv.org
If you are interested in getting updates from the arXiv team and have not yet signed up for the mailing list, send an email message to: arxiv-support-updates-Lemail@example.com. Leave the subject line blank; the body of the message should be a single word: join
We are pleased to invite applications for the Digital Scholarship Fellowship position. Hosted by the DSPS unit since 2012, the fellowship program aims to provide opportunities for CUL staff to expand their skills and experiences in developing, delivering, and assessing digital scholarship services. It supports the CUL objectives of “empowering staff to explore gaps in their areas of expertise” and “promoting flexible staffing among the units.” The application deadline is February 29, 2016, for fellowship terms starting during the March-October 2016 timeframe.
DSPS Fellowship Ideas
Here are some examples of fellowship projects to consider:
- Recruit new content to eCommons, which could serve an important function in preserving and providing access to materials produced by Cornell’s many centers and institutes. Work in this area could include developing and documenting best practices and workflows for collecting and managing these materials, identifying candidate centers and institutes, and reaching out to work directly with centers and institutes (with CUL liaisons, when appropriate) to establish and build their collections in eCommons.
- Investigate implementation strategies for ORCID (Open Researcher and Contributor ID). ORCID identifiers uniquely identify scholars so that they can be unambiguously associated with their works, and are becoming increasingly important in ensuring interoperability of digital scholarly systems. Some research into how other institutions are approaching the assignment of ORCID ids and are integrating them into their information systems and workflows would be very helpful. Work could continue on to include identifying approaches to implementing ORCID ids at Cornell (assignment, stakeholder identification, systems integration).
- Use the Trustworthy Repositories Audit & Certification Checklist (TRAC) to evaluate the current status of eCommons policies, workflows and documentation. Help draft additional policies and documents as needed, and recommend improvements to current practices. The intent would not be full certification of the repository but rather to make use of relevant parts of an existing tool for a local repository audit for internal purposes.
- Design and conduct a comprehensive survey of CUL digital assets, characterizing them in terms of origin (born-digital/digital analog), aggregate size, content type, security class (sensitive information or not/rights and/or rights clearance information), and stakeholder requirements for access, discovery, etc. The intent would be to triage these towards various preservation solutions as needed based on the significant properties of the materials involved, surface gaps in our fabric of repositories, and support any appropriate recommendations.
- Sharpen your user experience (UX) assessment skills by contributing to the evaluation of CUL’s digital collections and repositories (e.g., eCommons, visual resources) to review their practical aspects such as utility, ease of use, and efficiency. How are such services and systems meeting the actual needs of our faculty and students? How do they fit into their daily workflows of research and teaching?
- Contribute to the management and dissemination of Cornell theses and dissertations. First, with oversight from the Thesis/Dissertation Advisory Group (TDAG), and in collaboration with the Graduate School, develop educational material and outreach strategies to help graduate students understand and make choices with respect to access embargoes, plans for future publications based on their thesis or dissertation, open access choices, and copyright management. Second, with oversight from the TDAG, survey graduate programs at Cornell that are not administered by the Graduate School, and develop strategies and recommendations for collecting theses or other projects produced as a requirement of graduation.
- Assess the needs of CUL selectors, Library administration, and other stakeholders in CUL collection development and management for collections metrics and analytics. Collections data analysis can help to improve the quality of CUL’s collection and the alignment of collecting activity with the needs and strategic directions of the University; it can identify cost savings and inform decisions about the allocation of library resources. This project would entail determining who in the Library needs collections data to answer which questions and identifying potential sources of relevant cost, usage, and demographic data to address high-priority needs. The project would also produce recommendations for the useful analysis of collections data to support routine collection development decisions, periodic reports, internal and interinstitutional collaborations, special projects, etc.
- Develop and manage the process of reconciling the Internet Archive deposit of assets into HathiTrust. The initial pipeline for the flow of assets from Internet Archive to HathiTrust has been set up, and about 57K items have been ingested, but as is common with large-scale deposits, a proportion of items need remediation to allow for deposit. Our desire is to analyze the ~20K stragglers for commonalities, classify them to determine the best way to remove impediments to ingest, and facilitate that remediation so that they can successfully join the deposit in HathiTrust. This project is especially suited for those who have analytical and technical skill in using MS Access and MS Excel, enjoy managing projects, and enjoy working at a relatively large scale.
These are just some examples to illustrate the nature of fellowship projects. Other ideas related to the DSPS programs and goals are welcome. Information about the DSPS program is available at http://www.library.cornell.edu/DSPS
Digital Scholarship Fellows, 2012-2015
Since 2012, DSPS has been very fortunate to host seven excellent fellows, all very motivated, creative, and resourceful. We are grateful for their contributions and hope that they found the experience useful and gratifying. They are available to talk with interested parties about their fellowship experiences.
Here are brief descriptions of their fellowship projects, along with their titles and affiliations during their fellowships:
Jim DelRosso, Hospitality, Labor, and Management Library
Jim’s fellowship focused on digital repositories. His primary goal was to work with DSPS and stakeholders around CUL to craft a digital repository policy that addresses questions of software, workflow, collection development, and sustainability, while fulfilling the need for both straightforward access to and robust preservation of the items stored in CUL’s digital repositories. As a component of his fellowship, he contributed to the efforts in creating an agenda for the newly established Repository Executive Group and became the first chair. Jim’s DSPS fellowship was for one year at 0.25 FTE.
Dianne Dietrich, Physical Sciences Library, EMPSL
Dianne joined the team of our NEH-funded project on Preservation and Access for Digital Art Objects as the lead Digital Forensic Analyst. This project represented a collection-wide investigation of preservation and emulation strategies for complex born-digital media. Dianne led the project’s technical team and helped develop preservation workflows that would be a baseline for CUL digital forensics services in the years to come. As a part of her fellowship, she has been representing the project at national forums and conferences. Dianne’s fellowship was for two years at 0.5 FTE, and she continued as CUL’s digital forensics specialist at 0.20 FTE.
Erin Eldermire, Research and Assessment Unit
Erin’s goals for the DSPS fellowship were to contribute to the development of the library website; to explore assessment-related issues for CUL’s digital collections; and to learn from the members of the DSPS Unit in preparation for her future career as a librarian. In her DSPS Press blog, she shared her thoughts on how the Library can enable users to employ a simple search box such as Google’s, while still allowing them to dive into our vast collection. Erin’s fellowship was for six months at 10 hours/week.
Steven Folsom, Library Technical Services
Steven’s primary goal was to identify and develop strategies for improving discovery and access of CUL’s content archived in HathiTrust, including outreach and community building (e.g., communication with HathiTrust and data aggregators about interoperability opportunities). He also contributed to CUL’s efforts to migrate the DLXS-based image databases to other systems such as HathiTrust, Hydra, and SharedShelf. Included in his goals was engaging in the CUL efforts to assess Omeka/Spotlight/Drupal as platforms for creating web-based exhibits and rich-media collections. Steven’s DSPS fellowship was for one year at 0.25 FTE.
Noah Hamm, Mann Library
During his fellowship, Noah was interested in exploring how GIS and visualization techniques and tools are being used in supporting humanities research and teaching, in collaboration with the library staff interested in digital humanities programs. He also was involved in a campus-wide group to survey AV preservation needs across Cornell by conducting stakeholder interviews and gathering data about the condition and value of digital content. His fellowship term was six months at 12 hours/week.
Hannah Marshall, Library Technical Services
Hannah Marshall has assumed a five-month term at 0.25 FTE to coordinate Digital Consulting and Production Services (DCAPS) during the DSPS reorganization transition stage. She currently coordinates the DCAPS operation and works closely with the DCAPS team members to facilitate communication. She has been instrumental in coordinating the outreach process for the Arts and Sciences Grants Program. She also networks with stakeholders such as library subject specialists to make sure that there is sufficient user input to support the development efforts.
Gail Steinhart, Mann Library
As the first DSPS fellow, Gail chaired a newly formed group over the course of her one-year fellowship (2012-2013, 0.5 FTE) to address issues related to the management of Cornell’s electronic theses and dissertations (ETDs). This work included facilitating discussions with the Graduate School, which led to a revised set of embargo options that will be implemented when upgrades are made to the online submission tool used by graduate students to submit their ETDs. She reviewed and reported on the results of a pilot project examining the use of Johns Hopkins’ Data Conservancy to host data sets associated with papers uploaded to arXiv, led the production of a white paper examining current approaches to digital repositories within CUL, and contributed to other DSPS efforts such as educating librarians on current issues in scholarly communication (with particular emphasis on research data management and sharing). Finally, she led the development of a collaborative grant proposal to the Institute of Museum and Library Services with the University of Wisconsin-Madison, Columbia University and CalPoly, to develop and share a set of best practices for collecting, documenting and disseminating the research data of faculty nearing retirement.
For More Information About the Program:
- Interested CUL staff members are encouraged to discuss the fellowship position with their supervisors first.
- If you have questions regarding the HR arrangements and funding please contact Lyndsi Prignon at <firstname.lastname@example.org>.
- Issues related to the program areas, potential projects, and the scope of the fellowship should be addressed to Oya Rieger <email@example.com>.
- Oya Rieger and Lyndsi Prignon will be glad to talk with interested staff and their supervisors about logistical details such as making back-up arrangements and ways to accommodate the candidates’ existing responsibilities and goals.
- We will have 2-3 positions open to CUL staff with a term of 6-12 months at a part-time capacity (0.25 FTE).
- Although no prerequisite skills are required, candidates need to be familiar with the recent trends and practices in one of the digital scholarship program areas (e.g., repositories, publishing, research data, digital collections, digital preservation, preservation policies).
- To apply, send a copy of your CV to firstname.lastname@example.org with a cover letter describing the program areas of interest and expectations from the fellowship.
- The applications will be reviewed by a small committee with input from the candidate’s supervisor.
- The application deadline is February 29, 2016, for fellowship terms starting during the March-October 2016 timeframe.
Oya Y. Rieger, January 2016
In 2013, Cornell University Library received a research and development grant from the National Endowment for the Humanities to design a framework for preserving access to digital art objects. The Preservation and Access Frameworks for Digital Art Objects project (PAFDAO) was undertaken in collaboration with Cornell University’s Society for the Humanities and the Rose Goldsen Archive of New Media Art, a collection of media artworks housed in the Library’s Division of Rare and Manuscript Collections. This collection of complex interactive born-digital artworks is used by students, faculty, and artists from various disciplines. Despite its “new” label, new media art has a rich 40-year history, making obsolescence and loss of cultural history an imminent risk. As a range of new media are integrated in artworks, these creative objects are becoming increasingly complex and vulnerable due to dependence on many technical and contextual factors. The phrase new media art denotes a range of creative works that are influenced or enabled by technological affordances. The term also signifies a departure from traditional visual arts (e.g., paintings, drawings, sculpture) and often reflects the interactive nature of the works.
We are pleased to announce that the Preserving and Emulating Digital Art Objects White Paper describing the project’s findings, discoveries, and challenges is now available. The ultimate goal of the project team has been the creation of a preservation and access practice grounded in thorough and practical understanding of the characteristics of digital objects and their access requirements, seen from the perspectives of collection curators and users alike. Equally important has been the establishment of service frameworks and policies that are sustainable, realistic, and cost-efficient. Throughout the project, one of our principles has been moving the experience gained through research into practice. Although the initiative focused on new media art, we hope that our methodologies and findings will inform other types of complex born-digital collections as well.
Throughout our project, a recurring theme in our findings involved the difficulties associated with capturing sufficient information about a digital art object to enable an authentic user experience. We have concluded that the key to digital media preservation is variability, not fixity. The trick is finding ways to capture the experience, or a modest proxy of it, so that future generations will get a glimpse of how early digital artworks were created, experienced, and interpreted. So much of new media works’ cultural meaning derives from users’ spontaneous and contextual interactions with the art objects. Providing appropriate cultural and historical contexts for understanding and interpreting new media art is part of each institution’s individual mission, but also a matter of collective importance, given the rarity of such collections, the numerous challenges of establishing preservation protocols, and the overall scarcity of resources.
During the last few years, we have witnessed several trends and advancements, for instance, the increasing prominence of video and web art. The organizational, technological, and financial challenges associated with preserving and providing access to web-based resources are quite astounding. So as we develop strategies for CD-ROM-based content, we are mindful that there are more significant challenges ahead. As described in the white paper, although emulation was not included in the original project plan, it emerged as a viable strategy. Institutional experiences and perspectives on emulation will not scale unless there are communities of emulation involving archivists and curators. We need to explore how cultural institutions can interface with groups involved in emulation, which currently is driven by games communities and hobbyists.
We would like to emphasize that, as artists have increasing access to ubiquitous tools and methodologies for creating complex art exhibits and objects, we should expect to see an increasing flow of such creative works to archives, museums, and libraries. It is nearly impossible to preserve these works through generations of technology and context changes. Therefore, diligent curation practices are going to be more essential than ever in order to identify unique or exemplary works, project future use scenarios, assess obsolescence and loss risks, and implement cost-efficient strategies. Also, we would like to emphasize that access is the keystone of preservation. The preservation of digital art objects needs to be conceptualized, motivated, informed, and energized by present and future use.
PAFDAO Project Team
Note: An interview with the PAFDAO team about their efforts is available on The Signal, the Library of Congress’s digital preservation blog. The post is titled Authenticity Amidst Change: The Preservation and Access Framework for Digital Art Objects.
Launched in spring 2015, Persuasive Cartography: The PJ Mode Collection is a collection of “persuasive maps”: maps intended to influence beliefs or communicate opinions, rather than convey solely geographic information. The 310 maps were donated to the Division of Rare and Manuscript Collections by alum PJ Mode ’60, and were digitized by Digital Consulting and Production Services (DCAPS) and made available online through the Persuasive Cartography website.
This impressive collection has already received a variety of press coverage, including articles in Ezra Update and National Geographic. In the coming months, DCAPS plans to add more items to the collection, and also make it available within a new Digital Collections website (December 2015), which will allow for cross-collection searching and greater discovery. Please check out the collection and let us know what you think.
As part of an ongoing effort to better promote CUL’s digital collections, we have embarked on an initiative to strategically link newly digitized content to appropriate entries within Wikipedia. Going forward, this will be integrated into our overall digital project workflow, to guarantee broader points of access to our collections. Wikipedia is ideologically aligned with the library vision for free access to knowledge (see slide 6) and consistently seeks reliable sources. Libraries and archives are excellent locations to find reliable sources. Wikipedia, the 7th most popular website on the internet with over 500 million visitors per month, is one of the best online spaces to make our resources visible.
Many examples in this direction come from the GLAM project (the acronym for Galleries, Libraries, Archives, and Museums) and the Wikipedia Library Project. Both are projects that Wikipedia runs in collaboration with many cultural institutions (see more cases here and here).
We are pleased to announce the following projects have been funded for the 2015 Arts & Sciences Grants Program for Digital Collections. These initiatives will expand our digital primary material collections for research and teaching. They will also contribute to the burgeoning field of scholarship in the digital humanities through the use of innovative digital methodologies.
The grants program aims to support collaborative and creative use of resources through the creation of digital content of enduring value to the Cornell community and scholarship at large. The program is funded by the College of Arts and Sciences and coordinated by Cornell University Library (CUL). The Arts & Sciences Visual Resources Advisory Group oversees the visual resources program, and CUL’s Digital Consulting and Production Services (DCAPS) plans and implements the grant-funded projects.
Grants Program for Digital Collections in Arts and Sciences
Projects Selected for Support in 2015
Cornell Costume and Textile Collection
Judith Byfield, Department of History; Denise Nicole Green, Fiber Science & Apparel Design; Jolene Rickard, Department of the History of Art and Visual Studies & Director, American Indian Program
The Cornell Costume and Textile Collection (CCTC) includes over 10,000 items of apparel, flat textiles, and accessories dating from the late 18th century. The goal is to make the CCTC discoverable and broadly accessible to faculty, students, and researchers and to encourage the use of this extensive collection in teaching and research. Currently, the collection attracts visiting scholars and is considered a hidden Cornell gem. Textiles are studied from many different disciplinary perspectives: materials science, chemistry, social sciences, cultural studies, art and design, to name a few. We expect that the collection’s richness and depth and its broad online availability will provide ample opportunities for collaborations among social scientists, humanities scholars, and physical scientists.
Sterrett Photographs Collection
Benjamin Anderson, History of Art and Visual Studies
The goal is to create a digital repository for the Sterrett Photographs collection, which documents major archaeological monuments in present-day Greece, Turkey, Cyprus, Syria, and Iraq. The collection constitutes a major documentary resource for the study of archaeological sites that have been substantially altered by subsequent restorations and developments, or that have been very recently destroyed or face a real threat of destruction. Given the ongoing demolition of archaeological heritage in the regions covered by the Sterrett Photographs, we anticipate that scholarly attention to this collection will increase in coming years. This is an important collection for training archaeology and architectural history students to work with historical photographs as a primary form of evidence.
Lindsay Cooper Archive
Benjamin Piekut, Music
The Lindsay Cooper Archive includes the musical scores, sketches, and manuscripts of Lindsay Cooper’s work, spanning over 30 years of her career. Although much of her work has been issued (and reissued) on recordings, the scores and manuscripts have never been available to researchers. Currently in a storage locker in North London, the physical materials will stay in the UK, where she is well known, and will ultimately be housed at the University of the Arts London. Therefore, a partnership between Cornell and the University of the Arts London plans to make these scores and archival recordings available digitally. In addition, pending permission from her estate, we will make her manuscripts and print matter available to anybody on the web. Archival resources for this kind of avant-garde are rare, indeed, so this digital collection would be quite important, especially because there is a real paucity of women’s stories in the history of experimental and improvised music.
On Our Backs
Kate McCullough, English & Feminist, Gender, & Sexuality Studies Program
This project aims to make available the full content of On Our Backs, a ground-breaking and historically important publication used by students and researchers in the visual, political, historical, and gender and sexuality fields, attracting both social scientists and arts and humanities scholars. Creating an online version of On Our Backs will enable scholars and students here and beyond Cornell to access the material independently of a visit to the Rare and Manuscript Collections’ supervised reading room. This will greatly improve and expand access for our local users and reach a global community. The photography, artworks, essays, advertising, and various non-fiction pieces throughout this magazine are valuable historical resources and material for academic reflection.
We are grateful for the contributions of Tre Berney, Bonna Boettcher, Mickey Casad, Rhea Garen, Peter Hirtle, Jason Kovari, Hannah Marshall, Brenda Marston, Danielle Mericle, Katherine Reagan, Jim Reidy, and Melissa Wallace as they collaborated with faculty in preparing the proposals.
2014-15 was a record year to date for Cornell research made freely accessible to the world with support of the Cornell Open Access Publication (COAP) fund. COAP supports the choice of Cornell authors (faculty, staff, and students) to publish in peer-reviewed, open access journals by underwriting “reasonable article processing fees” associated with OA publishing – fees that typically range from $1,300 to $2,500 per article. Administered by CUL, the COAP fund was established in the fall of 2009 with funds from the Library and the Office of the Provost.
The COAP program awarded funds to support open access publication of 39 articles by 35 Cornell authors over the 2014-15 fiscal year – more than double the number of articles reimbursed in the previous year. We can only speculate about the grounds for the dramatic increase. Awareness of the available funding has no doubt spread on campus, via Library outreach and word-of-mouth among researchers. But growing acceptance of open access publishing in the academy is surely playing a role as well. Since the fund was established in 2009, COAP has reimbursed costs for a total of 85 articles by 65 authors from 34 different Cornell departments and programs. The articles appear in 38 different open access journals from a total of 20 publishers, both not-for-profit and commercial. The journals published by PLOS, “a nonprofit publisher and advocacy organization founded to accelerate progress in science and medicine,” have by far the strongest representation, followed by the BioMed Central journals, which are owned by the commercial publisher Springer Science+Business Media.
The COAP fund is Cornell’s implementation of principles laid out in the multi-institutional Compact for Open-Access Publishing Equity (COPE) initiative, which began at Harvard in 2009. Cornell was among the original signatories to the Compact, which recognizes both the benefits of open access and the continuing value of scholarly publishers and emphasizes the need for sustainable funding for publishers that make the content in their journals openly accessible. Each COPE signatory institution pledged to establish “durable mechanisms for underwriting reasonable publication charges for articles written by its faculty and published in fee-based open-access journals and for which other institutions would not be expected to provide funds.”
Cornell’s COAP fund is intended as a funding source of last resort and does not reimburse authors for article processing fees if other appropriate sources are available (e.g., some research grants allow use of grant funds for fees to publish results in open access venues). The COAP fund covers article processing fees only for articles published in “pure” open access journals, that is, publications in which all articles are open. Articles in “hybrid” open access journals – journals in which authors can opt to pay to make their individual articles freely accessible while the rest of the content remains behind a pay wall – are not reimbursable with COAP funds (see full COAP criteria). The COAP fund is jointly supported by CUL and Cornell’s Provost. The Library contribution, which comes from the collections budget, has been $25,000 per year.
– Kizer Walker, Director of Collections, CUL
The HathiTrust Research Center (HTRC) held its third annual UnCamp on March 30 and 31. The HTRC has continued to demonstrate its commitment to the evolution of tools and functionality of interest to scholars in the digital humanities, this year adding two tools to the environments, algorithms and datasets that it already offers. The main themes emerging throughout the conference, as evidenced in both the announcements of developments and various conversations, were the sustainability of the HTRC as a continued presence in the scholarly sphere, and the adoption of the computational environment by more scholars in their work. The latter topic was both demonstrated in various presentations of scholars’ work and taken up in discussion of ways in which the HTRC could improve its offerings to scholars by lowering barriers and enhancing adoption.
The HTRC Bookworm is a tool very like the Google nGram Viewer, but it analyzes the HTRC indices instead of the Google data set. Given the similarities between the two, it is not surprising that the same genius was at work in the development of both tools (Erez Lieberman Aiden and Jean-Baptiste Michel, among others). The resulting Bookworm is available as an open source project (code available). The HTRC’s implementation of Bookworm is more graphically oriented, allowing a fairly complex set of constraints to be set on the data through an intuitive and clean visual interface. Currently, it is limited to unigrams. There was a lot of interest at the conference in allowing the HTRC Bookworm to be constrained to a collection of one’s own making, so it is likely that development of this feature will begin in the year to come.
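To make the idea of a unigram trend concrete, here is a minimal sketch of the kind of calculation such a viewer charts: the share of all words in a given year that match a single search word. This is not the Bookworm code or its API; the function name, the (year, text) record format, and the whitespace tokenizing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def unigram_trend(records, word):
    """Return {year: relative frequency of `word`} for (year, text) records.

    Hypothetical helper for illustration only. Tokenizing is a naive
    lowercase whitespace split, far cruder than a real index would use.
    """
    hits = defaultdict(int)    # occurrences of the word per year
    totals = defaultdict(int)  # all tokens per year
    target = word.lower()
    for year, text in records:
        tokens = text.lower().split()
        hits[year] += Counter(tokens)[target]
        totals[year] += len(tokens)
    # Skip years with no tokens to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


if __name__ == "__main__":
    sample = [
        (1900, "the whale and the sea"),
        (1900, "a whale"),
        (1950, "no fish here"),
    ]
    print(unigram_trend(sample, "whale"))
```

Restricting the records passed in to a hand-picked set of volumes would correspond, roughly, to the "collection of one's own making" that attendees asked for.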
The Data Capsule (located in the “portal” alongside the basic algorithms) also made its debut, along with a supporting tutorial. The Data Capsule is a virtual machine environment that a scholar can customize with various tools, lock, and then use to address the texts of the HTRC computationally. The scholar can then unlock and retrieve the results of the computation. Overall, this presents an environment that supports scholars using computational methods and also prevents reconstruction of the corpus or any given work in it. Currently, analysis is restricted to books in full view, but the Data Capsule is engineered specifically for security that will allow scholars to computationally address works in limited view as well. Legal work is already underway to pave the way for this important evolution.
Sustaining HTRC services
The HTRC has been spending considerable effort to grow from an experimental pilot to a reliable service. Although there is an open acknowledgement that this road will be a long one, there are some welcome developments that are good steps in this direction. HTRC has hired Dirk Herr-Hoyman (Indiana University – Bloomington) as Operations Manager, who is charged with bringing greater rigor to the HTRC service offerings (clearer versioning, increased security, responsive user help, clear development roadmaps, regular and documented/announced refreshes of primary data, etc.). Administrative development of an MOU between the HTRC and HathiTrust is another important step, leading to greater clarity about their separateness and relationship, and the roles and responsibilities of each in terms of data management and security.
Adoption of the HTRC services by scholars using computational methods was a central concern. Instance of use is a measure of relevance, and programs of greater relevance make a better business case to be worthy of our efforts. Throughout the conference, discussions of what might be needed to facilitate adoption of the HTRC service offerings by scholars elicited many concrete steps that could help:
- User testing/UX of portal. Scholars and especially librarians feel that the user experience could benefit from some redesign. Currently there are at least four base URLs that lead to various tools and user experiences and documentation, and none of these have the same branding or interface design, leading to confusion on the part of scholars as to what HTRC is and what it is offering. It is equally unclear where to go for help on each tool, where to post questions, how to ask for features or development when one’s project is out of the current scope of a tool and requires some extension. There was some talk about rebranding the HTRC services as Secure HathiTrust Analytical Research Commons (SHARC) and placing all service offerings under that umbrella, as well as some immediate and longer term steps that might be taken to move things in a positive direction.
- Discussions also revealed the need to strengthen the link between developments at Google and how they affect HathiTrust and HTRC. Google improves its books periodically, developing image corrections and improved OCR at scale. (In fact, Google recently released a new OCR engine and is re-processing many books that have the long s, including Fraktur. Early results look very promising.) These improved volumes are reingested into HathiTrust. These improvements can be better leveraged by:
- Improved communication between HathiTrust and HTRC on improvements underway. The Google Quality Working Group might be a good place to coordinate some of this information. If these updates are systematically conveyed to HTRC, HTRC can direct its precious resources into other efforts.
- Updates of the data HTRC receives from HathiTrust that are coordinated with major releases of material re-processed by Google.
- Effort should be directed to relationships. The suggestions in this conversation were that in the short-term, HTRC might supply more advising on grants and the grant process that would leverage HTRC services. In the mid- to long-term, HTRC might seek international partnerships and relationships. Also in the mid- to long-term, HTRC might leverage librarians and scholars as ambassadors to professional societies to raise awareness.
There were two keynotes that described scholarly projects and nine projects touched upon in lightning rounds.
Michelle Alexopoulos, “Off the Books: What library collections can tell us about technological innovation…” Michelle shared her perspective as an economist working with HTRC data to discover patterns in the time between the invention of a technology and its adoption, and to describe the economic impacts, as well as the ways in which a specific technology might affect other technological developments. Her project employed algorithmic selection of a large corpus based on MARC attributes, and Bookworm/nGram data.
Erez Lieberman Aiden, “The once and future past” Erez was on the original team of people who created the nGram viewer code and coined the term Culturomics to describe the intersection of patterns they were seeing between culture and trends revealed through algorithmic analysis of texts. He recapped the scholarly impact of the nGram viewer, its open source successor (Bookworm), and the provocative notion that we can use this data predictively as well as retrospectively.
Lightning rounds included Natalie Houston and Neal Audenaert’s “VisualPage: A Prototype Framework for Understanding Visually Constructed Meaning in Books”. Natalie also visited Cornell to present and discuss her work on 4/16 as a part of the Conversations in the Digital Humanities series.
Michele Hamill of CUL’s fantastic Conservation Department asked me to guest blog about audiovisual preservation as part of Preservation Week 2015. Pardon the cross-posting, but I thought I’d share it here on the DSPS Press blog as well. Wishing you a happy birthday and a wonderful Charter Day, Cornell University, as well as a wonderful weekend to you all.
First of all, I’m honored to be a guest on our Library’s Conservation Department blog, as they are a great team doing magical things. Given the possible catastrophic loss of materials on magnetic media, proper conservation becomes even more important as we chart out solutions that may emerge from our campus-wide AV Preservation Initiative.
Both UNESCO’s Blue Ribbon Task Force publication (Sustainable Economics for a Digital Planet, 2010) and the Library of Congress estimate that the vast majority of materials housed on magnetic tapes (cassettes, open-reel audiotape, VHS, etc.) will be lost in the next 10 years due to degradation and playback obsolescence. This includes materials ranging from field recordings of cultural events in dying languages to your own home movies of grandparents or children.
Cornell University Library’s Collection Development Executive Committee has set up a preservation fund (allocated through a grant-based system) to save fragile, unique, and heavily used collections and, due to issues with legacy AV content, a lot of that fund has gone to digitization of AV collections. As an example, I’m currently working on digitizing a large collection of VHS tapes for the Africana Library of unique lectures given at Cornell in the past. Last year, these tapes were moved to the annex, as they are the only copies in existence and are no longer in circulation.
While preservation and digitization are key for older formats, preservation is incredibly challenging for digital formats as well. Digital content, while often easier to use and access, is incredibly fragile and subject to many problems such as bit rot and errors, proprietary and complex formats and file types, and costly storage. In reality the world is creating digital content at a staggering pace, resulting in petabytes of possibly important or disposable content. How do we deal with this in our work or even in our personal collections of video or photos?
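One common defense against bit rot, even for a personal collection, is a fixity check: record a checksum for every file once, then recompute the checksums later and see whether anything has silently changed. The sketch below is only an illustration, assuming Python 3 and a folder of files on disk; the function names are made up for this example and do not come from any Library of Congress or CUL tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to `folder`) to its checksum."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(folder, manifest):
    """List files from a saved manifest that are now missing or have different content."""
    current = build_manifest(folder)
    return sorted(name for name, checksum in manifest.items() if current.get(name) != checksum)
```

In practice the manifest would be saved somewhere separate from the files themselves (for example as a small text or JSON file) and the check rerun on a schedule; a file that shows up in changed_files would then be restored from another copy.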
The Library of Congress has provided a thorough resource for individuals to get a handle on the digital content they are creating, as well as digitizing to share with family and friends across the globe. This is a rapidly increasing need of people everywhere, but how do we decide what we keep and how much? Witness.org stands out as a good example of an organization that is also promoting a more curatorial culture for our content at large, and for a purpose. They provide a guide to archiving content from a journalism/activist perspective, from creation to preservation and access.
Working in a memory institution, I often feel like I’m helping usher content from the past into the future and that is a tremendously gratifying feeling. ‘This work will outlive us,’ is something I often hear said in libraries and archives and while that is true, there is a huge amount of effort and a lot of tough decisions that go into conservation, preservation, and access. Whether it’s a beautiful tome from the 17th century or video of one of the last known public appearances of Jimmy Hoffa, it takes detailed work, resources, and careful planning to keep these things alive. In reality, history is written by every one of us. What’s your story?