In late October, we had the opportunity to visit members of Indiana University’s Media Preservation Task Force with two colleagues from Cornell’s Lab of Ornithology: Karl Fitzke (Audio Engineer, Macaulay Library) and Bill McQuay (Supervising Audio Engineer, Macaulay Library). Our goal was to learn more about IU’s ambitious preservation plan to digitize its AV holdings comprehensively. Also in attendance was Adam Tovell, Preservation Officer for Sound and Moving Image Collections at the British Library, who is on the brink of developing a similar initiative for the BL. As it happens, our visit was very well-timed: just a week prior, the President of IU had announced a $15 million, five-year initiative to fund digital preservation for the entirety of IU’s AV collections, campus-wide: http://news.iu.edu/releases/iu/university-wide/2013/10/state-of-university-2013.shtml
Over the course of two jam-packed days, we met with an impressive range of staff and departments collaborating closely to move the AV initiative forward. It was inspiring to get a first-hand look into their almost-decade-long effort to bring Indiana University to the forefront of AV preservation. Everything was carefully considered, thoroughly researched, and brilliantly executed. They have paved the way in setting international audio preservation standards (http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/index.shtml – with Harvard University); developed tools to assess risk, uniqueness, and digitization workflows; and created a major new access platform, Avalon, with Northwestern University and Audio Visual Preservation Solutions – https://wiki.dlib.indiana.edu/display/VarVideo/Avalon+Media+System .
Our visit included an overview of the history of the AV initiative, including administrative strategies, technical challenges, and approaches to data gathering. One of IU’s first steps was to conduct what they called a “census” of all key stakeholders on campus to determine the scope of the problem (number of items and formats, condition of materials, available metadata, etc.). This effort was conducted in person, without the use of web surveys, and resulted in excellent data with which to plan the larger effort (some of the spreadsheets are dizzying). It was also a key part of the relationship-building needed to garner support for the initiative.
Early advocates of the effort included the Vice Provost for Research, the former Director of the Archive of Traditional Music (ATM), an incredibly rich repository of audiovisual material that documents music and culture from around the world. Not surprisingly, Indiana’s initiative originated from the ATM, given the large, rare, and unique collections of legacy media it contains. Faced with the possibility of losing holdings over time to carrier degradation and playback-equipment obsolescence, they realized that something needed to happen sooner rather than later. Our discussions again confirmed the necessity for institutions to have a plan for this material in place or risk losing access to it forever. Mike Casey, head of the IU initiative, explained that most legacy AV material will not be accessible in 10-15 years, and that migration to digital formats gets more expensive by the day.
Another very impressive part of their effort was their institutional IT support. Working server space to support digitization of the material is estimated at roughly 200 terabytes. The estimated final total needed to archive the digitized material is around 13 petabytes, not including film; with film content included, the estimated total rises to 49 petabytes. Film is widely considered more stable than magnetic media, so they are approaching that material separately, given their vast, unique holdings and ideal storage conditions (including almost 3,000 titles in frozen storage). Astonishingly, the IT group appears unfazed by these numbers, with petabytes of storage already available for use.
In short, it was an inspiring visit, and we are ever-grateful for our generous hosts. It was also well-timed from Cornell’s perspective, as the campus-wide CU AV Preservation Group just got the green light from Anne Kenney, Ted Dodds, and Mike Webster to begin an exploratory pilot to determine the scope of the challenges here at Cornell. We will be unveiling a project plan in January, and are hoping for campus-wide engagement on this very challenging task. Find more information here:
Danielle & Tre
Visual resources are critical in enabling and enhancing learning and teaching in the humanities and arts. During the last two decades, we have witnessed the digital shift in several ways, including the replacement of slide collections with personal digital image collections and an increased reliance on shared online visual resources such as ARTstor.
And although online collections such as ARTstor make a broad range of digital images accessible, no comprehensive collection currently exists in any subject area. Digital visual resources are especially weak in non-Western and non-traditional areas such as Native American, Islamic, or African American studies. Creating visual resources to support a rich and diverse domain of scholarly exploration continues to be the joint responsibility of cultural institutions and research libraries.
So, how can the Library help?
Since 2010, we’ve collaborated with the College of Arts and Sciences in a digitization program to respond to the challenges inherent in the move from analog to digital delivery of image resources. Our digitization program also helps foster the integration of new media into teaching to enhance learning and creative expression and supports the creation, management, sharing, and archiving of high-quality images, bearing in mind the importance of both pedagogical and sustainability issues.
The most recent round of grants includes several fascinating projects, including:
- photographing and digitizing a slide collection of Indian Raga Mala paintings;
- digitizing and archiving a collection of fragile videotapes that are essential for teaching the history and theory of digital art;
- creating a digital repository of the A.D. White Collection of over 2,000 plaster casts and impressions of engraved gems and amulets from Classical antiquity; and
- digitizing a collection of important and fragile squeezes (paper impressions) that were created in Ankara, Turkey, during the Cornell Expedition to the Assyro-Babylonian Orient in 1907.
A key component of the collaboration is the Grants Program for Digital Collections in Arts and Sciences, which aims to support collaborative and creative use of resources through the creation of digital content of enduring value to the Cornell community and scholarship at large.
The Arts and Sciences Visual Resources Advisory Group, which is composed of faculty members and representatives from the Library, oversees and continues to refine the service model, including the coordination of the grants program. In addition to the grants program, the initiative also supports the digitization of visual content used in courses to develop a sharable and sustainable curriculum library to support the College’s teaching mission.
And a few of our most successful grants from the past:
Hip Hop Collection/Conzo Archive
Steve Pond, Music and Travis Gosa, Africana
Collaborator: Katherine Reagan, Cornell University Library
Founded in 2007, Cornell’s hip hop collection is now the largest archive on early hip hop culture in the United States. A key foundational element of the collection is an assemblage of photographic prints by Bronx photographer Joe Conzo, Jr., taken between 1977 and 1984. Conzo is one of the few photographers known to have captured the early years of hip hop on film. Online access to the collection is of interest to multiple disciplines, including art, art history, dance, music, American Studies, and Africana studies, and it offers a rich array of learning and teaching materials for a new Cornell course on hip hop.
Warburg’s “Atlas” Panels
Peter Uwe Hohendahl, German Studies, Comparative Literature
Collaborators: Kizer Walker, Cornell University Library; Peter J. Potter, Cornell University Press; Christopher D. Johnson, Comparative Literature, Harvard University
The goal of the project is to build an interactive resource for the exploration of the fragmentary “atlas of images” left by German Jewish art historian Aby M. Warburg (1866-1929). The Atlas involves the assemblage of hundreds of images juxtaposed on wood panels. An interactive, web-based treatment of the Atlas will realize Warburg’s ideal, namely, that each viewer makes his or her own connections between the myriad images presented in the Atlas. This website serves as a multimedia companion to “Signale: Modern German Letters, Cultures, and Thought” and will support exploration of new technologies and new partnerships in creating economically viable channels for disseminating scholarship.
Japanese Woodblocks from the William Elliot Griffis Collection
Katsuya Hirano, History/Asian Studies
Collaborator: Daniel McKee, Cornell University Library
These 17th-century Japanese woodblock-printed books represent Japan’s initial attempts to understand the West and modernize itself. They are therefore of great importance in understanding the formation of modern Japan. These books, many of which are rare or even unique in US collections, have great appeal to historians, art historians, and scholars of cultural politics.
Muller-Kluge Online Collection
David Bathrick, German Studies/Theater
Collaborators: Dr. Rainer Stollmann, University of Bremen (Germany); University of Bremen Library; Dr. Michael Jennings, Princeton University
We significantly expanded the existing Muller-Kluge online collection, which is one of the most visited collections hosted by the Library. The website consists of interviews between West German filmmaker Alexander Kluge and East German playwright Heiner Muller. The new site will incorporate Kluge’s interviews with Hans Magnus Enzensberger and Oskar Negt. This initiative also involves a partnership that will give Cornell access to Princeton’s Kluge Research Collection.
Cornell Gem and Amulet Collection
Caitlín Barrett and Verity Platt, Classics/Art History
The project involves the creation of a digital repository of the A.D. White Collection of over 2,000 plaster casts and impressions of engraved gems and amulets from Classical antiquity. These casts have been used for teaching in more than a dozen lecture courses and seminars. Digitization of Cornell’s gem collection is a natural continuation of projects spearheaded by Professor Alexandridis and Danielle Mericle, who have been digitizing and cataloging the university’s casts of ancient sculptures and its collection of Greco-Roman coins (funded by Arts & Sciences grants). Digitization of the collection will make it much easier to assign student research projects on the material, creating a fantastic classroom resource to use alongside the objects themselves. For more information about the grant program, please visit the DCAPS website.
Responsive design is a web design approach that aims to create an optimal viewing experience across a range of devices — from large desktop screens and smaller laptops, to tablets and smartphones. Responsive design uses CSS media queries to craft web pages that will respond to a browser’s size, or viewport. Whereas previous web development strategies focused on creating separate desktop and mobile sites — often with redundant content and separate sets of code — in responsive design, the end product is one website that is device-independent, ensuring the best user experience for all users.
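As a minimal sketch of the technique described above (the selectors and the 768px breakpoint are illustrative, not drawn from any actual CUL stylesheet), a media query lets a single stylesheet adapt one layout to the browser’s viewport width:

```css
/* Wide viewports: sidebar and main content sit side by side. */
.sidebar { float: left; width: 25%; }
.main    { float: left; width: 75%; }

/* Narrow viewports (a hypothetical 768px tablet breakpoint):
   stack the same two regions full-width instead of in columns. */
@media (max-width: 768px) {
  .sidebar,
  .main {
    float: none;
    width: 100%;
  }
}
```

Frameworks like Bootstrap bundle a standard set of such breakpoints with a responsive grid, which is much of what makes them a time-saver.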
Here at CUL, all new websites are being developed responsively, typically using the Bootstrap framework. Most of our recent redesigns have been responsive as well. Responsive sites that are currently in production at CUL include:
Using a framework like Bootstrap has allowed us to speed up the process of developing responsive sites, although it is worth mentioning that the time it takes to design and develop even a small website has markedly increased now that we are designing for different devices. Gone are the days when you created one design at a standard 800x600px screen resolution, tested it in IE and Firefox, and called it a day.
Check out the following websites for more information about responsive design, and to see many examples of responsive websites:
- Responsive Web Design
- Responsive Web Design Guidelines and Tutorials
- How to use CSS3 to create a mobile version of your web site
- Media Queries (a showcase of responsive sites)
- Responsive Design Sites: Higher Ed, Libraries, Notables
- Edustyle: sites tagged ‘responsive’
The 8 million print volumes in the Cornell University Library are a rich resource for students, faculty, and staff. However, for Cornellians with print disabilities such as visual impairments, learning disabilities, or other conditions, using printed resources can be difficult. Thanks to an innovative program of the HathiTrust, many of these volumes can be made available to eligible users in an electronic form that makes reading easier. That program is called “enhanced access.”
The HathiTrust Digital Library is a partnership of research libraries to preserve and provide access to their digitized holdings, currently numbering 10.8 million volumes. About 32% (3.4 million volumes) are in the public domain and can be downloaded by any Cornell user. HathiTrust also has over 7 million copyrighted volumes whose content can be searched but that cannot be read online: one has to borrow the library’s print copy to read the work.
The Enhanced Access program gives Cornell patrons with certified print disabilities access to digital copies of in-copyright books: a Cornell-designated proxy can download and deliver digital copies of books that Cornell owns in print to qualified patrons. Patrons with print disabilities can find details of how to use this program’s services on the Cornell University Library’s Disability Services page. In brief, students will need to certify with Student Disabilities Services (SDS), and faculty and staff will need to certify with Medical Leaves Administration (MLA). A patron guide with full instructions is available at the URL above in PDF format, and guidance from proxies can always be obtained through the email address designated for this purpose.
Users should be aware that the works which they acquire through enhanced access are still protected by copyright. The basis of this newly-broadened access is a recent legal decision that providing access to electronic versions of books to users with print disabilities is not an infringement of copyright. Copies made by this service are for personal use only and must not be shared with anyone else or copied beyond what is needed to facilitate personal use.
Services like this are broad collaborations. I would like to thank the many people that have joined together over the past summer to make this service a reality: Kappy Fahey and Cyrus Hamilton of SDS; Carol Nickerson and Patti Riddle of MLA; Andrea Haenlin-Mott, Cornell’s ADA Facilities Coordinator; Laurel Parker of the Office of Diversity and Inclusion; Pat McClary of University Counsel’s Office; Peter Bosanko and Joy Veronneau of Identity Management, CIT; Tobi Hines, Peter Hirtle, Peter Magnus, Michelle Nair, and Bethany Silfer of CUL. On behalf of DSPS – and of the patrons that will be assisted through this service that you have helped shape – thank you for your efforts and wise counsel.
Questions about the new service may be directed to either CUL-HTProxy-L@cornell.edu or to Michelle Paolillo, the Library’s HathiTrust Coordinator.
HathiTrust currently contains about 10.8 million volumes. Approximately 32% of these volumes are in the public domain. (HathiTrust provides a variety of contemporary snapshots of its holdings.) The HathiTrust Research Center (HTRC) is an independent but associated entity that currently enables computational access for nonprofit and educational users to published works in the public domain. HTRC also has plans for similar access to works that are in-copyright from the HathiTrust on limited terms, possibly through a virtual machine for authorized scholars.
On September 8-9, 2013, scholars, librarians, project managers, and information technologists converged on the University of Illinois iHotel for HTRC UnCamp 2013. I came for a variety of reasons: to represent Cornell, to learn about future plans of the HTRC, to gather examples of projects in the Digital Humanities that use computational approaches, and to network with colleagues on a variety of side issues related to ingest, quality metadata, and all things HathiTrust. Although this is only the second year of the conference, it was easy to note many improvements over the inaugural year. Programming was tighter, making much better use of our time. The conversation appeared more open and more driven by the needs of the participants. We discussed not just the tools and what they do, but where they might be a good fit and where they might not be, and, perhaps most importantly, what adjustments need to be made to increase usefulness, usability, and transparency into what the tools do. Scholars presented their computationally based humanities projects, built both with the HTRC tool set and with non-HTRC tools, across two lightning rounds and two keynotes.
As you might imagine, this stimulating environment affords many lessons, and it is difficult to select a mere few. I’ll try to summarize the broadly emerging themes:
- The tools are already providing scholars the means to credibly re-test traditional assumptions of their fields. Close readers of a subject can develop an intuitive sense of trends related to their interests, but computational access can test these assertions against actual metrics. After all, who can claim to have read all of Victorian poetry? A close reader might spend a lifetime doing so; computationally, it can be accomplished through distant reading by a small team of people with specific technical and scholarly expertise. Computational approaches to inquiry sometimes confirm traditional assumptions, but just as often they provocatively re-open issues for discussion, moving the conversation beyond conventional wisdom.
- Move beyond the bag of words. Digital books consist, for the most part, of page images and associated Optical Character Recognition (OCR) output. OCR provides the text flow that is mined when computational tools are used. But OCR is often not structured in any meaningful way; it contains no information about paragraphs or line length. A book is not merely a “bag of words”, and deep understanding of any text-based material must move beyond basic text flow. Serials and newspapers have articles, but they also have ads, pictures, charts, and graphs; poetry has stanzas, meter, feet, and rhyme schemes. Structuring the text flow in meaningful ways can help scholars move beyond the “bag of words”, opening up new possibilities in the digital humanities.
- Collaborate, Collaborate, Collaborate. The people who are comfortable with the use and adaptation of computational tools are most often technologists and statisticians. The people with questions in various fields of the humanities are humanities scholars. Successful projects require that these people work together to ask and answer questions. The humanities has traditionally rewarded scholars who make individual contributions, but computational projects are best accomplished in a collaborative setting. Scholars in the humanities who are interested in computational approaches might want to consider working within the lab model commonly found in the sciences, where people with diverse skill sets come together to further inquiry.
- Facilitation is crucial, especially in the present transition. As people with different skill sets come together, they need to find ways to communicate effectively. The current descriptions of the HTRC tools are very technical, and it was generally acknowledged that there needs to be a “gloss-description” provided that will help humanities scholars determine what each tool offers. Similarly, as technologists and humanists work together, they may find themselves speaking different languages. Setting up facilitated conversations can help. As the culture and curriculum of the humanities evolves towards adoption of computational approaches, there may be less need for this, but at the present, it is of vital assistance to make digital humanities collaborations effective.
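To make the “bag of words” theme concrete, here is a minimal, hypothetical sketch (not an HTRC tool; the sample text is invented) of the kind of term counting that a plain OCR text flow supports — and whose limits structural markup would let a scholar move beyond:

```python
from collections import Counter
import re

def term_frequencies(text, top_n=5):
    """Treat a plain OCR text flow as a 'bag of words' and count terms."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# An invented scrap of OCR output: no paragraphs, stanzas, or ads --
# just an undifferentiated stream of words.
page = "The sea was the sea, and the ship sailed the sea."
print(term_frequencies(page, top_n=2))  # [('the', 4), ('sea', 3)]
```

Distinguishing stanzas from prose, or articles from advertisements, requires exactly the structural information that this kind of flat counting throws away.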
I will post links to the full conference notes as they become available. In the meantime, feel free to refer to the information on the conference page, and the HTRC wiki. Your comments and questions are welcome.
In February of 2013, Digital Scholarship and Preservation Services (DSPS) began a two-year NEH-funded project to preserve access to the complex born-digital media art objects in CUL’s Rose Goldsen Archive of New Media Art. Read the press release here.
The Goldsen Archive’s collections span a 50-year history of aesthetic experimentation with electronic media, from early video and sound art to interactive digital interfaces and beyond. The Preservation and Access Framework project centers on a target group of interactive digital works for CD-ROM, DVD-ROM, and the Internet, which chart the transformation of artistic practices across two crucial decades of the digital revolution—the shift from analog to disc-based to networked and Web-based applications. These artworks constitute an important art historical collection and a vital record of our cultural and aesthetic history as a digital society. Our project addresses an increasingly urgent demand for institutional preservation and access strategies for interactive, born-digital assets, with the understanding that these materials will be essential for the study, appreciation, and understanding of digital cultural history.
Recent progress with CULAR has made it possible to begin provisioning for large-scale ingest of bit-level copies of these assets into a secure digital repository. This is an essential preservation measure, but will not guarantee fully interactive access to the artworks.
In addition to developing SIPs (Submission Information Packages) for these artworks and procedures for automating their processing and ingest, the project will also create a broader preservation metadata framework and workflow in order to emphasize “best possible” access, as informed by the needs of various user constituencies.
Project staff from DSPS and Rare and Manuscript Collections are joined by Desiree Alexander, a dual master’s student in Information Science and Public History at Albany, who serves as Collections Analysis Assistant for the project, and Dianne Dietrich (Physics and Astronomy Librarian), who joins the project as Digital Forensic Analyst and technical lead.
Under Dianne Dietrich’s direction, the technical team has nearly completed preliminary experimentation with emulation platforms and an initial technical overview of artworks in the test bed. We recently began the next project phase, with the arrival of a dedicated digital forensics workstation. Shared by Digital Scholarship and Preservation Services, Rare and Manuscript Collections, and CUL-IT, this workstation makes a welcome addition to CUL’s audiovisual media preservation lab and expands CUL’s portfolio of media preservation capabilities.
Though just now wrapping up its initial stages, the Preservation and Access Framework project has already helped RMC and DSPS define and implement sustainable preservation policies and workflows for born-digital media. Please see the project wiki or contact Mickey Casad (email@example.com) for more information, and stay tuned to DSPS Press for further updates about project milestones and discoveries.
Preservation and Access Framework for Digital Art Objects Project Team
Oya Rieger (Associate University Librarian for Digital Scholarship and Preservation Services)
Timothy Murray (Director, the Society for the Humanities, Professor of English and Comparative Literature, Curator of the Rose Goldsen Archive of New Media Art).
Mickey Casad (Curator for Digital Scholarship; Project Manager)
Dianne Dietrich (Physics and Astronomy Librarian; DSPS Fellow; Digital Forensic Analyst)
Desiree Alexander (Collections Analysis Assistant)
Jason Kovari (Metadata Librarian for Humanities and Special Collections; Metadata Specialist)
Danielle Mericle (Director, Digital Media Group; New Media Specialist)
Liz Muller (Curator of Digital Collections, RMC; Digital Preservation Specialist)
Michelle Paolillo (Manager, CULAR; Archival Repository Liaison)
AudioVisual Preservation Solutions (Alex Duryee, Chris Lacinak, Cara VanMalssen)
All-Star Advisory Board:
Ben Fino-Radin (Digital Conservator, Rhizome at the New Museum, New York, NY)
Jean Gagnon (Director, Montreal Cinemateque, Canada)
Rebecca Guenther (Independent Metadata Consultant)
Matthew Kirschenbaum (Associate Professor of English, Associate Director, Maryland Institute for Technology in the Humanities, University of Maryland)
Jon Ippolito (Associate Professor of New Media, University of Maine)
Norie Neumark (Director, Centre for Creative Arts & Chair and Professor, Cinema and Media Studies Program, La Trobe University, Melbourne, Australia)
Christiane Paul (Curator of New Media Arts, Whitney Museum of American Art)
Richard Rinehart (Director, Samek Art Gallery, Bucknell University)
Simeon Warner (Director, Software Development & Repository Architecture, Cornell University Library)
In collaboration with the Society for the Humanities and Olin/Uris Library, CUL Digital Scholarship and Preservation Services (DSPS) is piloting a small-scale graduate student digital scholarship internship program this summer. DSPS is joining forces with Olin/Uris Library to organize orientation and mentoring sessions for five graduate students in Arts and Sciences. The objectives of this program are:
1. To increase the use and visibility of CUL’s digital tools and resources, particularly among younger researchers
2. To encourage a collaborative relationship between the library and the next generation of humanities scholars
3. To help graduate students expand their digital skills through projects that will make them more competitive in a changing academic landscape.
The pilot implementation will be a 6-week program, requiring approximately 10 hours/week. Interns will have a chance to tour DSPS digital preservation labs and hear from DSPS librarians about building and managing digital collections and collaborating with faculty on special projects in digital scholarship.
The program will provide an introduction to some new digital scholarship tools, such as text mining, topic modeling, and data visualization, and a chance to test out different scholarly communications and online content management platforms such as Scalar and Omeka. Interns will use these tools and platforms to create their own digital scholarship projects over the 6-week internship period. We will announce a public showcase of these summer projects during the 2013-2014 academic year.
This program has been organized by Mickey Casad, Virginia Cole, Michelle Paolillo, and Jaron Porciello, all of whom will lead tutorials on digital scholarship tools and provide guidance and support to the interns over the summer. The organizers are grateful to Danielle Mericle, Oya Rieger, Kizer Walker, Keith Jenkins, and Kathy Chiang for discussing their own projects with the intern group.
Our five interns for the Summer 2013 pilot internship program are:
Hannah Byland (Medieval Studies)
Megan Kruer (Romance Studies, French)
Nicolette Lee (English)
Katrina Nousek (German Studies)
Lynne Stahl (English)
We congratulate the new interns and look forward to seeing the outcomes of their experience.
Please contact Mickey Casad (firstname.lastname@example.org) for additional information about the pilot program. Here is a brief story about the pilot program:
DSPS staff Mickey Casad, Michelle Paolillo, and Jaron Porciello joined 250 Cornellians to listen, learn and present a poster at the June 13 IT@Cornell event: Building our community, together.
Our triptych poster* highlights services, expertise, and emerging collaborative opportunities between DSPS and the broader Cornell community. Services such as digitization, digital repositories and platforms, copyright services, and needs assessment continue to be available to the entire Cornell community through the department’s cost-recovery service arm, DCAPS. In addition, we are continually building expertise in emerging areas of digital scholarship, such as text mining and visualization, in order to support innovative projects such as “Freedom on the Move: A Database of Fugitives from North American Slavery.” “Freedom on the Move” is a collaboration between CISER, CUL, and History Professor Ed Baptist that aims to collect, collate, and eventually crowdsource the correction and analysis of thousands of runaway slave advertisements from 18th- and 19th-century newspapers and pamphlets. Eventually, this will be an interactive public history project that represents the kind of community outreach and serious research that can be seamlessly coupled in today’s digital scholarship arena.
Many of the people we talked to at IT@Cornell shared a common interest in helping researchers discover new tools for digging both deeply and broadly into the mass of scholarship that is now digitally available. As the needs of researchers change, it is not always the PhD subject experts who have the time or inclination to become digital scholarship experts. We believe there is a strong service role to play in this vital and growing sector: one flexible enough to allow researchers the freedom to explore, but that also offers a framework with just-in-time customized support, which is crucial to leveraging and implementing new tools and analyses in scholarship.
We also heard concerns about how to scale and sustain customized services for multiple researchers, and how to provision for research projects with potentially long lifespans. While we share some of these concerns, DSPS has proven itself a nimble partner to many faculty and staff across campus by continually assessing, revising, and sometimes shedding digital scholarship services as needs evolve. Library staff have a strong role to play as collaborators alongside faculty and IT communities as digital scholarship and born-digital content continue to flourish. We are delighted to work in this space and to create opportunities for scholars to work alongside the library.
– Jaron Porciello, Mickey Casad, Michelle Paolillo
(*Thank you to Carla DeMello, RAU graphic designer, for listening to our ideas and creating a beautiful poster for us to use again and again.)
(by Peter Hirtle)
As MOOCs (Massive Open Online Courses) have become of interest to higher education, so too have they drawn the attention of libraries. Now that Cornell has joined the edX partnership, MOOCs are of particular interest to CUL. The article “Massive Open Opportunity: Supporting MOOCs in Public and Academic Libraries” is a good introduction to some of the many issues that MOOCs present for libraries, but here I want to focus primarily on the copyright issues associated with MOOCs.
Copyright is a concern in three different areas:
- Copyrighted material used in streamed lectures
- Copyrighted material that students have to read outside of class
- Ownership of the MOOC course material. (I assume this is being addressed in the agreement with faculty and so won’t discuss it further.)
Copyrighted material used in streamed lectures
Copyright law gives faculty a lot of leeway to use copyrighted material in the course of face-to-face instruction. Unfortunately, faculty may not realize that the poem they read aloud or the movie they show in class can become an infringement when the class moves online. Nevertheless, there are several options available:
- The class might be structured in a way that complies with the TEACH Act, which allows limited use of streamed work if the course is “offered as a regular part of the systematic mediated instructional activities of … an accredited nonprofit educational institution.” I don’t know whether the Cornell MOOCs would fall into that category. There are several other technical requirements of the TEACH Act that would have to be met if we wanted to take advantage of its protections.
- Fair use is also available, though given the large number of students in a MOOC, it is important to limit the use made to a small portion of the original. Kevin Smith at Duke likes to tell the story of a faculty member who wanted to show a long excerpt from a movie in a lecture. After discussion, the faculty member realized that 20 seconds was all he needed to illustrate the point that he was trying to make, and Kevin concluded that this would be a fair use.
In general, though, MOOCs are not relying on fair use or the TEACH Act. Instead, a number of other approaches are followed:
- Faculty are encouraged to author the material used in their courses rather than rely on third-party copyrighted material.
- If third-party material is needed, faculty are encouraged to look for openly licensed versions, such as Flickr images licensed CC BY or CC BY-NC.
- Attribution should be provided to the original source in slides and on other class materials.
- Rather than copying material into the MOOC, faculty should rely on external and embedded links as much as possible (especially important with the next group of materials).
- Use of the material in the course can be licensed from the copyright owner. Note that this can be very, very expensive given the number of students in the course.
Copyrighted material that students have to read outside of class
Courses may also assign material such as journal articles, book chapters, and textbook excerpts to be read outside of the lectures. Most commentators agree that it is difficult to argue for fair use (as we do with e-reserves) for a course that may have 50,000 students in it. Nor are the licenses the library purchases for Cornell users likely to cover use in a MOOC. Permission of the copyright owner is likely to be required.
Kevin Smith has told me that he has had some luck securing permission to use a chapter from a book for free by pointing out that it is a marketing opportunity: the inclusion of a chapter in a MOOC has led to increased purchases of the book. But this requires getting past the permissions department and talking to the marketing people instead. As Kevin notes in the quote below, the process is “slow and labor-intensive.”
Both the Copyright Clearance Center and SIPX, a Stanford University spin-off, claim that they can secure permissions to use material in MOOCs, but the expense of doing so would be substantial. And in a free MOOC, who would pay for the permissions?
In sum, instructors should first seek to use open access materials. The Library can help faculty identify acceptable open-access resources. If there are no appropriate open access resources, faculty should request a free license from the publisher to include the material. If that doesn’t work, someone may have to pay for permission to include the material.
Penn has prepared a very good guide to copyright issues in MOOCs. Penn is using Coursera, a commercial product, so they downplay the fair use option (which is probably risky even in a non-profit environment).
And here is Kevin Smith’s description of Duke’s practice:
When our instructors want to provide readings for students taking a MOOC, we generally pursue one of two options. Either we negotiate with publishers, who are slowly figuring out the marketing advantage they gain by allowing small excerpts of books and textbooks to be made available freely, or we look for OA content. Unfortunately, the negotiation option is slow and labor-intensive; often we must explain the purpose and the conditions over and over again, to ever-shifting groups of officials, before we can get a decision. So open access is ever more important, because more efficient, for our MOOC instructors and their students.
One story will illustrate this growing interest in open access. A faculty member who was recently preparing to teach his first MOOC wanted his students to be able to read several of his own articles. When we asked his publisher for permission on his behalf, it was denied. A rude awakening for our professor, but also an opportunity to talk about open access. As it turned out, all of the articles were published in journals that allowed the author to deposit his final manuscripts, and this author had them all. So we uploaded those post-prints, and he had persistent, no-cost links to provide to the 80,000 students who were registered for his course. An eye-opener for the author, a missed opportunity for the publisher, and a small triumph for our OA repository. Enough of a triumph that this professor has begun asking colleagues if they could deposit post-prints of their own articles in the repositories at their institutions so that he can use those for his MOOC students as well.
By Gail Steinhart, Simeon Warner, and Oya Rieger
From April 2011 through March 2013, arXiv collaborated with the Data Conservancy on a pilot project to support remote data deposit for arXiv submissions. The Data Conservancy, a project initially funded by the US National Science Foundation (NSF), aims to “research, design, implement, deploy and sustain data curation infrastructure for cross-disciplinary discovery with an emphasis on observational data.” We sought to understand how researchers took advantage of the pilot, and looked at over 200 arXiv submissions with data sets in the Data Conservancy. For each publication we noted the subject area, the number of data files, the combined size of all data files, and the file extensions. In a subset of cases, we also examined the publications for references to the availability of the data sets, and looked for anything that could be construed as metadata, either in the publication or with the data sets themselves.
Amount and size of data
For the most part, authors are submitting a few small files to the Data Conservancy. Submissions contained anywhere from 1 to 944 separate data files, but averaged fewer than 10 files per submission. The largest combined data collection was 819 MB; the average was just under 20 MB.
We counted 42 different file extensions for the 1,837 files in the submissions we examined. The most common classes of files were documents (.pdf or .tex), videos, and image or graphic files. The maximum number of different extensions associated with a single submission was 5 (for 19 data files), and only 6 of the submissions we inspected had more than 2 different file extensions. A cursory check of whether the most common file formats submitted to the Data Conservancy are listed as “preservation friendly” by either the UK Data Archive or Cornell’s institutional repository, eCommons, showed that many are not, suggesting some substantial preservation challenges in the long term.
We should note that files submitted to the Data Conservancy weren’t always what we might think of as “data.” Examples of “non-data” uploaded to the Data Conservancy included copies or alternate formats of papers, documents with extra information about methods or other explanatory material, and higher resolution versions of figures from the papers.
Of the submissions we examined, a handful of arXiv subject areas had more than ten submissions with data in the Data Conservancy. They were Condensed matter (48 submissions), Astrophysics (32), Mathematics (32), Physics (30), Computer science (23), and High energy physics (12).
Metadata and references to data sets
For 54 unique submissions, we looked in the papers for references to supplementary data, and at both the papers and the data sets themselves for anything that could be considered metadata describing the data sets. Here’s what we found:
- For five submissions, the supplementary files uploaded to DC were copies of (or in one case, a translation of) the paper itself.
- 23 submissions included an explicit reference to supplementary materials. Four of these referenced supplementary material available elsewhere (on a lab website, for example); otherwise references to supplementary data or files were not usually explicit about the location of those files.
- In 26 submissions, we found no reference to supplementary materials.
- Authors that did refer to supplementary data files did so either at the end of a paper (12, in references, as an endnote, or appendix), within the text itself (13, in-line, as a footnote, or in a figure legend), or both (4).
- Only nine submissions included information that could be thought of as metadata. When present, metadata was included as a standalone document such as a readme or appendix (four cases), a brief description within or at the end of the paper (three cases), or within the supplementary files themselves (two cases).
Additional observations and conclusions
arXiv has allowed authors to upload supplementary files since 2010, and some authors may have continued to use that option rather than experimenting with the Data Conservancy pilot. With the addition of the Data Conservancy as an option for data sets, authors sometimes made supplementary information for a single submission available by multiple means (the Data Conservancy, arXiv, or their own website). Others made one supplementary file available by one method and a different file by another.
The results of the pilot show that, even though only a small proportion of arXiv submissions (less than 1%) included a data deposit to the Data Conservancy, support for online distribution of data sets and other supplementary content is a useful service to some. At present the volume of data is not overwhelming; however, the lack of specific references between many papers and their related data sets, the general lack of metadata, and the preservation challenges presented by a wide array of file formats all suggest non-trivial challenges in providing and sustaining such a service at scale and for the long term.
File format extensions appearing in at least two submissions: .pdf, .tex, .avi, .mpeg, .eps, .txt, .mp4, .mov, .jpg, .gif, .nb, .wmv, .bbl, .dat, .fits, .png, .ps, .rar, .xls
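For readers curious how a tally like the one above might be produced, here is a minimal sketch in Python. It is purely illustrative: it assumes the submissions have been downloaded locally into one directory per submission, which is not how we actually processed the pilot data, and the function name is our own invention.

```python
from collections import Counter
from pathlib import Path

def summarize_submissions(root):
    """Tally per-submission file counts and sizes, plus overall
    file-extension frequencies, for submission directories under `root`.

    Hypothetical helper: assumes one subdirectory per submission.
    """
    ext_counts = Counter()   # extension -> number of files across all submissions
    per_submission = []
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = [f for f in sub.rglob("*") if f.is_file()]
        exts = {f.suffix.lower() for f in files}
        ext_counts.update(f.suffix.lower() for f in files)
        per_submission.append({
            "submission": sub.name,
            "n_files": len(files),
            "total_bytes": sum(f.stat().st_size for f in files),
            "n_extensions": len(exts),
        })
    return per_submission, ext_counts
```

From the returned structures one can read off the figures reported above: the maximum and average number of files per submission, combined sizes, and which extensions appear in multiple submissions.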