Technical Considerations for arXiv Compliance with Plan S

By Erick Peirson, arXiv Lead System Architect

We acknowledge the importance of Plan S, and are committed to monitoring developments and assessing how we can best support our users in complying with open access mandates. However, at this time arXiv is not compliant with many of the recommended requirements, or with some of the mandatory requirements. We do not currently have a timeline for adding the required features.

This post summarizes my best understanding of the technical considerations surrounding arXiv compliance with Plan S, based on the revised Principles and Implementation guidance document (https://www.coalition-s.org/principles-and-implementation/). This is adapted from my Twitter thread on 31 May, 2019.

Background

Summary

It is encouraging to see cOAlitionS committing to working with domain experts like DOAJ, DOAR, and others on compliance. A shortcoming of the first draft was that it appeared to lack the kind of nuance and practical considerations that come from domain knowledge.

The upshot, from my perspective:

  • This draft is a big improvement on the first draft of the Plan S guidance; far clearer, and far more achievable.
  • arXiv is nearly compliant with mandatory requirements already, but we can use help from partners, funders, and developers to close the gap.
  • arXiv is not compliant with many of the recommended requirements; we can use help to make progress in these areas, as well.

The remainder of this post focuses on Part III Section 2.1 (Requirements for Open Access repositories).

Mandatory criteria

“The repository must be registered in the Directory of Open Access Repositories.”
arXiv is currently listed in DOAR:
http://v2.sherpa.ac.uk/id/repository/18

“Use of PIDs for the deposited versions of the publications (with versioning, for example in case of revisions), such as DOI (preferable), URN, or Handle.”
arXiv has a robust URN scheme widely adopted for citation purposes:
https://arxiv.org/help/arxiv_identifier
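
To illustrate how versioning is carried in the identifier itself, here is a minimal sketch that splits an identifier into its version-independent part and its version number. The pattern is an assumption based on my reading of the identifier documentation linked above (new-style identifiers of the form YYMM.NNNN or YYMM.NNNNN with an optional vN suffix); it deliberately ignores old-style identifiers, and the names `NEW_STYLE`, `ArxivId`, and `parse_arxiv_id` are hypothetical, not part of any arXiv library.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for new-style identifiers: YYMM.NNNN or YYMM.NNNNN,
# optionally followed by a version suffix such as "v2".
NEW_STYLE = re.compile(
    r"^(?P<yymm>\d{4})\.(?P<number>\d{4,5})(?:v(?P<version>[1-9]\d*))?$"
)


class ArxivId(NamedTuple):
    base: str                # version-independent identifier
    version: Optional[int]   # None means no version was specified


def parse_arxiv_id(raw: str) -> ArxivId:
    """Split a new-style arXiv identifier into its base and version."""
    match = NEW_STYLE.match(raw.strip())
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {raw!r}")
    version = match.group("version")
    base = f"{match.group('yymm')}.{match.group('number')}"
    return ArxivId(base, int(version) if version else None)
```

Under these assumptions, an identifier without a suffix refers to the e-print as a whole, while a suffixed identifier pins a specific revision, which is the property the Plan S wording on versioning asks for.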

“High quality article level metadata in standard interoperable non-proprietary format, under a CC0 public domain dedication.”
As described in our API Terms of Use, arXiv metadata are made available for re-use under CC0:
https://arxiv.org/help/api/tou

“[Metadata] must include information on the DOI (or other PID) both of the original publication and the deposited version, on the version deposited (AAM/VoR), and on the Open Access status and the license…”
arXiv makes distribution license information available on the abstract page for each e-print, and in metadata retrieved programmatically via our APIs.

X “Metadata must include complete and reliable information on funding provided by cOAlitionS funders.”
arXiv currently lacks a facility for tracking funding information in a reliable and maintainable fashion. We invite external organizations and individuals to help us develop the tools to do this; please see our community projects list.

“Continuous availability (uptime at least 99.7%, not taking into account scheduled downtime for maintenance or upgrades).”
arXiv’s availability exceeds 99.9%, so this is not an issue for us. However, I find it peculiar that cOAlitionS got this far down into the weeds.
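
For a sense of scale, the arithmetic behind these percentages can be sketched as follows. This is a back-of-the-envelope calculation only: it assumes a 365-day year and treats all downtime as unscheduled, and the function name `allowed_downtime_hours` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted over a period at a given uptime target."""
    return period_hours * (1 - uptime_percent / 100)


for target in (99.7, 99.9):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows about {hours:.1f} hours of downtime per year")
```

Under these assumptions the 99.7% threshold permits roughly 26 hours of unscheduled downtime per year, while 99.9% permits under 9 hours.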

“Helpdesk: as a minimum an email address (functional mailbox) has to be provided.”
arXiv has dedicated full-time staff to support users, reachable via help@arxiv.org. See https://arxiv.org/help/contact.

? “Helpdesk …a response time of no more than one business day must be ensured.”
We’re back to the ambiguity of earlier drafts. 100% of cases? While same-day responses are the norm, arXiv will need funding for additional staff to guarantee 100% compliance.

Strongly recommended additional criteria

It is heartening to see an indication that cOAlitionS recognizes that not all repositories will have the resources to implement everything their hearts desire. This is a welcome improvement to the guidance.

X “Manuscript submission system that supports both individual author uploads and bulk uploads of manuscripts (AAM or VoR) by publishers.”
arXiv needs funding and/or developer-power to support build-out of a robust, scalable submission API. We invite external organizations and individuals to help us develop the tools to do this; please see our community projects list.

X “Full text stored in a machine-readable community standard format such as JATS XML.”
Still a bit vague, but better than before. Makes it clear that machine-readable + non-proprietary are the salient points. Here is another area where arXiv needs help, preferably in the form of developer support. Please see our community projects list.

X “Support for PID for authors (e.g., ORCID), funders, funding programmes and grants, institutions, and other relevant entities.”
We do support linking ORCIDs to arXiv user accounts for search purposes. See https://blogs.cornell.edu/arxiv/2019/05/16/orcid-doi/. arXiv needs help from external contributors and funders to provide better support (including ORCIDs for users that aren’t “owners” of e-prints). Please see our community projects list.

X “Openly accessible data on citations according to the standards by the Initiative for Open Citations (I4OC).”
This falls a bit outside the core arXiv mission, so is a lower priority. However, we have done foundational work on extracting references from arXiv e-prints, and external contributors could make a big impact on making these shareable and interoperable. Please see our community projects list.

“Open API to allow others (including machines) to access the content.”
arXiv is a leader in adoption of OAI-PMH (https://arxiv.org/help/oa), and is pushing forward on a modernized JSON API.
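
For readers unfamiliar with OAI-PMH, here is a minimal sketch of how a harvester might construct requests. The verbs and the `metadataPrefix` and `identifier` arguments come from the OAI-PMH protocol itself; the base URL and the `oai:arXiv.org:` identifier prefix are assumptions on my part and should be checked against the help page linked above, the identifier shown is a placeholder, and the helper `oai_request_url` is hypothetical. The code only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

# Assumption: arXiv's OAI-PMH endpoint; verify against the arXiv help pages.
BASE_URL = "http://export.arxiv.org/oai2"


def oai_request_url(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for a protocol verb and its arguments."""
    query = {"verb": verb, **arguments}
    return f"{BASE_URL}?{urlencode(query)}"


# Ask the repository to describe itself.
print(oai_request_url("Identify"))

# Harvest records in Dublin Core, the format every OAI-PMH repository must support.
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))

# Fetch a single record; the identifier here is a placeholder.
print(oai_request_url("GetRecord",
                      identifier="oai:arXiv.org:0804.2273",
                      metadataPrefix="oai_dc"))
```

A real harvester would also need to follow resumption tokens and respect the repository's rate limits, which this sketch leaves out.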

“OpenAIRE compliance of the metadata.”
arXiv currently meets the “minimal participation requisites,” but OpenAIRE has provided additional guidance concerning improvements to arXiv metadata. External contributors could help with this. Please see our community projects list.

? “Quality assurance processes to link full-text deposits with authoritative bibliographic metadata from third party systems, e.g., PubMed, Crossref, or Scopus where feasible.”
We do currently harvest DOIs and journal reference information from publisher feeds, and use that information to update the corresponding e-print metadata records. However, this system does not work very well, and won’t scale. This is another area where external contributors could have a big impact. Please see our community projects list.