New User-Driven Classification Tool Released

The correct categorization of scholarly works is vitally important. It helps readers find the information they seek — and assures authors that their research will be found by the right audience. Recently, arXiv released a new user-driven classification tool to help authors choose the correct category for their papers during the submission process.

“I think this is a great addition, and I am already seeing positive effects,” said Frank Simon of the Max Planck Institute for Physics in Munich, Germany, and an arXiv moderator. “Since the feature has been in operation, submitters tend to classify things for my category (physics.ins-det) correctly more often.”

When submitting to arXiv, authors select a category for their paper, such as computational geometry or quantum physics. Now the automated classifier double-checks that selection in real time by comparing the paper to those already hosted on arXiv.org. If the author’s selection doesn’t match the classifier’s recommendation, an alternate category is proposed. The author can review the suggestion, accept or decline it, and continue the submission process, as shown below.

Screenshot: arXiv's classifier recommendation tool.

Previously, the automated classifier recommendations were available only to moderators, and only after submission. Moderators reviewed the recommendations and, if necessary, reclassified papers. This added a step to the process and often delayed announcement.

Integrating the classifier recommendations into the submission process empowers authors and increases transparency when reclassification is necessary. Additionally, this feature is expected to reduce the workload for moderators and lower the number of papers put on hold due to misclassification. Improved category selection during the submission process will provide more accurate classification at the outset and therefore reduce delays in announcing papers.

“In my category, the most common issue is the case where instrumentation papers that should have physics.ins-det as primary category are instead submitted to hep-ex, nucl-ex, physics.optics or the like, and subsequently need to be reclassified,” Simon said. “Alerting the submitters that their instrumentation paper should be submitted to the instrumentation category rather than one of the others will reduce moderation workload and speed up the release of the articles, so it is a win-win situation for both submitters and moderators.”

Other moderators have also expressed enthusiasm about the new feature, calling it important, worthwhile, a fantastic idea, and a big step forward, “even if there will be hiccups” in the beginning. In the first four days after release, 17.4% of papers received a category recommendation, and 67.5% of those authors accepted the suggested category.

arXiv continues to welcome feedback on this feature, which was first tested by volunteer moderators. Authors who receive a category recommendation will have the opportunity to complete a survey about their experience. Moderators are sharing their observations and experiences with arXiv staff, and staff members are monitoring server log statistics.

Moving forward, arXiv is developing a new version of this machine learning classifier that will be trained on feedback from users and moderators to provide better category recommendations.

We encourage the arXiv community to participate. Join the user testing group here and share your experiences on Twitter with the hashtag #arXivexperience.