Search v0.4: bug-fixes, date-related searches

This morning we released v0.4 of the arXiv search application. Here is a quick run-down of bug-fixes and new features introduced in this release.

New Features

Ability to hide abstracts in search results

We received quite a few requests for the ability to disable abstracts in search results. We went ahead and added the feature, which is provided in both simple and advanced search. Your last selection is “sticky,” so that you do not have to take specific action to disable or enable it every time.

Screenshot demonstrating the hide abstracts feature
We introduced a feature that allows users to disable abstracts in search results.

Support searches for seven-digit arXiv ID partials

We already supported searches by arXiv paper ID, including partials of new-style paper IDs (e.g. 1802). But we got a few requests from users who wanted to be able to enter just the numeric part of old-style semantic paper IDs. For example, searching for 9509147 should yield astro-ph/9509147, cond-mat/9509147, etc. This is now supported.

Screenshot demonstrating support for seven-digit paper ID partials
You can now search old-style arXiv paper IDs without their semantic prefix.
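As a rough illustration of the idea (not the actual search implementation), a bare seven-digit query can be compared against only the numeric part of an old-style ID. The function name and the regular expression below are hypothetical and exist only for this sketch.

```python
import re

# Assumption for this sketch: the numeric part of an old-style ID is exactly
# seven digits (e.g. "9509147" in "astro-ph/9509147").
SEVEN_DIGITS = re.compile(r"\d{7}")


def matches_old_style_partial(query: str, paper_id: str) -> bool:
    """Return True if a bare seven-digit query equals the numeric part of an old-style ID."""
    if not SEVEN_DIGITS.fullmatch(query):
        return False
    # Old-style IDs look like "archive/NNNNNNN"; compare only what follows the slash.
    _, slash, numeric = paper_id.partition("/")
    return bool(slash) and numeric == query


paper_ids = ["astro-ph/9509147", "cond-mat/9509147", "hep-lat/9509001", "1802.00123"]
hits = [pid for pid in paper_ids if matches_old_style_partial("9509147", pid)]
print(hits)  # ['astro-ph/9509147', 'cond-mat/9509147']
```

A real index would more likely store the numeric part as its own searchable field than scan IDs one at a time; the loop here is only meant to show the matching rule.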

Filtering search on submission date

We added a little more functionality to the date-filtering part of the advanced interface. You can now choose to filter on the submission date of the current version or the original version, or on the announcement date.

Screenshot demonstrating new options for date range filtering in the advanced search interface
We added some new options for date range filtering in the advanced search interface.

Queries from abstract, category pages are now limited by archive

In an earlier version of search, we noted that the legacy arXiv data model presented some challenges when it came to providing a precise set of results for a specific author. It is important to note that the current search system is in no way less precise than its predecessor. One thing that made it seem that way, however, is that links from author names (e.g. on the abstract page) were no longer limited to the archive in which the paper is indexed. The change was creating enough confusion and consternation that it seemed to outweigh the benefits of the broader search, so we restored the classic behavior.

Screenshot demonstrating queries limited by archive
When you click on an author name (e.g. on the abstract page), search results are now limited by archive. You can click “Search in all archives” to expand your query.
  • Clicking on an author name in an hep-lat paper should only return results from hep-lat.
  • We added a link to “search in all archives.”
  • You can bookmark archive-specific search URLs, e.g. https://beta.arxiv.org/search/cs (this was already available in the advanced interface; we just added it for simple search as well).

Offer full text and help pages as options in simple search

These were available via the search box in the header, but were not present in the simple search form. We went ahead and added them. We decided not to add them to the advanced search, since they cannot be used in a combinatorial fashion like other options (they are entirely different platforms).

Improvements

Query date ranges are inclusive

We let our Python show a bit in early versions, and made the upper bound of date range queries in the advanced interface exclusive rather than inclusive. This is now fixed, so that date ranges are always inclusive (e.g. 2016-2018 includes papers from 2018).
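For readers who do not write Python: its built-in range excludes the upper bound, which is the habit that leaked into the early interface. The sketch below contrasts that with an inclusive year range. The helper names are hypothetical and this is not the search application's own code.

```python
from datetime import date
from typing import Tuple

# Python's range() stops before the upper bound, so 2018 is left out.
exclusive_years = list(range(2016, 2018))
print(exclusive_years)  # [2016, 2017]


def inclusive_year_bounds(start_year: int, end_year: int) -> Tuple[date, date]:
    """Return the first and last day covered by an inclusive year range."""
    return date(start_year, 1, 1), date(end_year, 12, 31)


def in_year_range(day: date, start_year: int, end_year: int) -> bool:
    """Check a date against the range, counting both end points as inside."""
    lower, upper = inclusive_year_bounds(start_year, end_year)
    return lower <= day <= upper


print(in_year_range(date(2018, 6, 1), 2016, 2018))  # True
print(in_year_range(date(2019, 1, 1), 2016, 2018))  # False
```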

Make “back” link on search results more prominent/obvious

Some users were uncertain about how to go back and change their queries. We tried to make it a little more obvious by adding some buttons in the upper left corner.

Screenshot showing new buttons to refine a search or start over
We added buttons to “refine this query” and “start a new search” in the upper left corner of the page.

Bugs

  • Fixes to collisions between hit highlighting and MathJax rendering. Particularly in the math domain, we were noticing quite a few equations that were getting mangled by search term hit highlighting. These are now fixed.
  • A discrepancy in how author-owner names were being searched with and without a comma delimiter has been fixed.
  • Fixed incomplete support for announcement date in all-fields searches. The legacy search system supported a yyMM format, which is related to the arXiv paper ID format, and we supported that in earlier versions. But a search for (say): bloggs 2015 boson was still not considering announcement date when handling the 2015 token. This is now fixed.
  • Author names were not parsed correctly when they were separated by both commas and the word “and”. For example, author strings like Jane Doe and, Joe Bloggs were throwing our name parsing routine off a bit. This has been fixed.
  • Unexpected errors were being thrown in a limited number of wildcard searches; this is now fixed.
  • Several minor fixes to correct unexpected errors.
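
To illustrate the author-name case from the list above, here is a minimal sketch of splitting an author string on commas, the word “and”, or any mix of the two. It is not the routine used in the search application; the function name and pattern are hypothetical. It also assumes that commas only separate different authors, so a surname-first form such as Bloggs, J would need separate handling.

```python
import re
from typing import List

# One or more consecutive separators, where each separator is a comma or the
# standalone word "and", with optional whitespace around it.
SEPARATORS = re.compile(r"(?:\s*(?:,|\band\b)\s*)+")


def split_author_names(raw: str) -> List[str]:
    """Split an author string on commas and/or the word 'and'."""
    parts = SEPARATORS.split(raw.strip())
    # Drop empty pieces left by leading or trailing separators.
    return [part for part in parts if part]


print(split_author_names("Jane Doe and, Joe Bloggs"))  # ['Jane Doe', 'Joe Bloggs']
print(split_author_names("Jane Doe, and Joe Bloggs"))  # ['Jane Doe', 'Joe Bloggs']
print(split_author_names("Sandy Bland and Joe Bloggs"))  # ['Sandy Bland', 'Joe Bloggs']
```

The word boundaries around “and” keep names such as Sandy or Bland intact, since the pattern only matches “and” as a standalone word.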