Google Dataset Search and Its Implementation
https://www.searchenginejournal.com/google-dataset/273441/
As more data scientists extract previously untapped value from large collections of data for scientific, economic, and medical purposes, one of the most helpful tools to emerge recently is Google’s Dataset Search. The service helps data scientists locate and explore a large portion of the world’s publicly available datasets. Dataset Search became available to the general public only about a month ago (early September). It was interesting to read about for a variety of reasons, and it relates to the material covered in class thus far in two main ways.
First, since the service is so new, Google does not yet have the usage statistics necessary to implement the kinds of search optimizations one can see on its web search service, such as historically aware autocomplete and personally tailored search results. Instead of relying on such individual-based features, the algorithm that powers Dataset Search, as Roger Montti notes in the article linked above, relies largely on Google’s existing “Knowledge Graph.” Interestingly, it uses connections in this graph to help weight and therefore rank results in Dataset Search. This is intriguing because it implies that datasets on this service are weighted by entities bigger than themselves, such as countries, companies, and languages. It would be interesting to see how this weighting and ranking system changes as Google gathers more usage data on Dataset Search.
Another key aspect of Dataset Search that relates to our class’s material thus far is how it handles dataset authorship. Montti points out that another Google service, Google Scholar, may be used in multiple ways for Dataset Search. First, it may be used as another weighting factor for the dataset results, increasing the ranking of datasets connected to more reliable authors. Second, and interestingly, Google suggests that dataset authors and other scholars will have a way to see a graph of which studies and papers have cited their work. I find this graph fascinating because it could expose different components in citation graphs and therefore different cohorts of thought in academic subjects. It could also expose biases and strong dependencies on certain works in any given study.
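To make the idea of "components" in a citation graph more concrete, here is a minimal sketch in Python. It is not how Google builds or displays its citation graph; the paper and dataset names, the `CITATIONS` list, and both function names are hypothetical and exist only for illustration. The sketch treats each citation as an undirected link, groups works into connected components (possible "cohorts of thought"), and counts how often each work is cited (a rough signal of strong dependence on certain works).

```python
from collections import defaultdict, deque

# Hypothetical toy data: (citing_work, cited_work) pairs.
CITATIONS = [
    ("paper_a", "dataset_1"),
    ("paper_b", "dataset_1"),
    ("paper_b", "paper_a"),
    ("paper_c", "dataset_2"),
    ("paper_d", "paper_c"),
]


def weakly_connected_components(edges):
    """Group works that are linked by citations, ignoring citation direction."""
    neighbors = defaultdict(set)
    for citing, cited in edges:
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def citation_counts(edges):
    """Count how many times each work is cited (its in-degree)."""
    counts = defaultdict(int)
    for _, cited in edges:
        counts[cited] += 1
    return dict(counts)


if __name__ == "__main__":
    for component in weakly_connected_components(CITATIONS):
        print(sorted(component))
    print(citation_counts(CITATIONS))
```

In this toy data the works split into two groups that never cite each other, and `dataset_1` is cited more often than anything else. On real citation data, separate components or heavily cited nodes would be the places to look for distinct schools of thought or over-reliance on a single source.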