Title :
DataGopher: Context-based search for research datasets
Author :
Singhal, Ayush ; Kasturi, Ravindra ; Srivastava, Jaideep
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
Abstract :
Scientific datasets play a crucial role in data-driven research. While several search tools have been developed for documents, blogs, images, videos, and various other information needs, important scientific artifacts such as research datasets lack comparable tools. The main challenge in developing an effective search tool for datasets is determining the content representation of the raw data. Dataset descriptions provided by users are often short and very content-specific. Moreover, even public datasets generally have very limited descriptions of the various research problems and applications that used them. Given the ever-expanding variety of datasets on the web and the lack of representative content for indexing, developing an effective search engine for datasets is computationally very challenging. In this work, we propose a novel 'context'-based search paradigm for datasets to overcome the problem of limited representative content for research datasets. In contrast to general-purpose search engines, which index the little text information available about dataset sources, we hypothesize that the proposed 'context'-based search paradigm is more effective for dataset search. The hypothesis is tested by conducting a user study in which the performance of the context-based search (DataGopher) is compared with that of a popular general-purpose search engine. The study was conducted in a real-world setting where users were free to use the search engine according to their information needs. Based on the user study, we find that DataGopher was favored for 58% of the total context-based user queries, whereas the baseline was favored for only 26%.
Keywords :
data handling; search engines; DataGopher; blogs; content representation; context based search; data driven research; dataset description; dataset sources; document search; images; raw data; representative content; research datasets; search engine; text information; videos; Context; Indexing; Multimedia communication; Search engines; Videos;
Conference_Title :
Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on
DOI :
10.1109/IRI.2014.7051964