• DocumentCode
    249473
  • Title
    Spatio-temporal Pseudo Relevance Feedback for Large-Scale and Heterogeneous Scientific Repositories
  • Author
    Takeuchi, Shoji ; Akahoshi, Yuhei ; Ong, Bun Theang ; Sugiura, Komei ; Zettsu, Koji
  • Author_Institution
    Universal Communication Research Institute, National Institute of Information and Communications Technology, Kyoto, Japan
  • fYear
    2014
  • fDate
    June 27, 2014 - July 2, 2014
  • Firstpage
    669
  • Lastpage
    676
  • Abstract
    As ever larger amounts of data are harvested, finding exactly the right piece of information in this noisy and heterogeneous ocean of data remains challenging. Many widely adopted scientific data search engines are still based mainly on text semantics. However, it is not uncommon for scientific big data applications to encounter collected data that have no text information, and in this scenario such search engines fail to retrieve potentially relevant data. For instance, although Pangaea, a digital data library and publisher for earth system science, contains more than 400,000 datasets, more than 98% of them lack sufficient text information. In this work, we propose STT-PRF, a novel pseudo relevance feedback method for scientific big data based on spatio-temporal and text (STT) information. Although STT-PRF can use spatial, temporal, and text information simultaneously, we show that it efficiently handles missing values in space, time, and/or text; STT-PRF is especially robust even when no text information is available. We tested STT-PRF on the Pangaea repository using our Cross-DB Search Platform, a search engine for scientific big data based on various latent correlations. Experimental evaluations on standard metrics such as nDCG and precision/recall show that STT-PRF outperforms the standard baseline methods.
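The abstract does not specify how STT-PRF scores records or treats missing values, so the following is only a minimal sketch of the general idea of pseudo relevance feedback over spatial, temporal, and text fields where any field may be absent; it is NOT the authors' algorithm. Everything in it is a hypothetical choice for illustration: the names `Record`, `stt_score`, `expand_query`, and `stt_prf`; exponential distance decay for space and time; Jaccard overlap for text; averaging only over modalities present in both query and record; and filling a missing query modality from the top-k pseudo-relevant records.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical dataset description; any modality may be missing."""
    rid: str
    loc: Optional[tuple[float, float]] = None  # (lat, lon) in degrees
    time: Optional[float] = None               # e.g. decimal year
    terms: frozenset[str] = frozenset()        # empty set = no text


def spatial_sim(a, b, scale_km=500.0):
    # Equirectangular distance approximation (assumption), decayed exponentially.
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    dist_km = 6371.0 * math.hypot(x, lat2 - lat1)
    return math.exp(-dist_km / scale_km)


def stt_score(q: Record, d: Record) -> float:
    """Average the similarities of modalities present in BOTH q and d."""
    sims = []
    if q.loc is not None and d.loc is not None:
        sims.append(spatial_sim(q.loc, d.loc))
    if q.time is not None and d.time is not None:
        sims.append(math.exp(-abs(q.time - d.time) / 5.0))  # 5-year decay scale
    if q.terms and d.terms:
        sims.append(len(q.terms & d.terms) / len(q.terms | d.terms))  # Jaccard
    return sum(sims) / len(sims) if sims else 0.0  # nothing shared -> 0


def expand_query(q: Record, feedback: list[Record], n_terms=5) -> Record:
    """Fill missing query modalities from the pseudo-relevant records."""
    locs = [d.loc for d in feedback if d.loc is not None]
    times = [d.time for d in feedback if d.time is not None]
    loc = q.loc
    if loc is None and locs:
        # Naive centroid; ignores the antimeridian wrap-around.
        loc = (sum(p[0] for p in locs) / len(locs), sum(p[1] for p in locs) / len(locs))
    time = q.time if q.time is not None else (sum(times) / len(times) if times else None)
    counts = Counter(t for d in feedback for t in d.terms)
    terms = q.terms | frozenset(t for t, _ in counts.most_common(n_terms))
    return Record("expanded", loc, time, terms)


def rank(q: Record, corpus: list[Record]) -> list[Record]:
    return sorted(corpus, key=lambda d: stt_score(q, d), reverse=True)


def stt_prf(q: Record, corpus: list[Record], k=3, n_terms=5) -> list[Record]:
    """Rank, treat the top k as relevant, expand the query, and re-rank."""
    initial = rank(q, corpus)
    return rank(expand_query(q, initial[:k], n_terms), corpus)


corpus = [
    Record("A", (70.0, -150.0), 2010.0, frozenset({"sea", "ice"})),
    Record("B", (70.5, -149.0), 2011.0),                       # no text
    Record("C", (-30.0, 20.0), 1990.0, frozenset({"sediment"})),
    Record("D", None, 2010.5, frozenset({"sea", "ice", "thickness"})),  # no location
]
query = Record("q", loc=(70.2, -149.5))  # location only: no time, no text
print([d.rid for d in rank(query, corpus)])            # → ['A', 'B', 'C', 'D']
print([d.rid for d in stt_prf(query, corpus, k=2)])    # → ['A', 'B', 'D', 'C']
```

In this toy run, record D has no location, so it shares no modality with the location-only query and scores 0 initially; after feedback from A and B supplies a time and the terms "sea" and "ice", D moves above the distant record C. Averaging over only the shared modalities is one simple way to tolerate missing values, but it lets a single well-matching modality score as high as a full match, so a real system would likely weight or penalize partial coverage.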
  • Keywords
    Big Data; relevance feedback; scientific information systems; Cross-DB search platform; Pangaea repository; STT-PRF method; heterogeneous scientific repositories; large-scale scientific repositories; nDCG; precision-recall metrics; scientific big data; search engine; spatio-temporal and text information; spatio-temporal pseudo relevance feedback; Abstracts; Big data; Calculators; Indexes; Market research; Search engines; Standards; information retrieval; pseudo relevance feedback; query expansion; scientific data; spatio-temporal and text information;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type
    conf
  • DOI
    10.1109/BigData.Congress.2014.100
  • Filename
    6906843