• DocumentCode
    2016847
  • Title
    Improving the informativeness of verbose queries using summarization techniques for spoken document retrieval
  • Author
    Lin, Shih-Hsiang ; Chen, Berlin ; Jan, Ea-Ee
  • Author_Institution
    Comput. Sci. & Inf. Eng., Nat. Taiwan Normal Univ., Taipei, Taiwan
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 3 2010
  • Firstpage
    75
  • Lastpage
    79
  • Abstract
    Query-by-example information retrieval aims at helping users to find relevant documents accurately when users provide specific query exemplars describing what they are interested in. The query exemplars are usually long and in the form of either a partial or even a full document. However, they may contain extraneous terms (or off-topic information) that would have a negative impact on the retrieval performance. In this paper, we propose to integrate extractive summarization techniques into the retrieval process so as to improve the informativeness of a verbose query exemplar. The original query exemplar is first divided into several sub-queries or sentences. To construct a new concise query exemplar, summarization techniques are then employed to select a salient subset of sub-queries. Experiments on the TDT Chinese collection show that the proposed approach is indeed effective and promising.
  • Keywords
    document handling; information retrieval; TDT Chinese collection; off-topic information; query-by-example information retrieval; spoken document retrieval; summarization techniques; verbose queries; Estimation; Hidden Markov models; Machine learning; Speech; Speech recognition; Training; query exemplar; query-by-example
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2010.5684847
  • Filename
    5684847
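
The pipeline outlined in the abstract (divide the verbose query exemplar into sub-queries, then keep a salient subset as the new concise query) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the punctuation-based sentence splitter, the whitespace-style tokenizer, and in particular the salience score (average query-level term frequency), which stands in for the extractive summarization techniques the paper actually evaluates and is not taken from it.

```python
import re
from collections import Counter


def split_into_subqueries(query: str) -> list[str]:
    """Split a verbose query into sentence-like sub-queries.

    Assumption: sentences end with ASCII or CJK sentence punctuation.
    """
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]?", query)
    return [p.strip() for p in pieces if p.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens. Real Chinese text would need a word segmenter."""
    return re.findall(r"\w+", text.lower())


def score_subqueries(subqueries: list[str]) -> list[float]:
    """Score each sub-query by the average query-level frequency of its terms.

    This is a stand-in salience measure (hypothetical), chosen so that
    sentences sharing vocabulary with the rest of the query rank higher
    than off-topic ones.
    """
    tokenized = [tokenize(s) for s in subqueries]
    term_freq = Counter(t for toks in tokenized for t in toks)
    return [
        sum(term_freq[t] for t in toks) / len(toks) if toks else 0.0
        for toks in tokenized
    ]


def condense_query(query: str, keep: int = 2) -> str:
    """Build a concise query from the `keep` most salient sub-queries."""
    subqueries = split_into_subqueries(query)
    scores = score_subqueries(subqueries)
    # Rank by salience, then restore the original order of the kept sentences.
    top = sorted(range(len(subqueries)), key=lambda i: scores[i], reverse=True)[:keep]
    return " ".join(subqueries[i] for i in sorted(top))


if __name__ == "__main__":
    verbose = (
        "Typhoon damage reported in Taipei. "
        "Typhoon rainfall flooded Taipei streets. "
        "My cat likes fish."
    )
    print(condense_query(verbose, keep=2))
```

In this toy input the third sentence shares no terms with the others, so it receives the lowest score and is dropped from the condensed query. A retrieval system would then issue the condensed query in place of the original exemplar.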