• DocumentCode
    661258
  • Title
    Spoken document retrieval using both word-based and syllable-based document spaces with latent semantic indexing
  • Author
    Ichikawa, Kazuhisa ; Tsuge, Satoru ; Kitaoka, Norihide ; Takeda, Kenji ; Kita, Kahori
  • Author_Institution
    Nagoya Univ., Nagoya, Japan
  • fYear
    2013
  • fDate
    Oct. 29 2013-Nov. 1 2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    In this paper, we propose a spoken document retrieval method that uses vector space models in multiple document spaces. First, we construct multiple document vector spaces, one based on continuous-word speech recognition results and the other on continuous-syllable speech recognition results. Query expansion is also applied to the word-based document space. We propose applying latent semantic indexing (LSI) not only to the word-based space but also to the syllable-based space, reducing the dimensionality of both spaces by using implicitly defined semantics. Finally, we compute the distance between the query and each available document in the various spaces and combine these distances to rank the documents. In this procedure, we propose modeling each document as a hyperplane. To evaluate the proposed method, we conducted spoken document retrieval experiments using the NTCIR-9 SpokenDoc data set. The results showed that combining the distances and applying LSI to the syllable-based document space both improved retrieval performance.
  • Keywords
    document handling; indexing; information retrieval; speech recognition; LSI; NTCIR-9 SpokenDoc data set; continuous syllable speech recognition; continuous word speech recognition; hyperplane; latent semantic indexing; multiple document vector spaces; query expansion; spoken document retrieval performance; syllable based document spaces; vector space models; word based document spaces; Indexes; Large scale integration; Semantics; Speech; Speech recognition; Vectors; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific
  • Conference_Location
    Kaohsiung
  • Type
    conf
  • DOI
    10.1109/APSIPA.2013.6694119
  • Filename
    6694119
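
  • Illustrative sketch (not part of the original record)
    The Abstract describes applying LSI to two document spaces and combining the query-to-document distances for ranking, but the record gives no implementation details. The following is only a minimal sketch of that general idea, under several assumptions: the function names are hypothetical, cosine distance is assumed as the distance measure, the combination is assumed to be a simple weighted sum with a hypothetical `weight` parameter, and the inputs are assumed to be plain term-by-document count matrices. Query expansion and the hyperplane document model mentioned in the Abstract are not covered.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a (terms x docs) matrix.

    Returns the top-k left singular vectors U_k and the document
    coordinates in the k-dimensional latent space (docs x k).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    docs_k = Vt[:k, :].T * s[:k]  # equals U_k^T d for each document column d
    return U_k, docs_k

def lsi_fold_in(query_vec, U_k):
    """Project a query term vector into the latent space (q_k = U_k^T q)."""
    return U_k.T @ query_vec

def cosine_distances(query_k, docs_k):
    """Cosine distance (1 - cosine similarity) from the query to every document."""
    norms = np.linalg.norm(query_k) * np.linalg.norm(docs_k, axis=1)
    norms = np.maximum(norms, 1e-12)  # guard against zero vectors
    return 1.0 - (docs_k @ query_k) / norms

def rank_documents(word_td, word_q, syl_td, syl_q, k=2, weight=0.5):
    """Rank documents by a weighted sum of distances in two LSI spaces.

    ASSUMPTION: the linear weighting is illustrative only; the record
    does not state how the distances are actually combined.
    """
    U_w, D_w = lsi_fit(word_td, k)
    U_s, D_s = lsi_fit(syl_td, k)
    d_word = cosine_distances(lsi_fold_in(word_q, U_w), D_w)
    d_syl = cosine_distances(lsi_fold_in(syl_q, U_s), D_s)
    combined = weight * d_word + (1.0 - weight) * d_syl
    return np.argsort(combined), combined  # smallest distance first
```

    In this sketch, `word_td` and `syl_td` would hold term counts derived from the word-level and syllable-level recognition output of the same documents, and `word_q` and `syl_q` the corresponding query vectors. Choosing `k` smaller than the matrix rank is what gives the dimensionality reduction.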