Regional Information Center for Science and Technology (RICEST) - Spoken document retrieval using both word-based and syllable-based document spaces with latent semantic indexing

DocumentCode :

661258

Title :

Spoken document retrieval using both word-based and syllable-based document spaces with latent semantic indexing

Author :

Ichikawa, Kazuhisa ; Tsuge, Satoru ; Kitaoka, Norihide ; Takeda, Kenji ; Kita, Kahori

Author_Institution :

Nagoya Univ., Nagoya, Japan

fYear :

2013

fDate :

Oct. 29 2013-Nov. 1 2013

Firstpage :

Lastpage :

Abstract :

In this paper, we propose a spoken document retrieval method that uses vector space models in multiple document spaces. First, we construct multiple document vector spaces: one based on continuous-word speech recognition results, the other on continuous-syllable speech recognition results. Query expansion is also applied to the word-based document space. We propose applying latent semantic indexing (LSI) not only to the word-based space but also to the syllable-based space, reducing the dimensionality of both spaces by exploiting implicitly defined semantics. Finally, we compute the distance between the query and each available document in the various spaces and combine these distances to rank the documents. In this procedure, we propose modeling each document as a hyperplane. To evaluate the proposed method, we conducted spoken document retrieval experiments on the NTCIR-9 SpokenDoc data set. The results show that combining the distances and applying LSI to the syllable-based document space improved retrieval performance.
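
The retrieval pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`lsi_fit`, `lsi_fold_in`, `cosine_distance`, `rank_documents`), the toy count matrices, the use of cosine distance, and the weighted-sum combination controlled by `weight` are all hypothetical choices. Query expansion and the hyperplane document model mentioned in the abstract are omitted because this record does not specify them.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix (assumed LSI setup).

    Returns the top-k left singular vectors, the top-k singular values,
    and one k-dimensional latent vector per document (rows of V_k).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def lsi_fold_in(query_vec, U_k, s_k):
    """Fold a term-space query into the latent space: q_k = q^T U_k S_k^{-1}."""
    return (query_vec @ U_k) / s_k

def cosine_distance(q, docs):
    """1 - cosine similarity between q and every row of docs."""
    num = docs @ q
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
    return 1.0 - num / np.maximum(den, 1e-12)

def rank_documents(word_td, syl_td, word_q, syl_q, k=2, weight=0.5):
    """Rank documents by a weighted sum of distances in both latent spaces.

    The linear combination is an assumption made for illustration only.
    """
    dists = []
    for td, q in ((word_td, word_q), (syl_td, syl_q)):
        U_k, s_k, docs_k = lsi_fit(td, k)
        dists.append(cosine_distance(lsi_fold_in(q, U_k, s_k), docs_k))
    combined = weight * dists[0] + (1.0 - weight) * dists[1]
    return np.argsort(combined), combined

# Toy data (hypothetical): rows are terms or syllable units, columns are documents.
WORD_TD = np.array([[2, 0, 1], [1, 0, 0], [0, 3, 1], [0, 1, 2]], dtype=float)
SYL_TD = np.array([[1, 0, 2], [0, 2, 1], [1, 1, 0], [0, 1, 1], [2, 0, 0]], dtype=float)

# Use document 1 itself as the query in both spaces, so it should rank first.
order, combined = rank_documents(WORD_TD, SYL_TD, WORD_TD[:, 1], SYL_TD[:, 1])
print(order, combined)
```

Folding a document's own column in as the query reproduces that document's latent vector, which makes this a convenient sanity check for the projection before trying real recognition output.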

Keywords :

document handling; indexing; information retrieval; speech recognition; LSI; NTCIR-9 SpokenDoc data set; continuous syllable speech recognition; continuous word speech recognition; hyperplane; latent semantic indexing; multiple document vector spaces; query expansion; spoken document retrieval performance; syllable based document spaces; vector space models; word based document spaces; Indexes; Large scale integration; Semantics; Speech; Speech recognition; Vectors; Web pages

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific

Conference_Location :

Kaohsiung

Type :

conf

DOI :

10.1109/APSIPA.2013.6694119

Filename :

6694119

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=661258