Regional Information Center for Science and Technology (RICEST) - Morphological analysis of the corpus of spontaneous Japanese

DocumentCode :

1010072

Title :

Morphological analysis of the corpus of spontaneous Japanese

Author :

Uchimoto, Kiyotaka ; Takaoka, Kazuma ; Nobata, Chikashi ; Yamada, Atsushi ; Sekine, Satoshi ; Isahara, Hitoshi

Author_Institution :

Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan

Volume :

Issue :

fYear :

2004

fDate :

7/1/2004

Firstpage :

382

Lastpage :

390

Abstract :

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and explains how to tag a large spontaneous speech corpus accurately by using the two methods. The first method is used to detect any type of word segment. The second method is used when there are several definitions for word segments and their part-of-speech (POS) categories, and when one type of word segment includes another type of word segment. We show that by using semi-automatic analysis, we achieve a precision of better than 99% for detecting and tagging short-unit words and 97% for long-unit words, the two types of words that comprise the corpus. We also show that better accuracy is achieved by using both methods than by using only the first.

Keywords :

maximum entropy methods; natural languages; speech processing; Japanese spontaneous speech corpus; long-unit words; morphological analysis; part-of-speech categories; semiautomatic analysis; short-unit words; word segments; Dictionaries; Entropy; Helium; Humans; Information analysis; Natural language processing; Public speaking; Speech analysis; Speech processing; Tagging

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/TSA.2004.828700

Filename :

1306511

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1010072