DocumentCode :
2334892
Title :
Subject classification in the Oxford English Dictionary
Author :
Langari, Zarrin ; Tompa, Frank Wm
Author_Institution :
Dept. of Comput. Sci., Waterloo Univ., Ont., Canada
fYear :
2001
fDate :
2001
Firstpage :
329
Lastpage :
336
Abstract :
The Oxford English Dictionary is a valuable source of lexical information and a rich testing ground for mining highly structured text. Each entry is organized into a hierarchy of senses, which include definitions, labels and cited quotations. Subject labels distinguish the subject classification of a sense; for example, they signal how a word may be used in anthropology, music or computing. Unfortunately, subject labeling in the dictionary is incomplete. To overcome this incompleteness, we attempt to classify the senses (i.e., definitions) in the dictionary by their subjects, using the citations as an information guide. We report on four different approaches: k nearest neighbors, a standard classification technique; term weighting, an information retrieval method dealing with text; naive Bayes, a probabilistic method; and expectation maximization, an iterative probabilistic method. Experimental performance of these methods is compared based on standard classification metrics.
Keywords :
Bayes methods; classification; data mining; dictionaries; Oxford English Dictionary; cited quotations; classification metrics; definitions; expectation maximization; highly structured text mining; information guide; information retrieval method; iterative probabilistic method; k nearest neighbors; labels; lexical information; naive Bayes; probabilistic method; senses; subject classification; subject labels; term weighting; Computer science; Data mining; Dictionaries; Information retrieval; Iterative methods; Labeling; Multiple signal classification; Nearest neighbor searches; Speech; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
Type :
conf
DOI :
10.1109/ICDM.2001.989536
Filename :
989536