Title :
Subject classification in the Oxford English Dictionary
Author :
Langari, Zarrin ; Tompa, Frank Wm
Author_Institution :
Dept. of Comput. Sci., Waterloo Univ., Ont., Canada
Abstract :
The Oxford English Dictionary is a valuable source of lexical information and a rich testing ground for mining highly structured text. Each entry is organized into a hierarchy of senses, which include definitions, labels and cited quotations. Subject labels distinguish the subject classification of a sense; for example, they signal how a word may be used in anthropology, music or computing. Unfortunately, subject labeling in the dictionary is incomplete. To overcome this incompleteness, we attempt to classify the senses (i.e., definitions) in the dictionary by their subjects, using the citations as an information guide. We report on four different approaches: k nearest neighbors, a standard classification technique; term weighting, an information retrieval method dealing with text; naive Bayes, a probabilistic method; and expectation maximization, an iterative probabilistic method. The experimental performance of these methods is compared using standard classification metrics.
Keywords :
Bayes methods; classification; data mining; dictionaries; Oxford English Dictionary; cited quotations; classification metrics; definitions; expectation maximization; highly structured text mining; information guide; information retrieval method; iterative probabilistic method; k nearest neighbors; labels; lexical information; naive Bayes; probabilistic method; senses; subject classification; subject labels; term weighting; Computer science; Data mining; Dictionaries; Information retrieval; Iterative methods; Labeling; Multiple signal classification; Nearest neighbor searches; Speech; Testing
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989536