DocumentCode
2334175
Title
Better rules, fewer features: a semantic approach to selecting features from text
Author
Blake, Catherine ; Pratt, Wanda
Author_Institution
Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA
fYear
2001
fDate
2001
Firstpage
59
Lastpage
66
Abstract
The choice of features used to represent a domain has a profound effect on the quality of the model produced; yet few researchers have investigated the relationship between the features used to represent text and the quality of the final model. We explored this relationship for medical texts by comparing association rules based on features with three different semantic levels: (1) words, (2) manually assigned keywords, and (3) automatically selected medical concepts. Our preliminary findings indicate that bi-directional association rules based on concepts or keywords are more plausible and more useful than those based on word features. The concept and keyword representations also required 90% fewer features than the word representation. This drastic dimensionality reduction suggests that this approach is well suited to large corpora of medical text, such as parts of the Web.
Keywords
bibliographic systems; computational linguistics; data mining; medical information systems; text analysis; Web; association rules; automatically selected medical concepts; bi-directional association rules; dimensionality reduction; feature selection; keyword representations; large textual corpus; manually assigned keywords; medical texts; semantic approach; semantic levels; text representation; word features; word representation; words; Association rules; Bidirectional control; Breast cancer; Breast neoplasms; Computer science; Data mining; Diseases; Medical treatment; Natural languages; Predictive models
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location
San Jose, CA
Print_ISBN
0-7695-1119-8
Type
conf
DOI
10.1109/ICDM.2001.989501
Filename
989501