DocumentCode :
265021
Title :
Text Simplification Tools: Using Machine Learning to Discover Features that Identify Difficult Text
Author :
Kauchak, David ; Mouradi, Obay ; Pentoney, Christopher ; Leroy, Gondy
Author_Institution :
Middlebury Coll., Middlebury, VT, USA
fYear :
2014
fDate :
6-9 Jan. 2014
Firstpage :
2616
Lastpage :
2625
Abstract :
Although providing understandable information is a critical component of healthcare, few tools exist to help clinicians identify difficult sections in text. We systematically examine sixteen features for predicting the difficulty of health texts using six different machine learning algorithms. Three of these are new features not previously examined: medical concept density, specificity (calculated using word-level depth in MeSH), and ambiguity (calculated using the number of UMLS Metathesaurus concepts associated with a word). We examine these features on a binary prediction task over 118,000 simple and difficult sentences from a sentence-aligned corpus. Using all features, random forests achieve the highest accuracy (84%). Analysis of the six models and a complementary ablation study show that the specificity and ambiguity features are the strongest predictors (24% combined impact on accuracy). Notably, a training size study showed that even a 1% sample (1,062 sentences) achieves 80% accuracy.
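The three new features in the abstract can be illustrated with a minimal sketch. It assumes hypothetical toy lookup tables (TOY_MESH_DEPTH, TOY_UMLS_CONCEPTS) standing in for real MeSH tree depths and UMLS Metathesaurus concept counts. It also assumes each feature is aggregated as a simple per-sentence average. Neither the tables nor the aggregation choice comes from the paper, so this is not the authors' implementation.

```python
"""Sketch of three sentence-level difficulty features.

ASSUMPTIONS (hypothetical, not from the paper): the toy tables below stand in
for real MeSH tree depths and UMLS Metathesaurus concept counts, and each
feature is aggregated as a simple average over the sentence's words.
"""

# Hypothetical MeSH depth per word (deeper in the hierarchy = more specific).
TOY_MESH_DEPTH = {"disease": 1, "heart": 2, "myocardial": 4, "infarction": 5}

# Hypothetical number of UMLS concepts a word maps to (more = more ambiguous).
TOY_UMLS_CONCEPTS = {"disease": 3, "heart": 6, "attack": 4, "myocardial": 1, "infarction": 2}

PUNCT = ".,;:!?()"


def tokenize(sentence: str) -> list[str]:
    # Lowercase and strip simple punctuation; a real system would use a proper tokenizer.
    return [w.strip(PUNCT).lower() for w in sentence.split() if w.strip(PUNCT)]


def medical_concept_density(words: list[str]) -> float:
    # Fraction of words that map to at least one (toy) UMLS concept.
    if not words:
        return 0.0
    return sum(1 for w in words if w in TOY_UMLS_CONCEPTS) / len(words)


def specificity(words: list[str]) -> float:
    # Mean MeSH depth over words found in the (toy) MeSH table; 0.0 if none match.
    depths = [TOY_MESH_DEPTH[w] for w in words if w in TOY_MESH_DEPTH]
    return sum(depths) / len(depths) if depths else 0.0


def ambiguity(words: list[str]) -> float:
    # Mean number of (toy) UMLS concepts per matched word; 0.0 if none match.
    counts = [TOY_UMLS_CONCEPTS[w] for w in words if w in TOY_UMLS_CONCEPTS]
    return sum(counts) / len(counts) if counts else 0.0


def features(sentence: str) -> dict[str, float]:
    # Compute all three features for one sentence.
    words = tokenize(sentence)
    return {
        "density": medical_concept_density(words),
        "specificity": specificity(words),
        "ambiguity": ambiguity(words),
    }


if __name__ == "__main__":
    print(features("The patient had a myocardial infarction."))
    print(features("The patient had a heart attack."))
```

With these toy tables, the first sentence scores specificity 4.5 and ambiguity 1.5. The second scores specificity 2.0 and ambiguity 5.0. In a full system, vectors like these (alongside the other features) would be fed to a classifier such as a random forest for the simple/difficult prediction.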
Keywords :
health care; learning (artificial intelligence); medical computing; natural language processing; text analysis; MeSH; UMLS metathesaurus concepts; binary prediction task; health texts; healthcare; machine learning; medical concept ambiguity; medical concept density; medical concept specificity; random forests; sentence-aligned corpus; text identification; text simplification tools; word-level depth; Electronic publishing; Encyclopedias; Feature extraction; Internet; Readability metrics; Unified modeling language; text readability; text simplification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
System Sciences (HICSS), 2014 47th Hawaii International Conference on
Conference_Location :
Waikoloa, HI
Type :
conf
DOI :
10.1109/HICSS.2014.330
Filename :
6758930