Title :
Dimension reduction techniques for accessing Chinese readability
Author :
Yaw-Huei Chen ; Ting-Chia Lin
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiayi Univ., Chiayi, Taiwan
Abstract :
Recent studies have used machine learning-based techniques to assess document readability. An important issue in machine learning-based text classification is reducing the dimension of the document vectors. We compare feature selection and feature extraction methods, including mutual information, the chi-square test, information gain, PCA, and LSA, for assessing Chinese readability. We also compare two classification techniques, SVM and LDA. The experimental results indicate that the combination of the chi-square feature selection method and SVM performs well.
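As an illustration of the chi-square feature selection named in the abstract, the following is a minimal sketch and not the authors' implementation. It scores each term by its largest chi-square statistic across readability classes and keeps the top k terms. The function names, the toy documents, and the "easy"/"hard" labels are hypothetical; the English tokens stand in for Chinese text that is assumed to be already word-segmented, and the SVM classification step is not shown.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Return {term: max chi-square statistic over classes}.

    docs   -- list of token lists (assumed already word-segmented)
    labels -- one class label per document
    """
    n = len(docs)
    class_counts = Counter(labels)
    doc_sets = [set(d) for d in docs]  # document-level presence, not frequency
    term_counts = Counter(t for s in doc_sets for t in s)
    joint = Counter((t, y) for s, y in zip(doc_sets, labels) for t in s)

    scores = {}
    for term, n_t in term_counts.items():
        best = 0.0
        for cls, n_c in class_counts.items():
            a = joint[(term, cls)]  # term present, document in cls
            b = n_t - a             # term present, document not in cls
            c = n_c - a             # term absent, document in cls
            d = n - a - b - c       # term absent, document not in cls
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:  # skip degenerate tables (e.g. term in every document)
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: two "easy" and two "hard" documents.
docs = [
    ["cat", "run", "the"],
    ["dog", "run", "the"],
    ["entropy", "theorem", "the"],
    ["theorem", "proof", "the"],
]
labels = ["easy", "easy", "hard", "hard"]

scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 2))  # prints ['run', 'theorem']
```

In this toy corpus the term "the" occurs in every document, so it carries no class information and scores 0, while "run" and "theorem" each occur in exactly one class and score highest. The reduced vocabulary would then define the document vectors passed to a classifier.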
Keywords :
feature extraction; feature selection; learning (artificial intelligence); natural language processing; pattern classification; principal component analysis; support vector machines; text analysis; Chinese readability; LDA; LSA; PCA; SVM; chi-square feature selection method; chi-square test; dimension reduction techniques; document readability assessment; document vectors; information gain; machine learning-based techniques; mutual information; text classification techniques; Abstracts; Classification; Machine learning
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2014 International Conference on
Conference_Location :
Lanzhou
Print_ISBN :
978-1-4799-4216-9
DOI :
10.1109/ICMLC.2014.7009154