DocumentCode :
3101218
Title :
Textmining, feature selection and datamining for proteins classification
Author :
Mhamdi, Faouzi ; Elloumi, Mourad ; Rakotomalala, Ricco
Author_Institution :
Département d'Informatique, Faculté des Sciences de Tunis, URPAH, Tunisia
fYear :
2004
fDate :
19-23 April 2004
Firstpage :
457
Lastpage :
458
Abstract :
This study addresses the classification of proteins based on their primary structures. The protein sequences are collected in a file, and a text mining technique is proposed for extracting features from them. An algorithm is developed that extracts all the n-grams present in the data file and produces a learning file. The algorithm supplies three files: a Boolean file, which records the presence or absence of each n-gram in each sequence, a frequencies file and an occurrences file. Forward selection and backward elimination are then applied to obtain a learning file with an acceptable number of features.
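The n-gram extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ngrams` and `build_tables` are hypothetical, the tables are kept in memory rather than written to files, and the abstract does not define how "frequencies" differ from "occurrences", so the sketch assumes that occurrences are raw counts and frequencies are counts divided by the number of n-grams in the sequence.

```python
from collections import Counter


def ngrams(sequence, n):
    """Return all overlapping n-grams of a sequence, in order of appearance."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_tables(sequences, n):
    """Build Boolean, occurrence and frequency tables (hypothetical helper).

    Each table has one row per sequence and one column per n-gram found
    anywhere in the input. Assumption: "occurrences" means raw counts and
    "frequencies" means counts normalised by the sequence's n-gram total.
    """
    counts = [Counter(ngrams(s, n)) for s in sequences]
    # Column order: every distinct n-gram seen in any sequence, sorted.
    vocabulary = sorted(set().union(*counts))
    occurrences = [[c[g] for g in vocabulary] for c in counts]
    boolean = [[int(v > 0) for v in row] for row in occurrences]
    # max(..., 1) avoids division by zero for sequences shorter than n.
    frequencies = [[v / max(sum(row), 1) for v in row] for row in occurrences]
    return vocabulary, boolean, occurrences, frequencies


if __name__ == "__main__":
    # Toy amino-acid strings, chosen only for illustration.
    vocab, boolean, occ, freq = build_tables(["MKVMK", "KVK"], 2)
    print(vocab)
    print(boolean)
    print(occ)
    print(freq)
```

Each row of these tables, paired with the protein's class label, would correspond to one line of a learning file on which forward selection or backward elimination could then be run.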
Keywords :
data mining; feature extraction; medical information systems; pattern classification; proteins; sequences; Boolean file; backward elimination method; datamining; feature selection; forward selection; frequencies-occurrences file; learning file; proteins classification; sequence; textmining; Classification tree analysis; Data mining; Decision trees; Feature extraction; Frequency; Proteins; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on
Print_ISBN :
0-7803-8482-2
Type :
conf
DOI :
10.1109/ICTTA.2004.1307829
Filename :
1307829