DocumentCode
2747021
Title
Gene Name Automatic Recognition in Biomedical Literature
Author
Yang, Zhihao ; Lin, Hongfei ; Zhao, Jing
Author_Institution
Dept. of Comput. Sci. & Eng., Dalian Univ. of Technol.
Volume
2
fYear
0
fDate
0-0 0
Firstpage
9391
Lastpage
9395
Abstract
Identifying gene names in biomedical texts is regarded as a crucial step in text mining. Our approach combines a dictionary-based approach with a machine-learning-based approach. Using a gene name dictionary, an edit-distance approximate string searching algorithm was applied to improve the recall of gene recognition, which is otherwise greatly lowered by the lack of standard gene-naming conventions. Naive Bayes and SVM classifiers were then adopted to filter out false recognitions, thereby improving the precision of gene recognition. The experiments show that the classifiers greatly improve precision with a slight loss of recall, resulting in a much better F-score (from 53.7% to 67.6%).
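The dictionary lookup step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-token matching, the case folding, and the `max_dist` threshold are all assumptions made for illustration, and the classifier-based filtering stage is not shown.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_gene_candidates(tokens: List[str],
                         dictionary: List[str],
                         max_dist: int = 1) -> List[Tuple[int, str, str, int]]:
    """Return (position, token, dictionary name, distance) for near matches.

    Hypothetical helper: compares each token against each dictionary entry
    and keeps pairs whose edit distance is at most max_dist.
    """
    hits = []
    for pos, token in enumerate(tokens):
        for name in dictionary:
            # Cheap pre-filter: a length gap larger than max_dist cannot match.
            if abs(len(token) - len(name)) > max_dist:
                continue
            dist = edit_distance(token.lower(), name.lower())
            if dist <= max_dist:
                hits.append((pos, token, name, dist))
    return hits


if __name__ == "__main__":
    tokens = ["BRCA-1", "binds", "p53"]
    gene_dictionary = ["BRCA1", "TP53"]
    for hit in find_gene_candidates(tokens, gene_dictionary):
        print(hit)
```

Allowing a nonzero distance is what raises recall for spelling variants such as "BRCA-1" versus "BRCA1". It also admits spurious matches, which is the reason the abstract gives for the subsequent classifier filtering.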
Keywords
biology computing; classification; data mining; dictionaries; genetics; learning (artificial intelligence); string matching; text analysis; SVM classifier; biomedical literature; biomedical texts; edit distance; gene name automatic recognition; gene name dictionary; machine learning; naive Bayes classifier; string searching; text mining; Biomedical engineering; Computer science; Dictionaries; Electronic mail; Epidermis; Machine learning; Support vector machine classification; Support vector machines; Text mining; Text recognition; Edit Distance; Naive Bayes Classifier; SVM Classifier; Text Mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on
Conference_Location
Dalian
Print_ISBN
1-4244-0332-4
Type
conf
DOI
10.1109/WCICA.2006.1713819
Filename
1713819