• DocumentCode
    2633982
  • Title
    Identifying biological terms from text by support vector machine
  • Author
    Ju, Zhenfei ; Zhou, Meichen ; Zhu, Fei
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
  • fYear
    2011
  • fDate
    21-23 June 2011
  • Firstpage
    455
  • Lastpage
    458
  • Abstract
    A growing number of people are involved in biomedical research. However, a large amount of biological knowledge still resides in unstructured documents, which makes biological data difficult to analyze. Identifying biological terms effectively from text is therefore one of the important problems in bioinformatics. The best biological term identification systems now reach a precision of more than 80%, but this is still lower than that of general-domain systems. Here we aim to recognize names of a specified type in a biological data set, and we choose a support vector machine (SVM) for the task. Using the GENIA corpus, a collection of Medline abstracts, we obtain an overall precision of 84% and a recall of 81% on the two-category classification problem. On the multi-category classification problem, the SVM identifies biological terms accurately, but the recall is very low. Increasing the amount of test data does not decrease precision, and the recall increases.
  • Keywords
    medical administrative data processing; support vector machines; text analysis; GENIA corpus; Medline abstracts; bioinformatics; biological term identification system; categories classification problem; support vector machine; Biology; Conferences; Data mining; Hidden Markov models; Support vector machines; Testing; Training; Biological Data Mining; Biological Terms Identification; Machine Learning; Support Vector Machine
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Electronics and Applications (ICIEA), 2011 6th IEEE Conference on
  • Conference_Location
    Beijing
  • ISSN
    pending
  • Print_ISBN
    978-1-4244-8754-7
  • Electronic_ISBN
    pending
  • Type
    conf
  • DOI
    10.1109/ICIEA.2011.5975627
  • Filename
    5975627