DocumentCode :
660642
Title :
Information Extraction for Computer Science Academic Rankings System
Author :
Chengkai Shi ; Jiahui Quan ; Minglu Li
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2013
fDate :
4-6 Nov. 2013
Firstpage :
69
Lastpage :
76
Abstract :
Academic ranking in computer science is a prominent and important problem today. This paper introduces the Computer Science Academic Rankings System (CSAR), which aims at extracting, mining, and ranking academic information. We mainly present the approaches used for information extraction and normalization in CSAR. For semi-structured and unstructured web pages such as paper-view pages, we propose a method that combines a natural language processing n-gram model with web grammar. We generate an optimal matching on a bipartite graph to extract author and organization information with maximum likelihood. CSAR also uses the KM algorithm and the Hungarian algorithm to find the correspondence between authors and emails. For information normalization, we use an n-gram model, the EM algorithm, and a trigram model with linear interpolation to construct a part-of-speech tagger, which is used to extract useful information from web sources. The TF-IDF model and string edit distance are then applied to normalize organization names. In experiments, the proposed approaches achieve high accuracy and substantial improvements in academic information extraction.
Keywords :
computer science education; expectation-maximisation algorithm; graph theory; information retrieval; natural language processing; CSAR system; Hungarian algorithm; KM algorithm; TF-IDF model; Web grammar; Web pages; computer science academic rankings system; expectation-maximization algorithm; information extraction; information normalization; linear interpolation; maximum likelihood estimation; natural language processing n-gram model; optimal matching bipartite graph; paper-view pages; part-of-speech tagger; term frequency-inverse document frequency model; trigram model; Bipartite graph; Data mining; Electronic mail; Grammar; Organizations; Social network services; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud and Service Computing (CSC), 2013 International Conference on
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/CSC.2013.19
Filename :
6693181