DocumentCode :
3431925
Title :
Improve VSM text classification by title vector based document representation method
Author :
Tian Xia ; Yi Du
Author_Institution :
Dept. of Comput. & Inf., Shanghai Second Polytech. Univ., Shanghai, China
fYear :
2011
fDate :
3-5 Aug. 2011
Firstpage :
210
Lastpage :
213
Abstract :
Text classification is a daunting task because it is difficult to extract the semantics of natural-language texts. Many problems must be resolved before natural-language processing techniques can be effectively applied to a large collection of texts. A significant one is extracting semantic information from a corpus in plain text. In the Vector Space Model, a document is conceptually represented by a vector of terms extracted from the document, with associated weights representing the importance of each term in the document and within the whole document collection. Likewise, an unclassified document is modeled as a list of terms with associated weights representing the importance of the terms in it. Many techniques introduce statistical information about terms to represent their semantic information. However, the document title is usually not given special consideration, even though it obviously contains much semantic information. This paper proposes the Title Vector to address this issue.
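The Vector Space Model representation described in the abstract can be illustrated with a minimal sketch. It assumes TF-IDF as the weighting scheme and cosine similarity for comparing documents; the abstract names neither, and the function names `tfidf_vectors` and `cosine` are hypothetical. The paper's Title Vector method is not specified in this record, so it is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = term frequency * log(N / document frequency).
    This is one common scheme, not necessarily the one used in the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    corpus = [
        "stock market prices fall",
        "market prices rise",
        "football match results",
    ]
    vecs = tfidf_vectors(corpus)
    print(cosine(vecs[0], vecs[1]))  # documents sharing terms
    print(cosine(vecs[0], vecs[2]))  # documents with no shared terms
```

In this sketch a term that occurs in every document receives weight zero, which reflects the idea that a term's importance depends on the whole collection as well as on the individual document.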
Keywords :
classification; natural language processing; text analysis; vectors; VSM text classification; document representation method; natural-language processing; semantic information; statistical information; title vector; vector space model; Indexes; Semantics; Support vector machine classification; Testing; Text categorization; Training; Vectors; Text Classification; Title Vector; VSM; Vector Space Model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science & Education (ICCSE), 2011 6th International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-9717-1
Type :
conf
DOI :
10.1109/ICCSE.2011.6028619
Filename :
6028619