DocumentCode
3431925
Title
Improve VSM text classification by title vector based document representation method
Author
Tian Xia ; Yi Du
Author_Institution
Dept. of Comput. & Inf., Shanghai Second Polytech. Univ., Shanghai, China
fYear
2011
fDate
3-5 Aug. 2011
Firstpage
210
Lastpage
213
Abstract
Text classification is a daunting task because it is difficult to extract the semantics of natural language texts. Many problems must be resolved before natural-language processing techniques can be effectively applied to a large collection of texts. A significant one is extracting semantic information from a corpus in plain text. In the Vector Space Model, a document is conceptually represented by a vector of terms extracted from the document, with associated weights representing the importance of each term in the document and within the whole document collection. Likewise, an unclassified document is modeled as a list of terms with associated weights representing the importance of the terms in it. Many techniques introduce statistical information about terms to represent their semantic information. However, the document title is usually not given special consideration, even though it clearly contains much semantic information. This paper proposes the Title Vector to address this issue.
Keywords
classification; natural language processing; text analysis; vectors; VSM text classification; document representation method; natural-language processing; semantic information; statistical information; title vector; vector space model; Indexes; Semantics; Support vector machine classification; Testing; Text categorization; Training; Vectors; Text Classification; Title Vector; VSM; Vector Space Model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science & Education (ICCSE), 2011 6th International Conference on
Conference_Location
Singapore
Print_ISBN
978-1-4244-9717-1
Type
conf
DOI
10.1109/ICCSE.2011.6028619
Filename
6028619