DocumentCode :
162705
Title :
A novel statistical and linguistic features based technique for keyword extraction
Author :
Gupta, Arpan ; Dixit, Abhishek ; Sharma, Arvind Kumar
Author_Institution :
Comput. Eng. Dept., YMCA Univ. of Sci. & Technol., Faridabad, India
fYear :
2014
fDate :
1-2 March 2014
Firstpage :
55
Lastpage :
59
Abstract :
The WWW is a decentralized, distributed and heterogeneous information resource. With the increased availability of information through the WWW, it is very difficult to read all documents to retrieve the desired results; there is therefore a need for summarization methods that can present the contents of a given document in a precise manner. Keywords of a document may provide a compact representation of the document's content. As a result, various algorithms and systems intended to carry out automatic keyword extraction have been proposed in the recent past. However, the existing solutions require either training models or domain-specific information for automatic keyword extraction. To address these shortcomings, an innovative hybrid approach for automatic keyword extraction using statistical and linguistic features of a document is proposed. This statistical and linguistic keyword extraction technique works on an individual document without any prior parameter change and takes full advantage of all the features of the document to extract the keywords. The extracted keywords can then assist in domain-specific indexing. The performance of the proposed method, compared in terms of precision and recall with existing keyword extraction tools such as Dream web design, is also presented in this paper.
Keywords :
Internet; computational linguistics; indexing; information resources; information retrieval; statistical analysis; World Wide Web; automatic keywords extraction; document retrieval; domain specific indexing; information resource; keyword extraction; linguistic features; statistical features; Data mining; Educational institutions; Feature extraction; Ontologies; Pragmatics; Web pages; Domain ontology; Extractor; Key-phrase; Linguistic technique; Statistical technique; Unsupervised-approach;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Systems and Computer Networks (ISCON), 2014 International Conference on
Conference_Location :
Mathura
Print_ISBN :
978-1-4799-2980-1
Type :
conf
DOI :
10.1109/ICISCON.2014.6965218
Filename :
6965218