DocumentCode :
3301180
Title :
Analysis of the degree of importance of information using newspapers and questionnaires
Author :
Murata, Masayuki ; Kanamaru, T. ; Nishimura, Ryota ; Torisawa, K. ; Doi, Kohei
Author_Institution :
NICT, Seika
fYear :
2008
fDate :
19-22 Oct. 2008
Firstpage :
1
Lastpage :
8
Abstract :
Our objective is twofold: to clarify the factors that determine the degree of importance of information by extracting the words that characterize it, and to construct a system that estimates this degree of importance automatically. We studied the degree of importance of information using machine learning. We first performed experiments on newspaper documents (Dn), assuming that a document on the front page, or at the top of the front page, is important. Machine learning identified important documents with a precision of 0.9, which shows that for newspapers the degree of importance can be estimated with high precision. Next, to estimate the degree of importance that people attach to a document, we conducted experiments using questionnaire data (Dq) as test data. In these experiments, subjects were asked to identify which document in a pair was more important, and a high accuracy of 94% was obtained when more than 80% of the subjects gave the same answer. Furthermore, when newspaper documents (Dn) were used as training data, we obtained (i) the same accuracy with Dn alone as with Dn and Dq together and (ii) a higher accuracy with Dn and Dq together than with Dq alone. This observation is useful because preparing questionnaire data (Dq) can be expensive, whereas newspaper data (Dn) is free. Finally, we extracted the characteristic words that differentiate important information from less important information by calculating the feature parameters learned by the machine learning method (the maximum entropy (ME) method).
Keywords :
information analysis; learning (artificial intelligence); maximum entropy methods; degree of importance; machine learning; maximum entropy method; newspaper document; questionnaire data; Data mining; Entropy; Information analysis; Kernel; Linearity; Machine learning; Testing; Training data; Web pages; Degree of importance of information; analysis; machine learning; newspaper; questionnaire;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4515-8
Electronic_ISBN :
978-1-4244-2780-2
Type :
conf
DOI :
10.1109/NLPKE.2008.4906797
Filename :
4906797