Title :
Text mining: Finding hot topics TF∗PDF vs. LSI
Author :
Katyayani, J. ; Sriharsha, A.V. ; Sudhir, B.
Author_Institution :
Sri Padmavathi Mahila Visva Vidyalayam, Tirupati, India
Abstract :
With the vast amount of digital text material available on the Net, it is almost impractical for people to absorb all related information in a timely manner. Earlier researchers in data mining have addressed this problem, but the efficiency of their methods and of the exploratory analysis has yet to be ascertained. Document-wise term frequencies and inverted frequencies are available to calculate statistical importance among the documents. Determining the time-line importance of documents plays a more essential role than just finding a document's importance. LSI, a basic PCA approach, is proposed here with a time-line approach and is discussed comparatively in this paper.
Keywords :
data mining; database management systems; information retrieval; principal component analysis; text analysis; LSI; PCA approach; TF*PDF; digital text material; dimensionality reduction; document importance; inverted frequency; latent semantic indexing; statistical importance; text database; text mining; text retrieval indexing technique; time line importance; Conferences; Data acquisition; Data mining; Educational institutions; Event detection; Explosions; Frequency; Intelligent systems; Large scale integration; Text mining; IR; latent-semantic indexing
Conference_Titel :
Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 2009. IDAACS 2009. IEEE International Workshop on
Conference_Location :
Rende
Print_ISBN :
978-1-4244-4901-9
Electronic_ISBN :
978-1-4244-4882-1
DOI :
10.1109/IDAACS.2009.5342925