Title :
Preprocessing of High Dimensional Dataset for Developing Expert IR System
Author :
Chaudhari, Anagha ; Phadatare, Pravin M. ; Kudale, Pranil S. ; Mohite, Raviraj B. ; Petare, Rohan P. ; Jagdale, Yogesh P. ; Mudiraj, Amitabh
Author_Institution :
Dept. of Inf. Technol., Pimpri Chinchwad Coll. of Eng., Pune, India
Abstract :
Owing to the increasing availability of computing facilities, large amounts of data are now being generated in electronic form. This data must be analyzed to maximize the benefit of intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. The basic phases of text categorization include preprocessing steps such as removing stop words from documents and applying TF-IDF, which increase efficiency and remove irrelevant data from a huge dataset. This paper discusses the implications of an information retrieval system for text-based data using different clustering approaches. Applying the TF-IDF algorithm to the dataset gives a weight for each word, and these weights are summarized in a weight matrix.
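The preprocessing pipeline described in the abstract (stop-word removal followed by TF-IDF weighting into a weight matrix) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the tokenization rule, the function names `preprocess` and `tfidf_matrix`, the sample documents, and the particular TF-IDF variant (term frequency normalized by document length, times the natural log of N/df) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which list is used.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "in", "to", "for"}


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def tfidf_matrix(documents):
    """Return (vocabulary, weight matrix) with one row per document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    docs = [preprocess(d) for d in documents]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against documents that become empty
        row = [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


# Made-up sample documents, for illustration only.
docs = [
    "The retrieval of text data",
    "Clustering of text documents",
    "Retrieval and clustering",
]
vocab, weights = tfidf_matrix(docs)
for doc, row in zip(docs, weights):
    print(doc, [round(w, 3) for w in row])
```

Each row of `weights` is one document's vector over `vocab`. Under this weighting, terms that appear in every document receive a weight of zero, and rows of this kind could serve as input to a clustering step.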
Keywords :
decision making; information retrieval systems; learning (artificial intelligence); text analysis; TF-IDF algorithm; clustering approaches; electronic form; expert IR system; high dimensional dataset processing; information retrieval system; intelligent decision making; machine learning; text categorization; text-based data; weight matrix; Clustering algorithms; Databases; Flowcharts; Frequency measurement; Information retrieval; Text categorization; TF-IDF; stopwords; text based clustering
Conference_Title :
Computing Communication Control and Automation (ICCUBEA), 2015 International Conference on
Conference_Location :
Pune
DOI :
10.1109/ICCUBEA.2015.87