Title :
Automatic and Adaptive Clusters for Information Extraction
Author :
Charulatha, B.S. ; Rodrigues, Paul ; Chitralekha, T.
Author_Institution :
JNTUK, Kakinada, India
Abstract :
Web pages are heterogeneous and unstructured. The heterogeneity is due to the hybrid nature of the documents; the lack of structure is due to either multilingual or multimedia content in the page. Mining should therefore be independent of language and software. In any data or content mining, a set of data is chosen to form the basis, as is done with keywords. If this base data is chosen arbitrarily, the clustering is automatic; if some 'knowledge' or 'background' goes into the choice, it is adaptive. Statistical features are extracted from the pixel map of each image and presented to two clustering algorithms, Fuzzy C Means and Subtractive clustering. The algorithms classify the given image as a text or image representation. The classification accuracy of the two algorithms is compared and presented.
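The feature-extraction and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which statistical features were used, so the mean and standard deviation of pixel intensity are assumed here, and the function names, the toy pixel maps and the initial centers are all hypothetical. Only fuzzy c-means is shown; subtractive clustering is omitted.

```python
import math

def pixel_stats(pixels):
    """Return (mean, standard deviation) of a 2-D grayscale pixel map.
    These two features are an assumption, not taken from the paper."""
    values = [v for row in pixels for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return (mean, math.sqrt(var))

def memberships_for(points, centers, m=2.0, eps=1e-9):
    """Fuzzy memberships: u_ij is proportional to d_ij^(-2/(m-1))."""
    result = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        dists = [max(math.dist(p, c), eps) for c in centers]
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        result.append([w / total for w in inv])
    return result

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Plain fuzzy c-means starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    dims = len(points[0])
    for _ in range(iters):
        u = memberships_for(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by u_ij^m.
            weights = [row[j] ** m for row in u]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / wsum
                for k in range(dims)
            ))
        centers = new_centers
    return centers, memberships_for(points, centers, m)

# Hypothetical toy pixel maps: two high-contrast ("text-like") and
# two low-contrast ("picture-like") images.
images = [
    [[255, 255, 0, 255], [255, 0, 255, 255]],
    [[255, 255, 255, 255], [0, 255, 0, 0]],
    [[120, 130, 125, 128], [122, 127, 131, 124]],
    [[90, 100, 95, 105], [98, 102, 97, 93]],
]
features = [pixel_stats(img) for img in images]
centers, u = fuzzy_c_means(features, [features[0], features[-1]])
# Hard label = cluster with the highest membership.
labels = [row.index(max(row)) for row in u]
```

Seeding the centers with two of the data points corresponds to the arbitrary ("automatic") choice of base data described in the abstract; choosing seeds from prior knowledge of what text and picture images look like would correspond to the "adaptive" case.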
Keywords :
Internet; data mining; feature extraction; fuzzy set theory; image classification; image representation; pattern clustering; statistical analysis; Web pages; adaptive clusters; automatic clusters; content mining; fuzzy C means algorithm; image pixel map; image statistical feature extraction; information extraction; multilingual content; multimedia content; subtractive clustering algorithm; Accuracy; Classification algorithms; Clustering algorithms; Fuzzy c means; clustering; heterogeneous; multimedia; statistical features; subtractive clustering accuracy; unstructured
Conference_Title :
Soft Computing and Machine Intelligence (ISCMI), 2014 International Conference on
DOI :
10.1109/ISCMI.2014.29