Title :
Enriched content mining for web applications
Author :
Dhivya, G. ; Deepika, K. ; Kavitha, J. ; Kumari, V. Nithya
Author_Institution :
Inf. Technol., Panimalar Eng. Coll., Chennai, India
Abstract :
In recent years, the World Wide Web has emerged as a dominant and still-growing publishing medium. Much web content is unstructured, so gathering and making sense of such data is tedious. Web servers worldwide generate a vast amount of information on web users' browsing activities. Several researchers have studied these so-called web access log data to better understand and characterize web users. The data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyze user behavior by mining enriched web access log data. Several web usage mining methods for extracting useful features are discussed, and these techniques are employed to cluster the users of the domain and study their behavior comprehensively. The contributions of this thesis are a content- and origin-based data enrichment and a tree-like visualization of frequent navigational sequences. This visualization gives an easily interpretable view of patterns with relevant information highlighted. The results of this project can be applied to diverse purposes, including marketing, web content advising, (re-)structuring of web sites, and several other e-business processes, such as recommendation and advertiser systems. The approach also ranks the most relevant documents using a top-K query for effective and efficient data retrieval, and it filters web documents so that relevant content appears in the search engine result page (SERP).
Keywords :
Web sites; advertising data processing; data mining; data visualisation; document handling; electronic commerce; electronic publishing; information filtering; query processing; recommender systems; search engines; tree data structures; SERP; Web applications; Web content; Web content advising; Web document filtering; Web sites restructuring; World Wide Web; advertiser systems; best relevant document ranking; data gathering; data retrieval system; e-business processes; enriched Web access log data mining; enriched content mining; frequent navigational sequences; marketing; publishing medium; recommendation systems; search engine result page; top K query; treelike visualization; user behavior analysis; Algorithm design and analysis; Conferences; Data mining; Databases; Search engines; Web pages; Data-based approach; SERP; Top K-query; Web mining; World Wide Web;
Conference_Title :
Innovations in Information, Embedded and Communication Systems (ICIIECS), 2015 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4799-6817-6
DOI :
10.1109/ICIIECS.2015.7193094