DocumentCode :
2387951
Title :
Extraction and classification of unstructured data in WebPages for structured multimedia database via XML
Author :
Abidin, Siti Z Z ; Idris, Noorazida Mohd ; Husain, Azizul H.
Author_Institution :
Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia
fYear :
2010
fDate :
17-18 March 2010
Firstpage :
44
Lastpage :
49
Abstract :
Nowadays, there is a vast amount of information available on the Internet. Useful data must be captured and stored for future use. One of the major unsolved problems in the information technology (IT) industry is the management of unstructured data. Unstructured data such as multimedia files, documents, spreadsheets, news, emails, memorandums, reports and web pages are difficult to capture and store in common database storage. The underlying reason is that the tools and techniques that have proved so successful in transforming structured data into business intelligence and actionable information simply do not work on unstructured data. As a result, new approaches are necessary. Several researchers have attempted to deal with unstructured data, but so far it is hard to find a tool that can store extracted and classified unstructured data in a structured database system and retrieve it from there. This paper presents our research on identifying, extracting and classifying unstructured data in web pages, which is then transformed into a structured format as an Extensible Markup Language (XML) document and later stored in a multimedia database. The contribution of this research lies in the approach to capturing the unstructured data and in the efficiency of a multimedia database in handling this kind of data. The stored data could benefit various communities such as students, lecturers, researchers and IT managers, because it can be used for planning, decision-making, day-to-day operations, and other future purposes.
Keywords :
Internet; Web sites; XML; competitive intelligence; database management systems; decision making; information storage; multimedia databases; planning (artificial intelligence); query processing; XML; actionable information; business intelligence; common database storage; data capturing; day-to-day operations; decision-making; extensible markup language; information technology industry; internet information; planning; structured database system; structured multimedia database; unstructured data extraction; unstructured data identification; unstructured data management; webpages; Data mining; Database systems; Information retrieval; Information technology; Intelligent structures; Internet; Multimedia databases; Technology management; Web pages; XML; Data classification; Data extraction; Multimedia database; Unstructured data; Webpage; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Retrieval & Knowledge Management, (CAMP), 2010 International Conference on
Conference_Location :
Shah Alam, Selangor
Print_ISBN :
978-1-4244-5650-5
Type :
conf
DOI :
10.1109/INFRKM.2010.5466948
Filename :
5466948