Title :
Unsupervised extraction of product information from semi-structured sources
Author_Institution :
AGT Group (R&D) GmbH, Darmstadt, Germany
Abstract :
Product information search has become one of the most important application areas of the Web. Especially for expensive technical products, consumers tend to research intensively before an actual purchase. However, the vast amount of available data about such products, and the many different ways in which it is represented, can easily overwhelm potential customers. In this paper, we develop a comprehensive technique for extracting product specifications of arbitrary technical products from web pages in a largely unsupervised manner. The technique is based on a clustering approach that uses structural and visual features of web page elements. The resulting detailed information sets allow potential consumers to compare products effectively without having to extract the information manually.
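The abstract does not name the features, the distance measure, or the clustering algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each text element on a page is described by a hypothetical DOM tag path (structural feature) and by its rendered left offset and font size (visual features). Single-linkage clustering then groups uniformly formatted elements, and the largest group is treated as the candidate specification rows. The `Element` type, the feature weights, the threshold, and the sample page are all illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Element:
    """A hypothetical, simplified web page element (a rendered text leaf)."""
    tag_path: str     # structural feature: DOM path, e.g. "html/body/table/tr/td"
    x: float          # visual feature: left offset of the rendered box in pixels
    font_size: float  # visual feature: rendered font size in pixels
    text: str


def distance(a: Element, b: Element) -> float:
    """Sum of a structural and a visual term; the weights are arbitrary assumptions."""
    structural = 0.0 if a.tag_path == b.tag_path else 1.0
    visual = abs(a.x - b.x) / 100.0 + abs(a.font_size - b.font_size) / 4.0
    return structural + visual


def cluster(elements: list[Element], threshold: float = 0.5) -> list[list[Element]]:
    """Single-linkage clustering via union-find: link pairs closer than `threshold`."""
    parent = list(range(len(elements)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(elements)), 2):
        if distance(elements[i], elements[j]) < threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[Element]] = {}
    for i, el in enumerate(elements):
        groups.setdefault(find(i), []).append(el)
    # Largest groups first: many uniformly styled rows suggest a specification list.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical sample page: a heading, three specification rows, and an ad snippet.
page = [
    Element("html/body/h1", 10, 24, "Example Camera X100"),
    Element("html/body/table/tr/td", 10, 12, "Sensor: 24 MP"),
    Element("html/body/table/tr/td", 10, 12, "Weight: 450 g"),
    Element("html/body/table/tr/td", 10, 12, "Zoom: 5x"),
    Element("html/body/div/p", 300, 11, "Free shipping"),
]

best = cluster(page)[0]
print([e.text for e in best])  # ['Sensor: 24 MP', 'Weight: 450 g', 'Zoom: 5x']
```

Single linkage is chosen here only because it needs no preset number of clusters, which suits pages whose number of element groups is unknown; a real system would also need to split each row into attribute and value.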
Keywords :
Internet; data acquisition; data structures; marketing data processing; pattern clustering; unsupervised learning; Web page elements; clustering approach-based technique; information sets; manual extraction work; pricey technical products; product information search; product specification extraction; semistructured sources; structural features; unsupervised product information extraction; visual features;
Conference_Title :
Computational Intelligence and Informatics (CINTI), 2012 IEEE 13th International Symposium on
Conference_Location :
Budapest
Print_ISBN :
978-1-4673-5205-5
Electronic_ISBN :
978-1-4673-5210-9
DOI :
10.1109/CINTI.2012.6496770