DocumentCode :
3396103
Title :
Harvesting reliability data from the internet
Author :
Dussault, H. ; Zarubin, Peter S. ; Morris, Seymour ; Nicholls, David
Author_Institution :
Dept. of Electr. Eng., SUNY Inst. of Technol., Utica, NY
fYear :
2008
fDate :
28-31 Jan. 2008
Firstpage :
322
Lastpage :
327
Abstract :
This paper describes the initial design, development, and testing of a tool that harvests reliability data from multiple internet resources. An evaluation corpus of 1544 URLs is used to assess typical reliability data collection content and challenges, and to provide a basis for evaluating data harvesting tool performance and capability growth. Early results show that three capabilities are important in reliability data collection: handling portable document format (PDF) documents; correctly parsing web pages, including significant punctuation marks and number formatting; and extracting data from tables. The results to date show that reliability data is available on the internet and that automated tools can begin to discover and harvest that information. However, much work remains before such tools can reliably discover, extract, cluster, and present valid component reliability data to users.
Keywords :
Internet; data handling; URL; data harvesting tool performance evaluation; internet; multiple internet resources; parse web pages; portable document format documents; reliability data collection content; reliability data harvesting; Data engineering; Data mining; Data visualization; Information analysis; Internet; Reliability engineering; Search engines; Service oriented architecture; Web mining; Web pages; reliability data; web mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Reliability and Maintainability Symposium, 2008. RAMS 2008. Annual
Conference_Location :
Las Vegas, NV
ISSN :
0149-144X
Print_ISBN :
978-1-4244-1460-4
Electronic_ISBN :
0149-144X
Type :
conf
DOI :
10.1109/RAMS.2008.4925816
Filename :
4925816