DocumentCode :
2590460
Title :
Cleansing Noisy City Names in Spatial Data Mining
Author :
Lim, SeungJin
Author_Institution :
Integrated Sci. & Technol., Marshall Univ., Huntington, WV, USA
fYear :
2010
fDate :
21-23 April 2010
Firstpage :
1
Lastpage :
8
Abstract :
One of the biggest obstacles to data mining from a large data warehouse is poor data quality. This is because most data mining algorithms have been designed on the assumption that the data is clean and meaningful. Hence, poor data quality may lead to completely unexpected results. In this paper, an automatic city name correction algorithm is proposed to cleanse a large spatial database without requiring human intervention or prior knowledge of the context. The algorithm achieves a precision of 96.6%, which is significantly better than the 86.6% of the traditional Levenshtein distance and the 92% of the Longest Common Subsequence algorithm.
Keywords :
data mining; data warehouses; string matching; visual databases; automatic city name correction algorithm; cleansing noisy city names; data quality; large data warehouse; spatial data mining; spatial database; string matching; Algorithm design and analysis; Cities and towns; Data mining; Data warehouses; Error correction; Fires; Humans; Information systems; Missiles; Spatial databases;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Applications (ICISA), 2010 International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-5941-4
Electronic_ISBN :
978-1-4244-5943-8
Type :
conf
DOI :
10.1109/ICISA.2010.5480390
Filename :
5480390
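Illustrative note :
The abstract compares the proposed algorithm against two string-matching baselines, Levenshtein distance and Longest Common Subsequence (LCS). The following is a minimal sketch of those two baselines only, not of the paper's own algorithm, which this record does not describe. The helper names (`levenshtein`, `lcs_length`, `closest_by_levenshtein`, `closest_by_lcs`), the lowercase normalization, and the sample candidate list are assumptions made for illustration.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def closest_by_levenshtein(noisy: str, candidates: Iterable[str]) -> str:
    """Pick the candidate with the smallest edit distance (case-insensitive)."""
    return min(candidates, key=lambda c: levenshtein(noisy.lower(), c.lower()))


def closest_by_lcs(noisy: str, candidates: Iterable[str]) -> str:
    """Pick the candidate sharing the longest common subsequence (case-insensitive)."""
    return max(candidates, key=lambda c: lcs_length(noisy.lower(), c.lower()))


if __name__ == "__main__":
    # Hypothetical reference list; a real cleansing run would use a gazetteer.
    cities = ["Huntington", "Charleston", "Seoul"]
    print(closest_by_levenshtein("Huntingtn", cities))
    print(closest_by_lcs("Huntingtn", cities))
```

Both helpers resolve ties by keeping the first candidate encountered, and the LCS variant is not normalized by string length, so longer candidates can be favored. A practical baseline would likely need to handle both.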