DocumentCode :
7215
Title :
Automatic Quality Control of Transportation Reports Using Statistical Language Processing
Author :
Gerber, M.S. ; Lu Tang
Author_Institution :
Dept. of Syst. & Inf. Eng., Univ. of Virginia, Charlottesville, VA, USA
Volume :
14
Issue :
4
fYear :
2013
fDate :
Dec. 2013
Firstpage :
1681
Lastpage :
1689
Abstract :
The processes of developing, monitoring, and maintaining transportation systems produce large volumes of information. Human fieldworkers are often responsible for gathering this information, and despite their best efforts, they will inevitably introduce errors into the collected data. This is a critical problem since: 1) the collected data are used to justify key infrastructure maintenance and development decisions; and 2) the volume of unstructured information (e.g., plain text) makes manual quality control prohibitively expensive. We introduce a solution to this problem in the example domain of vehicle accident reports. First, we analyzed a sample of accident reports and confirmed the existence of many data entry errors. Second, we developed and evaluated a statistical language processing approach that automatically identifies reports containing data entry errors. We tested a variety of system configurations on real-world data and compared their performance with multiple baseline methods. The best configuration achieved a performance score of 84%, far outperforming the baseline methods. Our results and analyses have quality control implications for any data source that pairs structured text (e.g., coded fields) with unstructured text.
Keywords :
natural language processing; quality control; statistical analysis; traffic engineering computing; automatic quality control; data entry errors; data source; human fieldworkers; key infrastructure maintenance; multiple baseline methods; statistical language processing approach; structured text; transportation reports; transportation systems; unstructured information volume; unstructured text; vehicle accident reports; Accidents; Feature extraction; Natural language processing (NLP)
fLanguage :
English
Journal_Title :
Intelligent Transportation Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1524-9050
Type :
jour
DOI :
10.1109/TITS.2013.2265892
Filename :
6545322