DocumentCode :
120709
Title :
An analysis into using unstructured non-expert text in the illicit drug domain
Author :
Carter, Bill ; Hofmann, Martin
Author_Institution :
Garda Siochana Analysis Service, An Garda Siochana, Dublin, Ireland
fYear :
2014
fDate :
21-22 Feb. 2014
Firstpage :
651
Lastpage :
657
Abstract :
The Pillreports.com database was mined to determine whether its free-text fields could help differentiate regular pills from adulterated ones, i.e. those containing ingredients not comparable to MDMA. The data was downloaded and extracted using RapidMiner and XPath queries. Naive Bayes and SVM binary classification models were created. Preprocessing techniques of tokenisation, n-gram creation, stop-word removal and stemming, as well as feature selection by weights, were applied to the data, resulting in a 15-point improvement in the model. In addition, we report on a comprehensive cluster analysis. Frequent terms and differences between clusters were visualised using word clouds. Clusters were compared with values contained in nominal fields. Model results and interpretation are provided at various preprocessing stages. Key phrase extraction is identified as an area of possible future work.
Keywords :
Bayes methods; database management systems; drugs; feature selection; natural language processing; pattern classification; pattern clustering; query processing; support vector machines; text analysis; word processing; MDMA; Naive Bayes classification model; RapidMiner queries; SVM binary classification model; Xpath queries; cluster analysis; database mining; feature selection; free-text fields; illicit drug domain; key phrase extraction; n-gram creation; pill differentiation; preprocessing techniques; stop-word removal; tokenisation; unstructured nonexpert text analysis; word clouds; Accuracy; Computational modeling; Data mining; Databases; Support vector machines; Tag clouds; Vectors; Text analysis; classification; web content mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advance Computing Conference (IACC), 2014 IEEE International
Conference_Location :
Gurgaon
Print_ISBN :
978-1-4799-2571-1
Type :
conf
DOI :
10.1109/IAdCC.2014.6779401
Filename :
6779401