DocumentCode
120709
Title
An analysis into using unstructured non-expert text in the illicit drug domain
Author
Carter, Bill ; Hofmann, Martin
Author_Institution
Garda Siochana Analysis Service, An Garda Siochana, Dublin, Ireland
fYear
2014
fDate
21-22 Feb. 2014
Firstpage
651
Lastpage
657
Abstract
The Pillreports.com database was mined to determine whether the free-text fields in the database could help differentiate regular pills from those that have been adulterated, i.e. that contain ingredients not comparable to MDMA. The data was downloaded and extracted using RapidMiner and XPath queries. Naive Bayes and SVM binary classification models were created. Pre-processing techniques (tokenisation, n-gram creation, stop-word removal and stemming), together with feature selection by weights, were applied to the data, resulting in a 15-point improvement in model performance. In addition, we report on a comprehensive cluster analysis. Frequent terms and differences between clusters were visualised using word clouds, and clusters were compared with the values contained in nominal fields. Model results and interpretation are provided at various pre-processing stages. Key phrase extraction is identified as an area of possible future work.
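The pre-processing and classification steps named in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' RapidMiner process. It is a plain-Python approximation under stated assumptions: the stop-word list is hypothetical, `crude_stem` is a simplistic suffix stripper standing in for a real stemmer such as Porter, the example reports and the `regular`/`adulterated` labels are invented for illustration, and the SVM model and feature selection by weights are omitted. It shows tokenisation, stop-word removal, stemming and bigram creation feeding a multinomial Naive Bayes classifier with add-one smoothing.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "was", "very", "this"}


def crude_stem(token: str) -> str:
    """Strip a few common English suffixes (a stand-in for a real stemmer such as Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, n: int = 2) -> list[str]:
    """Tokenise, drop stop words, stem, then append word n-grams up to length n."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    features = list(stems)
    for size in range(2, n + 1):
        features += ["_".join(stems[i:i + size]) for i in range(len(stems) - size + 1)]
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over n-gram features."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        # Log prior probability of each class.
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Per-class feature counts.
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(preprocess(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        def score(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            # Features never seen in training are ignored.
            return self.priors[c] + sum(
                math.log((self.counts[c][f] + 1) / denom)
                for f in preprocess(doc)
                if f in self.vocab
            )

        return max(self.classes, key=score)


# Invented example reports, not data from Pillreports.com.
train_docs = [
    "clean euphoric roll, lots of love and music",
    "strong euphoric roll, smooth come up",
    "speedy and jittery, no euphoria, bad comedown",
    "tasted like speed, jittery, headache",
]
train_labels = ["regular", "regular", "adulterated", "adulterated"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("euphoric roll with love"))  # -> regular
```

Add-one smoothing keeps a single unseen feature from driving a class probability to zero, which matters for short, informal user reports where vocabulary overlap between documents is sparse.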
Keywords
Bayes methods; database management systems; drugs; feature selection; natural language processing; pattern classification; pattern clustering; query processing; support vector machines; text analysis; word processing; MDMA; Naive Bayes classification model; RapidMiner queries; SVM binary classification model; Xpath queries; cluster analysis; database mining; feature selection; free-text fields; illicit drug domain; key phrase extraction; n-gram creation; pill differentiation; preprocessing techniques; stop-word removal; tokenisation; unstructured nonexpert text analysis; word clouds; Accuracy; Computational modeling; Data mining; Databases; Support vector machines; Tag clouds; Vectors; Text analysis; classification; web content mining;
fLanguage
English
Publisher
ieee
Conference_Titel
Advance Computing Conference (IACC), 2014 IEEE International
Conference_Location
Gurgaon
Print_ISBN
978-1-4799-2571-1
Type
conf
DOI
10.1109/IAdCC.2014.6779401
Filename
6779401