DocumentCode :
2746336
Title :
Detection of fraudulent financial reports with machine learning techniques
Author :
Seemakurthi, Prasad ; Shuhao Zhang ; Yibing Qi
fYear :
2015
fDate :
24 April 2015
Firstpage :
358
Lastpage :
361
Abstract :
This paper describes our efforts to apply various advanced supervised machine learning and natural language processing techniques, including Binomial Logistic Regression, Support Vector Machines, Neural Networks, Ensemble Techniques, and Latent Dirichlet Allocation (LDA), to the problem of detecting fraud in financial reporting documents available from the United States Securities and Exchange Commission's EDGAR database. Specifically, we apply LDA to a collection of 10-K financial reports to generate a document-topic frequency matrix, and then submit these data to a series of advanced classification algorithms. We then evaluate the performance of each algorithm using metrics such as Precision, the Receiver Operating Characteristic Curve, and the Area Under the Curve. We conclude that these methods show promise and suggest applying the approach to a larger set of input documents.
Keywords :
document handling; financial data processing; fraud; learning (artificial intelligence); matrix algebra; natural language processing; neural nets; pattern classification; regression analysis; security of data; support vector machines; EDGAR database; LDA; Security and Exchange Commission; United States; area under the curve; binomial logistic regression; classification algorithms; document-topic frequency matrix; ensemble techniques; evaluation metrics; financial reporting documents; fraudulent financial reports detection; latent Dirichlet allocation; natural language processing techniques; neural networks; precision; receiver operating characteristic curve; supervised machine learning techniques; support vector machines; Accuracy; Classification algorithms; Correlation; Logistics; Natural language processing; Neural networks; Support vector machines; Ensemble; Financial Fraud Detection; Latent Dirichlet Allocation; Machine Learning; Natural Language Processing; Support Vector Machines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems and Information Engineering Design Symposium (SIEDS), 2015
Conference_Location :
Charlottesville, VA
Print_ISBN :
978-1-4799-1831-7
Type :
conf
DOI :
10.1109/SIEDS.2015.7117005
Filename :
7117005