Title :
SSIE: An Automatic Data Extractor for Sports Management in Athletics Modality
Author :
Simões;Fabio Matsunaga;Armando Toda;Jacques Brancher;Abdallah Junior;Rosangela Busto
Author_Institution :
Comput. Sci. Dept., Londrina State Univ., Londrina, Brazil
Abstract :
Sports management concerns the organization and statistical analysis of sport results and modality information by professionals. However, this information is scattered around the web or organized by sporting event, which makes it difficult to scout for sports talent, and the textual information is unstructured or semi-structured. This work proposes a Summary Sport Information Extraction System (SSIE) that generates a statistical summary of the athletics modality through automatic information extraction from documents retrieved from the web. These documents are converted to text and classified by sport type using the Naive Bayes learning method. After document retrieval and classification, text segmentation/tokenization, corpus annotation, and entity/subset recognition by chunking are used to generate data frames structured as parse trees. The parse tree information is stored in a database, which makes it possible to project summaries and to analyze big data from the web. The main contribution of this work is the clustering of a large amount of data scattered across the web into a form useful for sports management.
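The classification step described in the abstract can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes model with add-one smoothing over word tokens; the abstract does not state which variant or features SSIE uses. The `tokenize` function, the `NaiveBayes` class, and the toy training documents are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Assumption: this is an illustrative variant, not the paper's model.
    """

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(docs, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Smoothed denominator: tokens seen for this label plus vocabulary size.
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hypothetical training documents (not from the paper).
TRAIN = [
    ("100 m sprint final, hurdles and long jump results", "athletics"),
    ("marathon relay and high jump ranking of athletes", "athletics"),
    ("freestyle and butterfly swimming pool heats", "swimming"),
    ("backstroke medley relay in the olympic pool", "swimming"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
```

Calling `model.predict(...)` on text converted from a retrieved document returns the most probable sport label. In a pipeline like the one the abstract outlines, only documents labeled as athletics would move on to tokenization, annotation, and chunking.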
Keywords :
"Data mining","Portable document format","Feature extraction","Web pages","Knowledge based systems","Learning systems"
Conference_Title :
Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on
DOI :
10.1109/CIT/IUCC/DASC/PICOM.2015.23