Author :
Bursa, M. ; Lhotska, L. ; Chudacek, V. ; Huptych, M. ; Spilka, J. ; Janku, P. ; Huser, M.
Author_Institution :
Dept. of Cybern., FEE CTU in Prague, Prague, Czech Republic
Abstract :
In this work we have studied, evaluated and proposed different swarm intelligence techniques for mining information from loosely structured medical textual records with no a priori knowledge. In the paper we depict the process of mining a large dataset of ~50,000-120,000 records × 20 attributes in database tables, originating from the hospital information system (thanks go to the University Hospital in Brno, Czech Republic) and recorded over 10 years. This paper concerns only textual attributes with free text input, that is, 613,000 text fields in 16 attributes. Each attribute item contains ~800-1,500 characters (diagnoses, medications, etc.). The output of this task is a set of ordered/nominal attributes suitable for rule discovery mining and automated processing. Information mining from textual data becomes a very challenging task when the structure of the text record is very loose and follows no rules. The task becomes even more difficult when natural language is used and no a priori knowledge is available. The medical environment itself is also very specific: the natural language used in textual descriptions varies with the person creating the record (there are many personalized approaches); however, it is restricted by terminology (i.e. medical terms, medical standards, etc.). Moreover, the typical patient record is filled with typographical errors, duplicates, ambiguities and many (nonstandard) abbreviations. Nature inspired methods have their origin in real natural processes and play an important role in the domain of artificial intelligence. They offer fast and robust solutions to many problems, although they belong to the branch of approximate methods. The high number of individuals and the decentralized approach to task coordination in social species reveal a high degree of parallelism, self-organization and fault tolerance. In studying these paradigms, we have a good chance of discovering inspiring concepts for many successful metaheuristics.
First, classical approaches such as basic statistical methods, word (and word sequence) frequency analysis, etc., have been used to simplify the textual data and provide a preliminary overview of the data. Finally, an ant-inspired self-organizing approach has been used to automatically provide a simplified dominant structure, presenting the structure of the records in a human-readable form that can be further utilized in the mining process, as it describes the vast majority of the records. Note that this project is an ongoing process (and research) and new data are irregularly received from the medical facility, justifying the need for robust and fool-proof algorithms.
Keywords :
ant colony optimisation; data mining; fault tolerance; hospitals; information retrieval; medical information systems; natural language processing; particle swarm optimisation; records management; ant inspired techniques; ant-inspired self-organizing approach; artificial intelligence; decentralized approach; fool-proof algorithms; hospital information system; information mining; nature inspired methods; patient record processing; rule discovery mining; swarm intelligence techniques; textual data mining; textual information retrieval; Clustering algorithms; Computers; Heuristic algorithms; Insects; Medical diagnostic imaging; Ant Colony; Medical Record Processing; Swarm Intelligence