Title of article :
Unsupervised pattern recognition for the interpretation of ecological data
Author/Authors :
Walley, William J and O'Connor, Mark A
Abstract :
The paper describes a novel pattern recognition system (MIR-max) that was developed to facilitate the construction of a river pollution diagnostic system for the British Environment Agency. MIR-max is a non-neural self-organising map based on information theory which, unlike Kohonen's self-organising map (SOM), separates the processes of clustering and ordering. It first clusters the input samples into a pre-defined number of classes by maximising the mutual information between the samples and the classes. The classes are then ordered in a two-dimensional output space by maximising the correlation coefficient (r) between the Euclidean distances separating the classes in data space and their corresponding distances in output space. This produces a map of the classes which, when labelled, can be used for the classification or diagnosis of new samples. A novel feature of MIR-max is that it permits the disaggregation of the classes in the output map, allowing exceptional classes to separate from their neighbours. MIR-max is designed specifically for use with ordinal data, but can also be used for interval-valued data. Its application in the ecological field is demonstrated via two examples based on data from the 1995 river quality survey of England and Wales. In the first example, MIR-max is used to classify biological samples into 100 river quality classes for each of five site types. These classifiers are then tested against two corresponding neural network classifiers, and are shown to provide better performance. In the second example, MIR-max is used to classify combined biological and environmental data (the latter being the physical characteristics of the site) directly into 500 quality classes. The way in which this pattern classifier has been used to produce a river pollution diagnostic system is then explained. The advantages of the system over traditional river quality assessment systems, such as RIVPACS, are outlined. It is concluded that MIR-max has considerable potential for use in the visualisation and interpretation of multivariate ecological data.
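The ordering criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ordering_score`, `order_by_swaps`), the use of class centres as the data-space representation of each class, and the random pairwise-swap search are all assumptions made for illustration. Only the objective itself, the correlation coefficient r between Euclidean distances in data space and in output space, is taken from the abstract.

```python
import itertools
import math
import random


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, in a fixed order."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ordering_score(class_centres, positions):
    """r between class separations in data space and in the output map."""
    return pearson_r(pairwise_distances(class_centres),
                     pairwise_distances(positions))


def order_by_swaps(class_centres, positions, iterations=2000, seed=0):
    """Hypothetical search: swap two classes' map positions, keep improvements.

    Assumption: a simple hill climb stands in for whatever optimiser the
    paper actually uses. `positions` holds one (x, y) cell per class.
    """
    rng = random.Random(seed)
    positions = list(positions)
    best = ordering_score(class_centres, positions)
    for _ in range(iterations):
        i, j = rng.sample(range(len(positions)), 2)
        positions[i], positions[j] = positions[j], positions[i]
        score = ordering_score(class_centres, positions)
        if score > best:
            best = score
        else:
            # Revert a swap that did not improve the correlation.
            positions[i], positions[j] = positions[j], positions[i]
    return positions, best
```

In this sketch every class occupies exactly one cell and swaps only exchange occupied cells. The disaggregation feature mentioned in the abstract would presumably also require moves into unoccupied cells, which are omitted here.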
Keywords :
Pattern recognition, River quality, RIVPACS, Pollution, Biological monitoring, Diagnosis, Ecological data, Self-organising maps
Journal title :
Astroparticle Physics