Robust broad-scale benthic habitat mapping when training data is scarce

Author

Ahsan, Nasir ; Williams, Stefan B. ; Pizarro, Oscar

Author_Institution

Australian Center for Field Robot., Univ. of Sydney, Sydney, NSW, Australia

fYear

2012

fDate

21-24 May 2012

Firstpage

1

Lastpage

10

Abstract

Understanding the distribution of habitat classes at broad scales is of interest in marine park conservation and planning. Typically, sites of interest can extend over many hundreds of square kilometers. However, collecting ground truth data (optical imagery, towed video, grab samples, etc.) over such broad scales is impractical, and only a small fraction of a site can be sampled, depending on budget constraints. Benthic habitat mapping involves learning the correlations between habitat classes, derived from limited ground truth sampling of the seabed, and the corresponding seabed morphology, and then extrapolating these correlations to the entire site. One important issue with such approaches is that the correlations are learned from limited data, which motivates the need to investigate robust techniques for learning the correlations and extrapolating them. In this paper we motivate the use of a generative classifier, Gaussian Mixture Models (GMMs), for the task of benthic habitat mapping instead of discriminative models such as Classification Trees (CTs, popular in the benthic habitat mapping literature) and Support Vector Machines (SVMs, generally popular in a variety of fields), based on the idea that generative classifiers take into account more information about the underlying data distribution than discriminative classifiers, yielding more robust extrapolations. Using holdout validation we show that GMMs consistently perform comparably to, or outperform, the best classifier for all training set sizes (small and large), and that this is not the case with CTs and SVMs. We also show that GMMs are more certain about their predictions over the broad scale than the other classifiers.
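The workflow the abstract describes (fit a generative classifier on a small labelled sample, score it on held-out data, and measure how certain its predictions are) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic two-feature data with hypothetical class names ("sand", "reef"), uses one diagonal Gaussian per class (the one-component special case of a GMM, chosen to keep the example dependency-free), and takes the mean entropy of the class posterior as one possible certainty measure.

```python
import math
import random

def fit_gaussian(rows):
    """Per-feature mean and variance for one class (diagonal covariance)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    # Small constant keeps the variance strictly positive.
    variances = [sum((r[d] - means[d]) ** 2 for r in rows) / n + 1e-6
                 for d in range(dims)]
    return means, variances

def log_density(x, means, variances):
    """Log of a diagonal Gaussian density evaluated at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))

def fit(X, y):
    """Generative model: a log prior and a Gaussian for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = (math.log(len(rows) / len(X)), fit_gaussian(rows))
    return model

def posterior(model, x):
    """Class posterior via Bayes' rule, normalised in log space."""
    logs = {c: prior + log_density(x, *params)
            for c, (prior, params) in model.items()}
    top = max(logs.values())
    unnorm = {c: math.exp(v - top) for c, v in logs.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

def entropy(probs):
    """Shannon entropy (nats) of a posterior; lower means more certain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def make_data(n_per_class, rng):
    """Synthetic stand-in for seabed morphology features (assumption)."""
    centers = {"sand": (0.0, 0.0), "reef": (4.0, 4.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(10, rng)    # scarce training data
X_test, y_test = make_data(200, rng)     # held-out data

model = fit(X_train, y_train)
posteriors = [posterior(model, x) for x in X_test]
predictions = [max(p, key=p.get) for p in posteriors]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
mean_entropy = sum(entropy(p) for p in posteriors) / len(posteriors)
print(f"holdout accuracy: {accuracy:.3f}, mean posterior entropy: {mean_entropy:.3f}")
```

Repeating the fit for several values of `n_per_class` in the training call would give a rough accuracy-versus-training-size curve in the spirit of the comparison the abstract reports, although the numbers from this toy data say nothing about the paper's results.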

Keywords

environmental factors; environmental science computing; geophysics computing; learning (artificial intelligence); oceanography; pattern classification; GMM classifier; Gaussian mixture models; correlation extrapolation; correlation learning; generative classifiers; ground truth data; habitat class correlations; habitat class distribution; marine park conservation; marine park planning; robust broad scale benthic habitat mapping; seabed ground truth sampling; seabed morphology; training data; Biological system modeling; Correlation; Data models; Entropy; Support vector machines; Training; Training data;

fLanguage

English

Publisher

IEEE

Conference_Titel

OCEANS, 2012 - Yeosu

Conference_Location

Yeosu

Print_ISBN

978-1-4577-2089-5

Type

conf

DOI

10.1109/OCEANS-Yeosu.2012.6263540

Filename

6263540