Development and Applications of Machine Learning Methods for Hyperspectral Data

Abstract

Hyperspectral remote sensing of the Earth relies on data from passive optical sensors that are mounted on platforms such as satellites and Unmanned Aerial Vehicles (UAVs). Hyperspectral data contain information that can be used to identify materials and to monitor environmental variables such as soil texture, soil moisture, chlorophyll a, and land cover. Data analysis methods are necessary to retrieve this information. One powerful tool in the analysis of hyperspectral data is Machine Learning (ML), a subset of Artificial Intelligence. ML models can capture nonlinear relationships and scale to growing dataset sizes. Every dataset and every ML estimation task brings new challenges that require innovative solutions. The studies presented in this thesis aim to develop and apply ML methods for hyperspectral remote sensing data. These studies address the following three main challenges: (I) datasets with only a few labeled datapoints, (II) the limited potential of shallow ML approaches on hyperspectral data, and (III) dataset shift between training and test datasets. The studies on challenge (I) result in the development and publication of a Self-Organizing Map (SOM) framework for unsupervised, supervised, and semi-supervised learning. The SOM is applied to a hyperspectral dataset in the (semi-)supervised regression of soil moisture, where it outperforms a Random Forest (RF) regressor. The SOM framework shows adequate performance in the (semi-)supervised classification of land cover. It also provides visualization capabilities that improve the understanding of the underlying dataset. In the studies addressing challenge (II), three innovative 1-dimensional Convolutional Neural Network (CNN) architectures are developed. The CNNs are applied to soil texture classification on a freely available hyperspectral dataset. Their performance is compared with two existing CNN approaches and an RF classifier.
Two main findings emerge. First, the CNN approaches perform significantly better than the shallow RF approach. Second, adding information about the hyperspectral band numbers to the input layer of a CNN improves the performance on individual classes. The studies on challenge (III) are based on a UAV dataset acquired over five measurement areas in Peru in 2019. Dataset shift is detected with qualitative methods and with unsupervised ML approaches such as Principal Component Analysis and autoencoders. Based on the results, a supervised regression of soil moisture is performed on different combinations of measurement areas. Additionally, to study the effects of dataset shift on the regression, the dataset is augmented with Monte Carlo methods. The applied SOM regressor is relatively robust against soil moisture sensor noise and performs well on small datasets, while the applied RF performs best on the full dataset. Dataset shift makes this regression task difficult; some combinations of measurement areas form a significantly better training dataset than others. To conclude, the studies addressing the three main challenges show promising results, and the developed ML methods can be further enhanced in future research.
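The second CNN finding, adding band-number information to the input layer, can be illustrated with a minimal sketch. The encoding shown here (a normalized band position appended as a second input channel) is an assumption made for illustration; the abstract does not specify how the thesis encodes the band numbers, and the function name `add_band_channel` is hypothetical.

```python
from typing import List, Sequence


def add_band_channel(spectrum: Sequence[float]) -> List[List[float]]:
    """Pair each reflectance value with its normalized band position.

    Returns a list of shape (n_bands, 2): channel 0 holds the reflectance,
    channel 1 holds the band index scaled to [0, 1]. This encoding is an
    illustrative assumption, not necessarily the one used in the thesis.
    """
    n_bands = len(spectrum)
    if n_bands == 0:
        raise ValueError("spectrum must contain at least one band")
    # Avoid division by zero for a single-band spectrum (position 0.0).
    denom = max(n_bands - 1, 1)
    return [[float(value), index / denom] for index, value in enumerate(spectrum)]


# Toy spectrum with five bands (made-up reflectance values).
example = add_band_channel([0.12, 0.15, 0.21, 0.30, 0.28])
```

A two-channel input of this kind could then be passed to a 1-dimensional CNN in place of the single-channel spectrum, so that each convolutional filter sees where in the spectrum a value lies in addition to the value itself.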

Felix M. Riese