
Segmentation of dust storm areas on Mars images using principal component analysis and neural network


We present a method for automated segmentation of dust storm areas on Mars images observed by an orbiter. We divide the images into small patches. Normal basis vectors are obtained from the many small patches by principal component analysis. We train a classifier using coefficients of these basis vectors as feature vectors. All patches in test images are categorized into one of the dust storm, cloud, and surface classes by the classifier. Each pixel may be included in several dust storm patches. The pixel is classified as a dust storm or as one of the other classes based on the number of dust storm patches that include the pixel. We evaluate the segmentation method by the receiver operator characteristic curve and the area under the curve (AUC). AUC for dust storm is 0.947–0.978 if dust storm areas determined by our visual inspection are assumed to be ground truth. Precision, recall, and F-measure for dust storm are 0.88, 0.84, and 0.86, respectively, if we remove false positive pixels efficiently and maintain the size of true positive dust storms using two different threshold values. The tuning parameters of the classifier used in this study are determined so that the accuracy for dust storm is maximized. We can also tune the classifier for cloud segmentation by changing the parameters.


The planet Mars is regularly affected by dust storms that cover contiguous areas ranging in size from local (10,000 km²) to global (> 5 × 10⁷ km²) and present a variety of hazards/challenges to present and future exploration of the planet. Martian dust storms are atmospheric phenomena visualized by dust suspended in the atmosphere. The atmospheric phenomena controlling dust storms are suggested by the storms’ shapes and sizes, as well as the season, region, and local time of initiation. Wang et al. (2011) showed the distribution of curvilinear dust storms in the southern hemisphere observed by MGS/MOC. Guzewich et al. (2015) clearly showed the frequency, regionality, and seasonality of “textured dust storms” seen in MGS/MOC images. Guzewich et al. (2017) revealed the relationship between the frequency of textured dust storms in images from the Mars Color Imager onboard the Mars Reconnaissance Orbiter (MRO/MARCI) and the surface albedo. Their climatological research on curvilinear or textured dust storms suggested their mechanisms statistically for the first time, which had a large impact on dust storm studies. Although they assumed that active convection inside dust storms was reflected in the storms’ textures, they did not clarify what types of atmospheric phenomena are associated with the production of large surface stresses that trigger first dust lifting.

The shape, texture (mesoscale cloud top morphology, Kulowski et al. 2017), and size of dust storms and the climatology of these characteristics are important clues to understanding Martian dust storms. The relations between such characteristics and the phases of the variety of atmospheric waves as well as the frequency of textured/curvilinear dust storms and the spatial and seasonal variations of the cumulative area are also worth investigating (Wang 2007; Wang et al. 2011; Guzewich et al. 2015, 2017). However, it is time-consuming to detect all dust storms visually because of the vast number of images taken by just two instruments, the MGS/MOC and MRO/MARCI. It is also difficult to objectively define which features characterize dust storms. Even approximate categorization of dust events into textured dust storms, untextured yet discrete dust storms, and haze depends on the subjective experience of the human observer. For these reasons, visual detection and categorization of the three types of dust storm textures introduced by Kulowski et al. (2017) would not be exactly reproduced by another human viewing and characterizing the images. Furthermore, we perceive a need to update the criteria for detecting dust events, especially obscured ones such as untextured yet discrete dust storms and haze, via visual detection. Therefore, it would be useful to be able to automatically detect dust events, measure their shape, pattern, and size, and improve the objectivity and reproducibility of dust storm detection. If a dust storm “catalog” summarizing the shape, size, and textures of dust storms was produced objectively, it would be an important database for studies of atmospheric phenomena controlling dust storm initiation. In addition, we may be able to automatically detect and categorize dust storms using images taken by narrow-angle cameras as well as global swath images taken by MGS/MOC and MRO/MARCI.

Maeda et al. (2015) automatically evaluated the existence of dust storms in Mars images using a support vector machine (SVM), which is an algorithm for pattern recognition using supervised learning that has been applied to classification and regression. Their algorithm successfully detected 80% of human-identified dust storms, but they did not try automated segmentation of dust storm regions in their study. Ogohara et al. (2016) also successfully evaluated the existence of water ice clouds (WICs) in the Martian atmosphere using an SVM trained by a few simple statistics of images. Performance of their algorithm was about 80%, comparable with Maeda et al. (2015), but automated segmentation of WICs was beyond their work. In a general application of pattern recognition and computer vision, an image is divided into several small patches and each patch is classified into one or more classes (e.g., Coupé et al. 2011). Turk and Pentland (1991) introduced eigenfaces using principal component analysis (PCA) for face recognition. Jiang et al. (2017) also used the patch-based method and PCA for face recognition. In this study, we apply the combination of the patch-based approach and PCA to Mars images to segment Martian dust storms. This enables us not only to evaluate the existence of dust storms, but also to measure the shape and size of dust storms automatically. Furthermore, we may be able to recognize and categorize textures of dust storms because we can obtain information on which basis images contribute to a patch classified as a dust storm. In the “Methods/Experimental” section, we provide an overview of the observation data that we use in this study and the pre-processing of input image data. We also describe details of our algorithm for automated segmentation of dust storm areas using PCA and machine learning. Results and performance evaluation of the algorithm are shown in the “Results and discussion” section.



We used reflectance data from red and blue band (575–675 nm and 400–450 nm, respectively) images acquired by MGS/MOC (Malin et al. 2010). In this study, we begin our investigation of methods for segmentation of dust storm areas with only the region centered around 180° E–40° N, where many textured dust storms have been observed (Guzewich et al. 2015). This region is suitable for training the classifier described in the following subsection because there are many samples of true dust storms. We examine 800 × 600 pixel (40° × 30°) subsets of the full-size MOC wide-angle camera (MOC-WA) image swaths. The subsets are centered at 180° E–40° N and extracted from MOC-WA images taken in the late northern fall of MY24 and MY26, one of the active seasons of dust storms in this latitude band (Guzewich et al. 2017). Red and blue band images of MOC-WA were obtained through the U.S. Geological Survey (USGS). Noise reduction, radiometric calibration, and coordinate transformation to longitude-latitude coordinates were done using ISIS3. Opposition surge and low-frequency patterns of reflectance were removed from the images using methods presented by Wang and Ingersoll (2002) and Ogohara et al. (2016). Coherent longitudinally inhomogeneous patterns shown by Ogohara et al. (2016) were filtered out using the longitudinal running mean of a full 31-pixel width.

Figure 1a shows a typical dust storm red band image in the region of 160° E–200° E and 25° N–55° N obtained after the pre-processing noted above. The resolution is 0.05° pixel⁻¹. Textures at the top of the dust storm are clearly seen. However, surface albedo patterns clearly visible in the bottom of the image might be falsely classified as a dust storm; therefore, we attenuate the surface albedo patterns by subtracting a regionally defined, dust-free background image from each image of interest. Each pixel value of the background image, \( {I}_b\left(i,j\right) \), is expressed as follows:

$$ {I}_b\left(i,j\right)=\underset{1\le n\le 9}{\min }{I}_n\left(i,j\right), $$

where \( {I}_n \) is the nth image. The nine images used for producing \( {I}_b \) are listed in Additional file 1: Figure S1. They were chosen arbitrarily from images where few or no dust storms were seen. After some trials using fewer than nine images to create the background image, we determined that we were best able to attenuate the surface patterns relative to dust storms and clouds using at least nine images. Figure 1b shows an example of a dust storm image obtained by subtracting the background image from Fig. 1a. The surface patterns seen in Fig. 1a have been attenuated in Fig. 1b, and the dust storm has been emphasized. Note that the surface patterns are not necessarily removed completely, although they might be attenuated further in the subtracted image if more than nine images were used.
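The background construction and subtraction described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names and the random stand-in images are hypothetical, and real inputs would be the pre-processed reflectance images.

```python
import numpy as np

def background_image(dust_free_images):
    """Pixel-wise minimum over a stack of (nearly) dust-free images."""
    stack = np.stack(dust_free_images, axis=0)  # shape: (n_images, rows, cols)
    return stack.min(axis=0)

def subtract_background(image, background):
    """Attenuate surface albedo patterns by removing the background image."""
    return image - background

# Hypothetical stand-ins for nine pre-processed 600 x 800 reflectance images.
rng = np.random.default_rng(0)
dust_free = [rng.uniform(0.1, 0.3, size=(600, 800)) for _ in range(9)]
I_b = background_image(dust_free)

# A hypothetical target image that is uniformly brighter than one input.
target = dust_free[0] + 0.05
subtracted = subtract_background(target, I_b)
```

Because the background is a pixel-wise minimum, the subtracted image of any of the nine input images is non-negative everywhere.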

Fig. 1

Examples of images and ground truth of dust and cloud distributions used in this study. a An example of a pre-processed image in the red band. The albedo patterns on the surface and topography (e.g., craters) are recognized visually in the bottom of the image although they are weaker than the dust storm seen in the center of the image. b The subtracted image produced from a. c The ground truth of the distributions of dust storms and clouds seen in a and b. We prepare the ground truth subjectively based on the red and blue band images

Basis determination

We extract many patches (a patch is a sub-sampled region of N × N pixels within the original image) from six images showing dust storms in the red band observed at different times and from the blue band images observed at the same times. These twelve images (six observations in two bands) were chosen arbitrarily from among the examined images that were already known to contain dust storms. A patch of N = 10, at the resolution of the MOC-WA images, is comparable in size to the smallest dust event summarized by Cantor et al. (2001) in the northern part of the images used for this study. Extreme values of N (e.g., N = 2 or 100) are inappropriate because the method for dust storm segmentation proposed in the following sections implicitly assumes that the patch size is comparable to or smaller than dust storms and larger than the patterns characteristic of dust storms. Therefore, the cases of N = 10, 20, and 30 are investigated in this study. All patches are extracted from the six images with a step of two pixels, so that two adjacent patches overlap and share (N − 2)N pixels. A list of the patches is created by starting at one corner of the image and stepping by two pixels in one direction until the opposite edge is reached, then returning to the start of that row, shifting by two pixels in the second direction, and repeating.
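The patch list can be illustrated with a short sketch, assuming a two-pixel step in both directions as described above. The function name `extract_patches` and the small test image are hypothetical.

```python
import numpy as np

def extract_patches(image, n, step=2):
    """Return all n x n patches taken every `step` pixels, one flattened patch per row."""
    rows, cols = image.shape
    patches = [
        image[r:r + n, c:c + n].ravel()
        for r in range(0, rows - n + 1, step)   # shift in the second direction
        for c in range(0, cols - n + 1, step)   # step along the first direction
    ]
    return np.asarray(patches)

# Hypothetical 30 x 40 image with unique pixel values, so overlaps can be counted.
image = np.arange(30 * 40, dtype=float).reshape(30, 40)
patches = extract_patches(image, n=10)

# Two adjacent patches share (N - 2) * N = 80 pixels for N = 10.
shared = len(set(patches[0]) & set(patches[1]))
```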

Patches classified as “surface” are the dominant classification in the list because the majority of the pixels seen in these images shows the surface, rather than dust or ice clouds. Imbalance in the number of patches between the three classes may cause a reduction of classification accuracy of the two minor classes. This means that the classifier can distinguish the surface from the other regions but cannot separate dust storm and cloud. In addition, we should use the same number of patches in the three values of N to compare the performances of the proposed method between N = 10, 20, and 30. Therefore, we actually extract 140,000 patches of each class randomly from the list of the patches in each of the values of N after assigning each patch to one of the three classes in the manner described in the “Classifier and ground truth” subsection. This number of patches for each class is limited by the memory constraints of our computer system that are required during processing, especially in the case of N = 30. Letting M be the number of patches generated from the six subtracted images, M is 420,000 (140,000 × 3) in each of the three values of N used.

The ith patch (1 ≤ i ≤ M) can be expressed as an \( N^2 \)-dimensional vector, \( {\boldsymbol{t}}_{\boldsymbol{i}}\in {\mathbb{R}}^{N^2} \), whose elements are identical to the pixel values of the patch. We derive basis vectors of the \( N^2 \)-dimensional space using PCA. Note that the average patch, the mean over all the M patches, \( \overline{\boldsymbol{t}}\in {\mathbb{R}}^{N^2} \), is subtracted from each of the M patches before the PCA is done. The normalized basis vectors derived by PCA are perpendicular to each other, and an arbitrary vector can be uniquely expressed as a linear combination of them. The number of basis vectors is generally \( N^2 \), but \( N^2 \) is so large that it takes a long time to determine the best combination of parameters for classification and to train a classifier using it. Therefore, we choose the \( K\ \left(<N^2\right) \) basis vectors with the largest eigenvalues, \( {\boldsymbol{e}}_{\mathrm{PCA}j}\in {\mathbb{R}}^{N^2}\ \left(1\le j\le K\right) \), from the \( N^2 \) basis vectors derived by PCA. K is determined so that the cumulative proportion of variance, i.e., the variability in the pixel reflectances across the M patches, represented by a linear sum of the K basis vectors, becomes larger than 99%. Note that each of the K basis vectors does not necessarily correspond to some kind of variability in reflectance that can be interpreted physically.
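As a minimal sketch of this basis selection, the following uses a singular value decomposition of the mean-subtracted patches, which yields the same orthonormal directions as PCA. The function name `pca_basis` and the synthetic low-rank data are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def pca_basis(patches, variance_target=0.99):
    """Return the mean patch and the K leading PCA basis vectors (as columns)."""
    mean_patch = patches.mean(axis=0)
    centered = patches - mean_patch
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    # Smallest K whose cumulative proportion of variance exceeds the target.
    k = int(np.searchsorted(cumulative, variance_target, side="right")) + 1
    k = min(k, vt.shape[0])
    return mean_patch, vt[:k].T  # U has shape (N^2, K)

# Hypothetical data: 500 patches (N = 10) driven by three latent patterns plus weak noise.
rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 100))
patches += 0.01 * rng.normal(size=patches.shape)
mean_patch, U = pca_basis(patches)
K = U.shape[1]
```

For real patches, K is typically much smaller than \( N^2 \) but depends on the band and patch size.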

Feature vector

\( \overline{\boldsymbol{t}} \) is subtracted from each patch vector. Subtracted patch vectors are defined as follows:

$$ {\boldsymbol{s}}_i={\boldsymbol{t}}_i-\overline{\boldsymbol{t}}\ \left(1\le i\le M\right) $$

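\( {\boldsymbol{s}}_i \) can be expressed uniquely as a linear combination of the \( N^2 \) PCA basis vectors. However, because \( K<N^2 \), we cannot express it exactly using \( {\boldsymbol{e}}_{\mathrm{PCA}j} \) (1 ≤ j ≤ K). Thus, we approximate it by its orthogonal projection, \( {\boldsymbol{s}}_i^{\prime } \), onto the subspace spanned by \( {\boldsymbol{e}}_{\mathrm{PCA}j} \):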

$$ {\boldsymbol{s}}_i^{\prime }=U{\boldsymbol{c}}_i, $$
$$ {\boldsymbol{c}}_i={U}^{\mathrm{T}}{\boldsymbol{s}}_i, $$

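where U is an orthogonal projection matrix, \( U=\left({\boldsymbol{e}}_{\mathrm{PCA}1},{\boldsymbol{e}}_{\mathrm{PCA}2},\cdots, {\boldsymbol{e}}_{\mathrm{PCA}K}\right) \). \( {\boldsymbol{c}}_i={\left({c}_{i1},{c}_{i2},\cdots, {c}_{iK}\right)}^{\mathrm{T}}\in {\mathbb{R}}^K \) is a coefficient vector that gives the position of \( {\boldsymbol{s}}_i \) in the subspace spanned by the columns of U.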

We use reflectance images in the two bands, red and blue, as described in the previous section. We extract K basis vectors using PCA from M patches of each band. K depends on the band. A feature vector used for the classification is a combination of the two feature vectors derived from one red patch and one blue patch. Letting \( \overline{t_{\mathrm{R}i}} \) and \( \overline{t_{\mathrm{B}i}} \) be the averages of the ith patch (the mean value over all elements of \( {\boldsymbol{t}}_i \)) in the red and blue bands, respectively, the feature vector of the ith patch, \( {\boldsymbol{f}}_i \), is defined as follows:

$$ {\boldsymbol{f}}_i=\left(\begin{array}{c}{\boldsymbol{c}}_{\mathrm{R}i}\\ {}{\boldsymbol{c}}_{\mathrm{B}i}\\ {}\overline{t_{\mathrm{R}i}}\\ {}\overline{t_{\mathrm{B}i}}\end{array}\right) $$

where \( {\boldsymbol{c}}_{Ri}={\left({c}_{Ri1},{c}_{Ri2},\cdots, {c}_{Ri{K}_R}\right)}^T \) and \( {\boldsymbol{c}}_{Bi}={\left({c}_{Bi1},{c}_{Bi2},\cdots, {c}_{Bi{K}_B}\right)}^T \) are coefficient vectors of the ith patch in the red and blue bands, respectively. \( {K}_R \) and \( {K}_B \) are the numbers of the basis vectors that contribute most to encompassing the image variability in their respective bands as defined by the PCA.
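The projection onto the PCA basis and the assembly of the feature vector can be sketched as follows. The helper names and the random orthonormal bases (built with a QR decomposition in place of real PCA bases) are hypothetical and serve only to show the shapes involved.

```python
import numpy as np

def project(patches, mean_patch, U):
    """Coefficient vectors c_i = U^T (t_i - t_bar), one row per patch."""
    return (patches - mean_patch) @ U

def feature_vectors(red, blue, mean_r, U_r, mean_b, U_b):
    """Stack c_R, c_B, and the per-patch mean reflectance in each band."""
    c_r = project(red, mean_r, U_r)
    c_b = project(blue, mean_b, U_b)
    mean_red = red.mean(axis=1, keepdims=True)
    mean_blue = blue.mean(axis=1, keepdims=True)
    return np.hstack([c_r, c_b, mean_red, mean_blue])

# Hypothetical inputs: 50 patches with N = 10 and stand-in orthonormal bases.
rng = np.random.default_rng(0)
red = rng.uniform(size=(50, 100))
blue = rng.uniform(size=(50, 100))
U_r, _ = np.linalg.qr(rng.normal(size=(100, 7)))  # K_R = 7 (assumed)
U_b, _ = np.linalg.qr(rng.normal(size=(100, 5)))  # K_B = 5 (assumed)
f = feature_vectors(red, blue, red.mean(axis=0), U_r, blue.mean(axis=0), U_b)

# The residual of the orthogonal projection s' = U c is perpendicular to the basis.
s = red - red.mean(axis=0)
s_prime = (s @ U_r) @ U_r.T
residual_in_basis = (s - s_prime) @ U_r
```

Each feature vector therefore has \( {K}_R+{K}_B+2 \) elements.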

Classifier and ground truth

We adopt a multilayer perceptron (MLP) classifier, which is included in scikit-learn 0.18.1 (Python 3.5.1). An MLP is a feedforward artificial neural network (NN) that imitates biological neural networks. NNs have been used in several fields of planetary science (e.g., classification of asteroid spectra, Howell et al. 1994) and engineering (e.g., identification of environment models around space robots, Venkataraman et al. 1993). An MLP consists of at least three layers of neurons: the input layer, the output layer, and one or more hidden layers between them. The numbers of hidden layers and neurons in each layer are typical tuning parameters controlling the complexity and ability of the network. There is only one hidden layer in this study. We provide an overview of the MLP in the “Appendix” section for readers who have no experience in machine learning. Please see the “Appendix” section for the meanings of technical terms used later in this section. The number of neurons in the hidden layer, \( {N}_{\mathrm{neuron}} \) = {10, 20, 30, 70, 100}, the (constant) learning rate, γ = {10⁻⁴, 10⁻³, 10⁻², 10⁻¹}, and the maximum number of iterations, \( {T}_{\max } \) = {100, 300, 500, 700, 1000, 5000}, are determined by grid search for each patch size so that accuracy for the dust storm class is maximized. The activation function for the hidden layer is the so-called “ReLU” function, f(x) = max(0, x), for all patch sizes. The solver for weight optimization is a stochastic gradient-based optimizer, Adam, proposed by Kingma and Ba (2015). The parameters for the optimization are listed in Table 1. We did not arbitrarily choose the ReLU function, the Adam optimizer, and the parameters shown in Table 1, but adopted them in this study because they are popular techniques with popular parameter choices and produced adequate results, as discussed in the “Results and discussion” section.

Table 1 Parameters for the MLP classifier included in scikit-learn 0.18.1. These are the same as commonly used values. See Kingma and Ba (2015) for the meanings of the parameters for the Adam optimizer
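A grid search of this kind can be sketched with scikit-learn as below. Several points are assumptions rather than details given in the text: the learning rate γ is mapped to `learning_rate_init`, the maximum number of iterations to `max_iter`, “accuracy for the dust storm class” is interpreted as the fraction of true dust storm samples that are labeled as dust storm, and three-fold cross-validation is used. The synthetic features and the reduced demonstration grid only keep the example fast.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

DUST = 1  # label of the dust storm class

def dust_accuracy(y_true, y_pred):
    """Fraction of true dust storm samples predicted as dust storm (assumed metric)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = y_true == DUST
    return float(np.mean(y_pred[mask] == DUST))

# Grid quoted in the text (one hidden layer); not run here because it is slow.
paper_grid = {
    "hidden_layer_sizes": [(10,), (20,), (30,), (70,), (100,)],
    "learning_rate_init": [1e-4, 1e-3, 1e-2, 1e-1],
    "max_iter": [100, 300, 500, 700, 1000, 5000],
}
# Reduced grid for demonstration only.
demo_grid = {
    "hidden_layer_sizes": [(10,), (20,)],
    "learning_rate_init": [1e-3, 1e-2],
    "max_iter": [100, 300],
}

# Hypothetical, well-separated features for surface (0), dust (1), and cloud (2).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = np.repeat([0, 1, 2], 100)
X[y == 1, 0] += 4.0
X[y == 2, 1] += 4.0

search = GridSearchCV(
    MLPClassifier(activation="relu", solver="adam", random_state=0),
    demo_grid,
    scoring=make_scorer(dust_accuracy),
    cv=3,
)
search.fit(X, y)
best_params, best_score = search.best_params_, search.best_score_
```

Short runs may emit convergence warnings; they do not stop the search.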

The ground truth of the segmentation was prepared by the authors in advance of training the classifier. We first assign each pixel to a dust storm, cloud, or surface class based on our subjective experience. Thus, some patches may contain two or more classes, representing differently labeled regions of pixels within them. Next, if both the dust storm and cloud areas inside a patch are smaller than 20% of the patch area (\( N^2 \) pixels) (Ishii et al. 2016), the patch is labeled “0”, indicating the surface class. If not, the patch is assigned the label of the larger of the two other classes (“1” for dust storm and “2” for cloud). The MLP is trained using the feature vectors of the patches (\( {\boldsymbol{f}}_i \) in Eq. (5) and in Eqs. (6) and (10) in the “Appendix” section) with the true labels (\( {g}_i \) in Eq. (10) in the “Appendix” section). The six subtracted images and corresponding ground truth images of the segmentation that we used for training are shown in Additional file 2: Figure S2. Dust, cloud, and surface patches used for training are the same as those from which the basis patch vectors are determined, and the number of patches of each class is 140,000 in the cases of N = 10, 20, and 30. We could use more surface patches because the surface area is much larger than the areas of dust storm and cloud. However, we chose 140,000 patches randomly from all surface patches to avoid an imbalance in the number of patches between the three labels, as described in the “Basis determination” subsection.
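The patch labeling rule can be written compactly as in the sketch below. The function name is hypothetical, and the handling of an exact tie between the dust storm and cloud areas (assigned to dust storm here) is an assumption, since the text does not specify it.

```python
import numpy as np

SURFACE, DUST, CLOUD = 0, 1, 2

def label_patch(pixel_labels, min_fraction=0.2):
    """Assign one class to a patch from its per-pixel ground truth labels."""
    area = pixel_labels.size
    dust = np.count_nonzero(pixel_labels == DUST)
    cloud = np.count_nonzero(pixel_labels == CLOUD)
    # Surface if both the dust and cloud areas are below 20% of the patch area.
    if dust < min_fraction * area and cloud < min_fraction * area:
        return SURFACE
    # Otherwise the larger of the two classes (ties go to dust: an assumption).
    return DUST if dust >= cloud else CLOUD

# Hypothetical 10 x 10 patches.
patch_a = np.zeros((10, 10), dtype=int)
patch_a.flat[:15] = DUST                      # 15% dust only
patch_b = np.zeros((10, 10), dtype=int)
patch_b.flat[:25] = DUST
patch_b.flat[25:55] = CLOUD                   # 25% dust, 30% cloud
patch_c = np.zeros((10, 10), dtype=int)
patch_c.flat[:40] = DUST
patch_c.flat[40:50] = CLOUD                   # 40% dust, 10% cloud
labels = [label_patch(p) for p in (patch_a, patch_b, patch_c)]
```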

Observations that can physically decide whether features seen in images are dust storm, clouds, or something else already exist. The Thermal Emission Spectrometer onboard MGS (MGS/TES) can measure optical thickness of dust and water ice cloud separately. Such types of observation data with physical information on dust and cloud seem to be suitable for the ground truth data instead of visual inspection by the authors. However, the footprint of TES is extremely narrow compared to the field of view of MOC and thus rarely crosses local dust storms. TES observations are not enough to train the classifier. Therefore, we do not prepare the ground truth images based on TES observations but based on visual inspection by the authors to maintain the number of patches available for training.

Probability image

What is classified into the dust storm class or the other classes is not each image but each patch. If only the patches into which an image is divided as described in the “Basis determination” subsection were targets of classification, the shape of a dust storm could not be sufficiently resolved. Therefore, we shift a target patch, which is classified into one of the three classes in a test image, pixel by pixel, longitudinally and latitudinally. An arbitrary pixel is basically included in \( N^2 \) target patches. Letting \( {n}_D\left(i,j\right) \) be the number of patches that include a current pixel (i, j) and are categorized as a dust storm, we hereafter call \( {n}_D\left(i,j\right)/{N}^2 \) the “dust storm probability” of the pixel (i, j). An image whose pixel values are dust storm probabilities between 0 and 1 is obtained, which is called a “probability image” in this study. The shape and size of a dust storm can be measured accurately by regarding pixels whose dust storm probability is larger than a threshold value as the dust storm. Letting \( {n}_C\left(i,j\right) \) be the number of patches that include a current pixel (i, j) and are categorized as a cloud, we can also define the “cloud probability” of the pixel (i, j), \( {n}_C\left(i,j\right)/{N}^2 \).
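A probability image can be accumulated from per-patch class labels as in the following sketch. It assumes that `patch_labels[r, c]` holds the class predicted for the N × N patch whose top-left pixel is (r, c), with patches shifted by one pixel; the function name and the toy input are hypothetical.

```python
import numpy as np

DUST = 1

def probability_image(patch_labels, n, target_class):
    """Fraction of the n*n patches covering each pixel that belong to target_class."""
    patch_rows, patch_cols = patch_labels.shape
    counts = np.zeros((patch_rows + n - 1, patch_cols + n - 1))
    # Every patch of the target class adds one count to all pixels it covers.
    for r, c in zip(*np.nonzero(patch_labels == target_class)):
        counts[r:r + n, c:c + n] += 1
    return counts / n**2

# Toy example: every 3 x 3 patch of a 7 x 7 image is classified as dust storm.
patch_labels = np.full((5, 5), DUST)
prob = probability_image(patch_labels, n=3, target_class=DUST)
```

Pixels near the image edge are covered by fewer than \( N^2 \) patches, so their probability stays below 1 even when every covering patch is a dust storm patch (1/9 at the corner in this toy case).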

Results and discussion

The classifier, in practice, must be able to separate surface, dust storm, and cloud areas in images different from those used in its training. In the following subsections, we test five chosen subtracted images (Additional file 3: Figure S3) and refer to them as the “test images” hereafter. Textured dust storms seen in four of the five test images were most distinctive (note that one of the five test images shows no dust storm), and therefore, we could prepare reliable ground truth images.

Probability image

Figure 1b, already mentioned in the “Data” subsection, is one of the subtracted images used for evaluation of the method developed in this study and is excluded from the training of the classifier. Figure 1c indicates the true distributions of a dust storm and clouds included in Fig. 1b, which were prepared based on the authors’ visual inspection. Figure 2a, b, and c are probability images of dust for N = 10, 20, and 30, respectively, derived from Fig. 1b. Figure 2d, e, and f are those of WIC. The shape of the dust storm is resolved most precisely in the case of N = 10 (Fig. 2a). However, the wavy features in the northwest area of the image should not be recognized as dust storms, because they seem to be typical WICs associated with topographically generated gravity waves, the so-called lee waves. In the cases of N = 20 and 30 (Fig. 2b, c), a high probability of dust storm associated with WICs still remains in the northwest of the probability images, but the dust storm probability near the eastern and western edges of the images is smaller than that in the case of N = 10. Nevertheless, the size of the dust storm in the case of N = 20 (Fig. 2b) is comparable with that in the case of N = 10 shown in Fig. 2a. Cloud probability is high in the northern part of the images in Fig. 2d, e, and f and seems to decrease as N increases. Cloud probability in the wavy area does not decrease much with the size of patches. However, the area of high probability around the wavy WICs is clearly smaller than the true wavy WIC area shown in Fig. 1c. The wavy patterns of WICs seem to be confused with dust storms, especially in the case of N = 10.

Fig. 2

Examples of probability images in the cases of N = 10, 20, and 30 derived from Fig. 1b. a, b, and c are probability images for dust storm in the three cases, respectively. d, e, and f are those for WIC

Receiver operator characteristic curves

In this subsection, we evaluate how accurately dust storm areas are determined in the test images that were not used for training. Note that here we have to divide an image into two classes (dust storm or not dust storm) and do not need to distinguish between cloud and surface areas. Therefore, we binarize the probability images of dust by thresholding, that is, we produce an image with pixel values of either 0 or 1 (black or white) by determining whether the dust probability in each pixel is below or above some chosen threshold value. However, the binary images depend strongly on the threshold value used to binarize the probability images, and the best threshold values may differ between N = 10, 20, and 30. Therefore, we cannot straightforwardly compare the results of N = 10, 20, and 30 derived using a single threshold value. For this reason, evaluation methods independent of threshold values, as well as those that depend on threshold values, have been proposed in the discipline of pattern recognition. Those pertinent to this study are described briefly in the “Appendix” section. Figure 3 shows one such method, receiver operator characteristic (ROC) curves calculated by sweeping the threshold value in probability (Fawcett 2006), which indicate the relationship between the true and false positive rates. ROC curves have been used in the field of medical imaging and are independent of the ratio of the true positive area (dust storm) to the true negative area (the others). The area under the curve (AUC) is used for evaluating the performance of the algorithm independently of the threshold value (see Appendix and Fawcett 2006 for details). The maximum, and best, value of AUC is 1. AUC in the case of N = 30 is the highest among the three cases for dust storm, as shown in Fig. 3. The case of N = 30 is the best in that the number of true positive pixels can be increased drastically while tolerating only a small number of false positive pixels.
However, 30 pixels (1.5°) in latitude roughly correspond to 90 km. The smallest dust storms among local dust storms summarized by Cantor et al. (2001) were about 17 × 17 km². They may be attenuated in dust probability images in the case of N = 30 due to their small size compared to the patch size. Figure 4 shows the dust probability images in the cases of N = 20 and 30 produced from a subtracted image in which a small local dust storm can be seen. The small dust storm has a horizontal scale of 100 km. Dust probability around it is obscured and clearly lower in the case of N = 30 than in the case of N = 20. This not only leads to underestimation of the area of the small dust storm but also makes it difficult to detect the existence of the small dust storm. Nevertheless, the difference in AUC between N = 20 and 30 is just 0.003. Thus, we focus on the results of N = 20 hereafter rather than N = 30 to suppress false negatives of dust storms.
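The threshold sweep behind an ROC curve and its AUC can be sketched with NumPy as follows. The function names, the threshold grid, and the synthetic probability images are hypothetical; the AUC is computed with the trapezoidal rule.

```python
import numpy as np

def roc_points(prob, truth, thresholds):
    """False and true positive rates obtained by sweeping a probability threshold."""
    prob = np.ravel(prob)
    truth = np.ravel(truth).astype(bool)
    n_pos = np.count_nonzero(truth)
    n_neg = truth.size - n_pos
    fpr, tpr = [0.0], [0.0]                    # anchor the curve at (0, 0)
    for p in sorted(thresholds, reverse=True):
        positive = prob > p                    # binarized image for this threshold
        tpr.append(np.count_nonzero(positive & truth) / n_pos)
        fpr.append(np.count_nonzero(positive & ~truth) / n_neg)
    fpr.append(1.0)                            # anchor the curve at (1, 1)
    tpr.append(1.0)
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

thresholds = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(0)
truth = rng.random(size=(100, 200)) < 0.3      # hypothetical ground truth mask

# A probability image identical to the truth gives AUC = 1.
auc_perfect = area_under_curve(*roc_points(truth.astype(float), truth, thresholds))
# A probability image unrelated to the truth gives AUC close to 0.5.
auc_random = area_under_curve(*roc_points(rng.random(size=truth.shape), truth, thresholds))
```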

Fig. 3

ROC curves and AUCs for the dust storm class. The blue, red, and green lines correspond to the patch sizes of 10 × 10, 20 × 20, and 30 × 30 pixels, respectively

Fig. 4

Probability images for dust in the cases of a N = 20 and b 30. Although a small dust storm shown by the white arrows can be seen near 177° E–38° N, it is obscured in b

Binarized image

For segmentation of dust storm regions, we need to binarize the probability images using a threshold value as indicated in the above subsection and evaluate the generalization ability of the method using evaluation indices such as precision, recall, and F-measure (e.g., Fawcett 2006). In general, a threshold value close to 1 drastically reduces the positive pixels in binary images: precision gets close to 1, but recall is extremely reduced. As the threshold value decreases, the positive areas grow and other tiny positive areas appear in the binary images. As a result, recall becomes greater than precision. Although the priorities of precision and recall depend on individual research interest, improving precision (i.e., reducing false positive pixels) without reducing recall (i.e., maintaining true positive pixels) is generally encouraged. Thus, we binarize the probability images using two threshold values in this study. First, we generate a binary image, BL, from a probability image using the larger threshold, pL, as shown by the white region in Fig. 5a. Positive areas seen in BL tend to be small, and thus recall is low. However, the positive areas in BL can be regarded as being in dust storms with high reliability. Next, we generate another binary image, BS, from the same probability image using the smaller threshold, pS, as shown by the gray regions in Fig. 5a. The positive areas seen in BS tend to be large, and thus recall is high although precision is low. This means that the positive areas in BS include the surface or WIC regions as well as dust storm regions. It should be noted that all the positive regions in BL (white in Fig. 5a) are inside the positive regions in BS (gray in Fig. 5a). However, not all the positive regions in BS include positive regions in BL. We therefore regard the positive regions in BS that do not include positive regions in BL as false positives and remove them.
In this way, we can detect the true dust storm regions with high recall while suppressing the increase of false positive pixels caused by decreasing the threshold value. Figure 5b shows a trinarized image generated from the probability images of dust and WIC shown in Fig. 2b and e using (pL, pS) = (0.95, 0.5). The size of the dust storm is maintained, and the tiny false positive regions seen in Fig. 5a are removed. We binarize the probability images used for the ROC curves in the “Receiver operator characteristic curves” subsection and calculate precision, recall, and F-measure from the binarized images and the ground truth images. Table 2 shows the evaluation indices directly calculated from all of the pixels of the five binary images derived from the test images via the probability images. Note that each evaluation index is not an average over the five index values calculated from the five binary images. For both dust storm and cloud, recall in the case of the two threshold values (0.84) is comparable with or higher than those in the cases of the single threshold value. On the other hand, precision (0.88) is between the two cases of the single threshold value. Therefore, F-measure improves in the two-threshold case.
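The two-threshold binarization described above can be sketched with connected-component labeling from SciPy. This is a minimal sketch, not the authors' code: the function name is hypothetical, and the use of 4-connectivity (the default of `scipy.ndimage.label` in two dimensions) to define a “region” is an assumption.

```python
import numpy as np
from scipy import ndimage

def two_threshold_binarize(prob, p_large=0.95, p_small=0.5):
    """Keep regions above p_small only if they contain a pixel above p_large."""
    b_large = prob > p_large                   # reliable but small positive areas
    b_small = prob > p_small                   # large areas that include false positives
    labels, _ = ndimage.label(b_small)         # connected regions of the small-threshold image
    keep = np.unique(labels[b_large])          # region ids containing a reliable pixel
    keep = keep[keep > 0]
    return np.isin(labels, keep)

# Hypothetical probability image with two separate regions above p_small.
prob = np.zeros((6, 6))
prob[0:2, 0:2] = 0.6
prob[0, 0] = 0.97                              # region A contains a reliable pixel
prob[4:6, 4:6] = 0.7                           # region B does not
segmented = two_threshold_binarize(prob)
```

In this toy case, region A is kept at its full size of four pixels and region B is removed as a false positive.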

Fig. 5

Results of the segmentation using the proposed method. a A trinarized image of dust derived from Fig. 2b using a single threshold value. The white region is the positive region when using the larger threshold value, pL. The gray regions including the white one are positive regions when using the smaller threshold value, pS. b A trinarized image of dust (white) and WIC (gray) derived from Fig. 2b and e using the combination of two threshold values. pL and pS are 0.95 and 0.5, respectively

Table 2 Precision, recall, and F-measure for dust storm calculated from all of the five test images. The two threshold values, pL and pS, are 0.95 and 0.5, respectively
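Evaluation indices pooled over all pixels of several binary images, rather than averaged per image, can be computed as in this sketch. The function name and the tiny example arrays are hypothetical, and no guard against empty denominators is included.

```python
import numpy as np

def pooled_scores(binary_images, truth_images):
    """Precision, recall, and F-measure from pixel counts summed over all images."""
    tp = fp = fn = 0
    for pred, truth in zip(binary_images, truth_images):
        pred = np.asarray(pred, dtype=bool)
        truth = np.asarray(truth, dtype=bool)
        tp += np.count_nonzero(pred & truth)    # true positive pixels
        fp += np.count_nonzero(pred & ~truth)   # false positive pixels
        fn += np.count_nonzero(~pred & truth)   # false negative pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Two hypothetical 2 x 2 binary images and their ground truth.
preds = [np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]])]
truths = [np.array([[1, 0], [1, 0]]), np.array([[1, 0], [0, 0]])]
precision, recall, f_measure = pooled_scores(preds, truths)
```

Pooling the counts also lets an image without any dust storm contribute, since it can only add false positive pixels.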

(pL, pS) = (0.95, 0.5) used for binarization is just an example and actually depends on the individual motivation of the segmentation of dust storms. If one is investigating the size distributions of dust storms including small dust storms, one has to decrease pL. If the shape of relatively large dust storms is a major target, both pL and pS should be large. The sensitivity of these results to location and season is discussed in the next section.

Generally speaking, Martian dust storms are bright in the red band and as dark as the surface in the blue band. On the other hand, Martian clouds are bright in the red and blue bands and especially much brighter than the surface in the blue band. Cantor et al. (2001) visually detected dust storms based on the difference in the surface albedo from adjacent areas in both the bands. As they did, we have prepared the ground truth of the segmentation based on differences in appearance between dust storms, clouds, and the surface due to such differences in their optical properties. The ground truth data for optically thick phenomena with textures different from the surrounding surface are reliable (e.g., Kulowski et al. 2017). However, those for optically thin dust haze and clouds are not always as reliable as those for optically thick dust events. We have experienced difficulty in visually deciding whether the pattern seen in the northern part of the target region is a dust haze or cloud. Therefore, what has been reported in this study is, conservatively, an algorithm for the segmentation of optically thick dust events and clouds, and its performance.

Nonetheless, our algorithm for the automated segmentation achieves a high AUC when it is evaluated against the subjective ground truth. The results of the segmentation shown in this paper can depend on who prepares the ground truth used for training the classifier and evaluating the algorithm. For example, someone may regard the area around 187° E–38° N as a part of a dust storm. Figure 2a and b show that the area around 187° E–38° N is partially recognized as a dust storm, although it is not included in the true dust storm area shown in Fig. 1c. This suggests that the classifier is not very sensitive to such minor variations in the ground truth data that depend on the inspectors’ experience. Therefore, we expect that physics-based segmentation with an AUC as high as that presented in this study could be achieved if objective, adequate ground truth data based on other observations (e.g., radars and spectrometers) became available.

The segmentation results presented in the current work are not sensitive to possible variations in the ground truth images among inspectors. They are nevertheless not independent of the authors’ experience, because the ground truth images used for training the classifier and evaluating its performance were prepared subjectively. The proposed method, however, enables fast detection and segmentation of dust storm areas in Mars images with less work, based on rigid (but subjective) detection criteria. This helps researchers intercompare and reproduce one another’s results of dust storm segmentation or detection. Therefore, the proposed method still contributes substantially to the objectivity and reproducibility of dust storm detection.

With the two threshold values set to 0.95 and 0.5, precision and recall are 0.88 and 0.84, respectively. Twelve percent of the dust storm area detected by our method is not dust storm, and 16% of the true dust storm area is not detected. However, because we tuned the two threshold values so that precision and recall become comparable (i.e., the F-measure is as high as possible), these two errors largely cancel and the net error in the total area of dust storms is only about 4%. Our method is therefore reliable for measuring the area of dust storms. Even if one intends to measure the shape and size of each dust storm, the method is accurate enough, because an area error of ~ 15% does not alter a size distribution of dust storms that spans several orders of magnitude.
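As a check on the area error quoted above, the ratio of detected to true dust storm area can be written in terms of precision and recall alone. The derivation below uses only the definitions of TP, FP, and FN and the rounded values 0.88 and 0.84, so the result is approximate:

$$ \frac{\mathrm{TP}+\mathrm{FP}}{\mathrm{TP}+\mathrm{FN}}=\frac{\mathrm{TP}/\left(\mathrm{TP}+\mathrm{FN}\right)}{\mathrm{TP}/\left(\mathrm{TP}+\mathrm{FP}\right)}=\frac{\mathrm{Recall}}{\mathrm{Precision}}\approx \frac{0.84}{0.88}\approx 0.95 $$

The detected and true areas therefore differ by roughly 4 to 5%, even though the individual false positive and false negative fractions are 12% and 16%.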

We have tuned the parameters of the MLP so that the performance of dust storm segmentation is maximized. As a result, Figs. 2 and 5 imply that both AUC and F-measure for ice clouds are lower than those for dust storms. However, this does not mean that the method is unsuitable for the segmentation of ice clouds: the evaluation indices for clouds can improve if we adopt parameters that maximize the performance of cloud segmentation.

This study has focused on the region with a high frequency of dust storms revealed by Guzewich et al. (2015, 2017) and Kulowski et al. (2017). The tuning parameters of the MLP mentioned above may be valid in some regions but invalid in others, because patterns of dust storms and haze may depend on location (i.e., latitude, altitude, and slope angle). They may also be invalid for images taken at other times if patterns of dust storms vary with season and Mars year, because the training and test images we have used are far fewer than the images of the region taken during the lifetime of MOC. The region we have focused on is favorable for “plume-like” dust storms as defined by Kulowski et al. (2017). It therefore remains possible that the classifier in this study is not applicable to the segmentation of the other two types of dust storms (“pebbled” and “puffy” dust storms). However, we expect that segmentation with a performance comparable to that presented here is possible if the classifier is trained for every type of dust storm depending on the region, season, and local time. At the least, textured dust storms such as that shown in Fig. 1b should be successfully separated.


We have investigated an algorithm for automated segmentation of dust storms in Mars images observed by MGS/MOC. We have divided an image into small patches and have expressed them as linear combinations of the basis patch images extracted by PCA. Coefficients of the basis images have been used for training the classifier, a neural network. All patches included in the test images are categorized by the classifier into one of three classes: dust storm, cloud, or something else (usually the ground surface). Each pixel appears in multiple patches, some of which may be categorized as dust storm and others as a different class. The ratio of the number of patches that classify a pixel as dust storm to the total number of patches that include the pixel is called the dust storm probability (of that pixel) in this study. Using this ratio, we evaluate whether the pixel is part of a dust storm so that the shape and size of dust storms are sufficiently resolved. The area under the ROC curve can be calculated from the dust storm probability images by sweeping the threshold value for binarization. The AUCs for the patch sizes N = 10, 20, and 30 are 0.947, 0.975, and 0.978, respectively. AUC is the highest for N = 30, but relatively small dust storms cannot be detected in that case because of the large patch size. The AUC of the algorithm presented in this study cannot be compared with that of other dust storm segmentation algorithms, because no AUC for such algorithms has been reported in previous research. However, the AUC of our algorithm is comparable with those of algorithms reported in other research fields (e.g., medical imaging, Hatanaka et al. 2018). In this sense, our algorithm performs as well as segmentation algorithms in general image processing.
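The per-pixel voting summarized above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the dictionary layout of `patch_labels`, and the use of label 0 for dust storm are assumptions made here, and the patch positions (and hence the stride) are left to the caller.

```python
def dust_probability(patch_labels, height, width, n, dust_label=0):
    """Per-pixel fraction of covering patches that were labelled dust storm.

    patch_labels maps the top-left corner (row, col) of each n x n patch to
    its predicted class. Label 0 = dust storm is an assumption of this sketch.
    """
    votes = [[0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for (top, left), label in patch_labels.items():
        for r in range(top, top + n):
            for c in range(left, left + n):
                counts[r][c] += 1
                votes[r][c] += int(label == dust_label)
    # Pixels covered by no patch get probability 0.0 instead of a division error.
    return [
        [votes[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]
```

Binarizing the returned image at a chosen threshold then yields the dust storm mask; sweeping that threshold gives the ROC curve.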

We can automatically obtain a catalog summarizing the shape, size, and texture of the variety of dust storms if we apply this algorithm to the vast number of Mars images taken by orbiters for about 20 years. The existence of such a comprehensive catalog would be an important resource for understanding the climatology and processes of Martian dust storms.



AUC: Area under the curve
ISIS: Integrated Software for Imagers and Spectrometers
MARCI: Mars Color Imager
MGS: Mars Global Surveyor
MLP: Multilayer perceptron
MOC: Mars Orbiter Camera
MRO: Mars Reconnaissance Orbiter
NN: Neural network
PCA: Principal component analysis
ROC: Receiver operator characteristics
SGD: Stochastic gradient descent
SVM: Support vector machine
TES: Thermal Emission Spectrometer
USGS: U.S. Geological Survey
WA: Wide angle
WIC: Water ice cloud




We thank Dr. Wataru Sunayama and Dr. Yuji Hatanaka for useful comments on this work. This research has made use of the USGS Integrated Software for Imagers and Spectrometers (ISIS3). This study was partially supported by the Sumitomo Foundation.



Availability of data and materials

The dataset supporting the conclusions of this article is available in the USGS repository.

Author information




RG prepared the ground truth data, trained the classifier, and evaluated the performance of the algorithm using test images. RG also wrote all source codes used for this study and generated all figures in this article. KO downloaded all images used for this study and converted them to longitude-latitude images of reflectance and illumination angle. KO also proposed the topic and the procedure of the pre-processing and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kazunori Ogohara.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Figure S1. Images used for preparing the background image. (PDF 2866 kb)

Additional file 2:

Figure S2. Subtracted images in the red band and ground truth data of dust storms and clouds used for training the classifier. Each row shows a subtracted image (1st column), the dust storm area (2nd column), and the cloud area (3rd column). The ground truth data are based on the authors’ visual inspection. (PDF 2600 kb)

Additional file 3:

Figure S3. Subtracted images and ground truth data of dust storms and clouds used for the evaluation of the algorithm. Each row shows a subtracted image (1st column), the dust storm area (2nd column), and the cloud area (3rd column). The ground truth data are based on the authors’ visual inspection. (PDF 2485 kb)



Multilayer perceptron

Readers can easily find tutorials on MLPs or NNs in standard textbooks (e.g., Bishop 2006) and on the internet. Here, following Bishop (2006), we briefly overview the concept, structure, and parameters of the MLP used in this study for readers who have no experience in machine learning. Figure 6 shows a schematic view of the MLP used in this study, which consists of an input layer, an output layer, and one hidden layer. The input layer receives the feature vector of each image patch, fi, given in Eq. (5); the output layer gives the possible classifications for a patch (i.e., dust storm, ice cloud, or surface); and the hidden layer transforms the input into the output. The number of neurons in the input layer equals the number of features (the dimension of fi), K′ = KR + KB + 2, and the number of neurons in the output layer equals the number of classes, 3. The number of neurons in the hidden layer, Nneuron, is a tuning parameter that roughly controls how flexibly the inputs can be transformed into one of the output classes (neurons). This network, shown in Fig. 6, is expressed as the following function:

$$ y_{ih}\left(\boldsymbol{f}_i,\mathbf{W}^{(1)},\mathbf{W}^{(2)}\right)=\sigma \left(\sum_{n=1}^{N_{\mathrm{neuron}}} z_n W_{hn}^{(2)}\right), \tag{6} $$
$$ z_n=\phi \left(\sum_{l=1}^{K^{\prime}} f_{il} W_{nl}^{(1)}\right). \tag{7} $$

fil denotes the lth component of the ith feature vector defined by Eq. (5). Wnl(1) is the weight between the lth neuron of the input layer and the nth neuron of the hidden layer, and Whn(2) is the weight between the nth neuron of the hidden layer and the hth neuron of the output layer. ϕ(∙) : ℝ → ℝ and σ(∙) : ℝ → ℝ are nonlinear activation functions (i.e., mathematical transformations) of the hidden and output layers, respectively, which make the classifier applicable to nonlinear classification problems. For multiclass classification problems, the softmax function expressed by the following equation is conventionally used for σ:

$$ \sigma \left(a_h\right)=\frac{e^{a_h}}{\sum_{j=1}^{3} e^{a_j}}, \tag{8} $$

where \( a_h=\sum_{n=1}^{N_{\mathrm{neuron}}} z_n W_{hn}^{(2)} \).

Fig. 6 A schematic view of the structure of the MLP used in this study

The ground truth vector of the ith patch, gi, is as follows:

$$ \boldsymbol{g}_i=\begin{cases}{\left(1\ 0\ 0\right)}^T & \left(\text{the true label is "0"}\right)\\ {\left(0\ 1\ 0\right)}^T & \left(\text{the true label is "1"}\right)\\ {\left(0\ 0\ 1\right)}^T & \left(\text{the true label is "2"}\right)\end{cases}\in {\mathbb{R}}^3 \tag{9} $$

Letting yi be (yi1, yi2, yi3)T, the error function of this network is expressed as follows:

$$ E\left(\mathbf{W}^{(1)},\mathbf{W}^{(2)}\right)=\frac{1}{2}\sum_{i=1}^{M}{\left\Vert \boldsymbol{y}_i\left(\boldsymbol{f}_i,\mathbf{W}^{(1)},\mathbf{W}^{(2)}\right)-\boldsymbol{g}_i\right\Vert}^{2} \tag{10} $$

Training this classifier means solving an optimization problem for the weights, i.e., searching for the weights that minimize E. Once the optimal weights are determined, we can predict the classes of test patches not used for training by replacing fi with the test patch vectors.
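The forward pass and the error function above can be sketched in a few lines. This is an illustrative sketch only: the choice ϕ = tanh, the nested-list weight layout, and all function names are assumptions made here, since the text requires only that ϕ be nonlinear.

```python
import math


def softmax(a):
    """Softmax over output pre-activations; subtracting max(a) avoids overflow."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def forward(f, w1, w2):
    """One-hidden-layer MLP: z = phi(W1 f), y = softmax(W2 z).

    phi = tanh is an assumption; the text only requires a nonlinear activation.
    w1 has shape (N_neuron, K'), w2 has shape (3, N_neuron).
    """
    z = [math.tanh(sum(w * x for w, x in zip(row, f))) for row in w1]
    a = [sum(w * v for w, v in zip(row, z)) for row in w2]
    return softmax(a)


def squared_error(features, targets, w1, w2):
    """E = 1/2 * sum_i ||y_i - g_i||^2 over all training patches."""
    total = 0.0
    for f, g in zip(features, targets):
        y = forward(f, w1, w2)
        total += sum((yh - gh) ** 2 for yh, gh in zip(y, g))
    return 0.5 * total
```

With all weights set to zero, every patch receives the output (1/3, 1/3, 1/3), which is a convenient sanity check before training.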

Minimizing Eq. (10) is usually a high-dimensional problem with multiple local minima. Therefore, several algorithms for seeking the global solution within a realistic time, called “optimizers” or “solvers,” have been reported. An optimizer called stochastic gradient descent (SGD) has been widely used, and several optimizers derived from SGD with slight modifications have been developed recently. Although the details of each optimizer are beyond the scope of this paper, it should be noted that most optimizers update the weights iteratively according to their own update rules, with a step size set by the learning rate. Large learning rates increase the risk of oscillation of E, and small learning rates increase the risk that E will not converge within a realistic number of iterations. Increasing K′, Nneuron, and the number of hidden layers also makes the optimization problem more difficult because the number of local minima increases, although it enables segmentation of Mars images into several classes based on slight differences in local patterns through extremely fine parameter tuning.
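The effect of the learning rate can be seen on a toy problem. The sketch below uses plain (not stochastic) gradient descent on the one-dimensional error E(w) = w², which is a stand-in chosen here for illustration and not the error function of Eq. (10); `descend` is a hypothetical helper.

```python
def descend(learning_rate, w0=1.0, steps=50):
    """Plain gradient descent on the toy error E(w) = w**2 (gradient 2*w)."""
    w = w0
    history = [w]
    for _ in range(steps):
        # Each update multiplies w by (1 - 2 * learning_rate).
        w -= learning_rate * 2.0 * w
        history.append(w)
    return history


for rate in (0.001, 0.1, 1.1):
    print(rate, descend(rate)[-1])
```

On this toy error, a rate of 0.1 converges toward the minimum at w = 0, a rate of 1.1 makes w flip sign and grow at every step, and a rate of 0.001 has barely moved after 50 iterations.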

Evaluation indices

Here, we briefly introduce five indices frequently used for evaluating the performance of a classifier (see Fawcett 2006 for details). The confusion matrix shown in Fig. 7 defines true positive, true negative, false positive, and false negative. For example, pixels that are recognized as dust storm after the binarization but lie outside the true dust storm areas defined by a human operator are called false positive pixels. The numbers of true positive, true negative, false positive, and false negative pixels are referred to as TP, TN, FP, and FN, respectively. The true positive rate, false positive rate, precision, and recall are then defined as follows:

$$ \mathrm{True}\ \mathrm{positive}\ \mathrm{rate}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \tag{11} $$
$$ \mathrm{False}\ \mathrm{positive}\ \mathrm{rate}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}, \tag{12} $$
$$ \mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, \tag{13} $$
$$ \mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}. \tag{14} $$
Fig. 7 The confusion matrix defining true positive, true negative, false positive, and false negative. True class means the ground truth based on the human operator’s inspection. Predicted class means a label based on the classifier trained in this study
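The indices above, together with the F-measure (the harmonic mean of precision and recall), can be computed directly from a binarized prediction and the ground truth. The sketch below assumes flat sequences of 0/1 pixel labels with 1 meaning dust storm; the function names and this encoding are choices made here for illustration.

```python
def confusion_counts(predicted, truth):
    """Count TP, TN, FP, FN from flat 0/1 sequences (1 = dust storm)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    return tp, tn, fp, fn


def indices(tp, tn, fp, fn):
    """Return the four rates and the F-measure.

    Assumes both classes occur in the ground truth and that at least one
    pixel is predicted positive, so that no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "true_positive_rate": recall,
        "false_positive_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }
```

For instance, `2 * 0.88 * 0.84 / (0.88 + 0.84)` evaluates to about 0.86, which agrees with the F-measure reported for the two-threshold result.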

The harmonic mean of precision and recall is referred to as the F-measure. It should be noted that precision and recall are in a trade-off: improving one typically decreases the other. The easiest way to maximize recall is to use an algorithm that regards the entire image as dust storm, but this minimizes precision and clearly makes no sense. Therefore, the F-measure, which stays low if either precision or recall is significantly low, is another index used to measure the performance of a classifier. The F-measure, however, still depends on the threshold value for binarization, which makes it difficult to use for comparing classifiers trained with different parameters and features. Here, the trade-off between FP and FN should be noted. Because an increase in TP is equivalent to a decrease in FN, TP can be increased by decreasing the threshold value, at the cost of increasing FP. In Eqs. (11) and (12), TP + FN and FP + TN are constants: the numbers of pixels labeled “dust storm” and labeled otherwise by a human operator, respectively. Therefore, an ROC curve, which shows the relation between the true positive rate and the false positive rate, is equivalent to the relation between TP and FP as the threshold value for binarization is continuously changed. Assuming that a drastic improvement in the true positive rate that incurs a slight increase in the false positive rate is desirable (the left part of the ROC curves shown in Fig. 3), AUC is a reasonable evaluation index independent of the threshold value.
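The threshold sweep behind the ROC curve and AUC can be sketched as follows. The trapezoidal integration and the function name `roc_auc` are choices made for this illustration; the text does not state how the area was integrated.

```python
def roc_auc(scores, truth):
    """ROC points from a threshold sweep and the trapezoidal area under them.

    scores are per-pixel dust storm probabilities; truth holds 0/1 labels.
    Assumes both classes are present in truth.
    """
    positives = sum(truth)
    negatives = len(truth) - positives
    points = [(0.0, 0.0)]
    # Lowering the threshold past each distinct score enlarges the positive set.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t)
        fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and not t)
        points.append((fp / negatives, tp / positives))
    # Sum trapezoids between consecutive (false positive rate, true positive rate) points.
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return points, auc
```

A classifier that ranks every dust storm pixel above every other pixel gives an AUC of 1, while random scores give a value near 0.5.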

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


Cite this article

Gichu, R., Ogohara, K. Segmentation of dust storm areas on Mars images using principal component analysis and neural network. Prog Earth Planet Sci 6, 19 (2019).
