Global polygons for terrain classification divided into uniform slopes and basins

Global terrain classification data have been used for various issues related to topography such as the estimation of soil types and of ground vulnerability to earthquakes and the creation of seismic hazard maps. However, due to the resolution of digital elevation models (DEMs), the terrain classification data from previous studies could not discriminate small landforms such as plains at the bottom of narrow valleys and small rises in plains. Owing to the greater regional variation of small landforms, there is trade-off between DEMs of higher resolution and the creation of global geomorphological legends. To address this problem, we first merged regions with similar topographic characteristics using slope gradients and HAND (height above the nearest drainage) calculated by the 90-m-spatial-resolution DEMs interpolated from the multi-error-removed improved-terrain DEM (MERIT DEM), and united the polygons with the unit catchments of the MERIT-Basins dataset, so that the polygons contain calculated terrain measurements (slope gradient, HAND, surface texture, local convexity, sinks) and noise types as attributes, as well as the ID number of the unit catchment. In addition, we performed k-means clustering on the dataset using slope gradient, HAND, and surface texture, which can be combined with the dataset as a simple terrain classification. The clustering results were prepared in 15 and 40 global uniform clusters and 15 and 40 clusters for each basin to understand the global appearance of the terrain and provide zoning data for regional problem-solving. The 15 clusters were prepared to observe the outline of the terrain without any processing, whereas the 40 clusters were prepared to group and reclassify the polygons to create zoning data for each region. This dataset showed improvements in terms of capturing the small rises in plains compared to the authors' previous global terrain classification data. This dataset can be used as a proxy and is expected to contribute to modeling and estimation in various applications that are known to be related to topography. The datasets of this article are available at [https://gisstar.gsi.go.jp/terrain2021/].


Introduction
Terrestrial land comprises landforms of various ages and constituents that have undergone various slope processes such as fluvial, aeolian, glacial, and coastal, balanced by uplift and sea-level change (Summerfield 1991). Individual landforms involve many implications and vary with scale. Speight (1984) stated that units of 75-1000 m may be considered as an appropriate size for mapping units to represent landform patterns, whereas mapping units of 15-75 m may be considered appropriate for the representation of landform elements.
Terrain classification data for landform patterns have been produced in many countries as geomorphological maps or geospatial information on topography and have been used in disaster management and regional development planning (Carrara et al. 2003;Del Monte 2016). Traditional geomorphological maps are created manually using aerial photographs and other methods to classify landforms and by considering differences in the

Progress in Earth and
Planetary Science *Correspondence: iwahashi-j96pz@mlit.go.jp Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 causes of formation, constituent materials, and time of formation through field surveys and data. However, such labor-intensive work is difficult to perform on wide areas. From around the 1990s, Dikau et al. (1991) and others proposed a method to mechanically classify terrain on a computer by using land surface features (hereinafter referred to as "terrain measurements") calculated from elevation data at grid points, known as a digital elevation model (DEM). In practice, terrain classification data using DEMs have been used in areas such as disaster prevention, environmental assessment, and resource development to assess hazard vulnerability (Mihu-Pintilie and Nicu 2019), soil (Hengl et al. 2017), and seismic susceptibility (Kwork et al. 2018;Karimzadeh et al. 2019;Mori et al. 2020), and there is a strong need for data at the scale of landform patterns.
All of the global terrain classification maps produced since 2000 have been based on DEMs, and are based on the ideas of geomorphometry (Pike 1988;Hengl and Reuter 2008;Florinsky 2017). Furthermore, they are created by combining terrain measurements calculated from DEMs. Specifically, Meybeck et al. (2001), Iwashi and Pike (2007), Drăguţ and Eisank (2012), Sayre et al. (2018), and Iwahashi et al. (2018). Meybeck et al. (2001) and Iwahashi and Pike (2007) represented large landforms using 1-km grid DEMs. Sayre et al. (2018) provided a terrain classification map product for mountainous areas. Drăguţ and Eisank (2012) and Iwahashi et al. (2018) performed object-based classifications. GeoMorphons (Jasiewicz and Stepinski 2013), which is suitable for classifying the shape of mountain slopes, and TerraEX (Netzel et al. 2016), a tool that automatically searches for similar terrain and generates a similarity map, have also been developed to meet global needs.
A description of the research activities of the corresponding author's group is presented as follows. It was discovered that terrain classification data similar to manmade geomorphological maps of Japan could be created by combining terrain measurements calculated from DEMs (Iwahashi 1994;Iwahashi and Kamiya 1995). At that time, it was noticed that the combination of slope gradient and a valley density parameter (later organized as the surface texture; Iwahashi and Pike 2007) could be used to classify broad terrain types such as mountains, volcanic areas, hills, plateaus, and plains, and that the local convexity could be used to classify intermediate terrain types such as terraces and fans. From this context, two types of global terrain classification maps, cellbased and polygonal, were produced from 1-km spatial resolution Shuttle Radar Topography Mission 30 arcsecond DEM (SRTM30) (Iwahashi and Pike 2007) and 280-m MERIT DEM (Iwahashi et al. 2018). The terrain classification data given, or the creation method have been used or recommended as materials to estimate soil types (European Soil Data Centre, Joint Research Centre, European Commission 2008), to estimate Vs30 (average shear wave velocity for the top 30 m; a proxy of ground vulnerability to earthquakes) (Yong et al. 2012;Mori et al. 2020, etc.), and for seismic hazard mapping (Irsyam et al. 2017). However, due to the resolution of the DEMs used, the terrain classification data from these previous studies could not discriminate the bottom plains of narrow valleys or small rises within the plains. The use of higher-resolution DEMs was necessary to improve on this performance, but it was thought that automatic classification would be more difficult with higher-resolution DEMs because artificial steep cliffs in heavily man-made altered plains or unevenness of accuracy would become more apparent. To accommodate the increasing resolution of DEMs and to produce more practical data, Iwahashi et al. (2021) introduced HAND (Height Above the Nearest Drainage; Rennó et al. 2008;Nobre et al. 2011) as an additional parameter for the extraction of small rises in plains, as well as methods of topographic measurement in which DEM unevenness or noise is not amplified. Iwahashi et al. (2021) conducted a terrain classification for the whole of Japan using a 30-m DEM, aiming to achieve a terrain classification that reflected the vulnerability of the ground for various slopes from alluvial plains to mountainous areas, and that had both geomorphological and geotechnical classifications without major contradictions. However, there were still some issues left, including the issue of segmentation which showed a few unexpected connections of polygons of uniform slopes. In extending this achievement to the entire globe, a trade-off arises between higher-resolution DEMs and the creation of a global geomorphological legend. Although unsupervised terrain classification would be possible at any resolution, small landforms involve much more regional variation than large landforms in fitting a geomorphological legend to the classified data.
In recent years, geospatial dataset projects on geography and geology have been progressing in various fields (globally, for example, OneGeology administration 2020; Poggio et al. 2021). The creation of global datasets using satellite imagery as a resource is also underway, for example, the National Aeronautics and Space Administration (NASA)'s Socioeconomic Data and Applications Center (SEDAC) and NASA Earth Observations (NEO) have produced numerous global datasets on economical and natural geography, and a number of global datasets have been released. In the past few years, the rapid improvement in the specifications of GPUs has enabled general-purpose PCs to handle big data, which has led to rapid progress in the field of machine learning. Hence, Page 3 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 geospatial information is now required to be used as zoning data for machine learning, rather than simply being viewed directly like printed maps. One of the goals of geomorphometry is to use terrain measurements calculated from DEMs, i.e., physical quantities such as slope gradient, to model and estimate various issues that are known to be related to topography, such as landslide risk, the susceptibility of the ground to shaking, and soil types. Quantitatively zoned terrain data are useful for interpolating and mapping various measurements known to be correlated with topography. The approach of processing terrain classification using DEMs as one of the machine learning parameters for proxies of measured data of environmental analyses has been a theme of recent research (e.g., Keuper et al. 2020;Yamashita et al. 2022), and such needs must be addressed. On the other hand, maps that can represent an overview of the regional landforms are still needed for regions on which little geological and geomorphological information is available.
In this study, we produced global data to perform terrain classification based on the method of Iwahashi et al. (2021), which is better at classifying plains than those of previous studies (Iwahashi and Pike 2007;Iwahashi et al. 2018) and withstands the use of finer DEMs. The method was adjusted to fit the 90-m MERIT DEM.

Construction and content
This chapter describes the materials and methods used in the present work. The entire data construction process is shown as a flowchart in Fig. 1. The flowchart also shows where to find the explanation of each product in the subsections below.

Source data 2.1.1 DEM
The 3-arc-second grid MERIT DEM v1.0.3 (based on Yamazaki et al. 2017) was used as the main material. MERIT DEM was developed by merging several baseline DEMs, mainly the Shuttle Radar Topography Mission 3 arc-second DEM (SRTM3) ver. 2.1 provided by NASA and the ALOS World 3D (AW3D) 30-m DEM ver. 1 provided by the Japan Aerospace Exploration Agency (JAXA).

Region partitioning and interpolation of DEMs
The polygon data of delimitation was created and defined as regions for terrain measurements. We used the basin edges of MERIT-Basins (Lin et al. 2019) or the sea to divide the global area into 67 regions (Fig. 2). Although island regions of MERIT-Basins do not follow the watershed boundary, hereafter, we refer to the regions shown in Fig. 2 as, e.g., "Basin 41, " for simplicity. This region partitioning was performed in consideration of the difficulty of processing the global DEM with a resolution of 3-arcsecond grid data in a single calculation due to our system performance, and the usability of the resultant data was expected to be poor.
The original DEMs within each region were merged and clipped, then transformed into the Lambert Azimuthal Equal Area (WGS84) to produce a 90-m DEM with the coordinates near the center of the land area as the central meridian. Two resampling methods, the nearest neighbor method and the bilinear method, were used to create interpolated DEMs (DEM1, DEM2). We used two different resampling methods to avoid the stripe noise caused by the filtering process used in the subsequent calculation of the terrain volume using Geographic Information System (GIS) software. The noise is attributed to the raster processing method of GIS, and this problem can be avoided by using Triangulated Irregular Network (TIN) as the resampling method of DEMs. However, we did not use TIN because of the computational cost, so we used two resampling methods.

Masking data of NoData
To prevent underestimation of the amount of terrain measurements in areas with abrupt changes, such as the ground between water and mountains, and for product usability, it is necessary to set null (NoData) for land and water areas such as lakes, marshes, and large rivers. However, MERIT DEM does not include information for land and water areas. In a previous study (Iwahashi et al. 2018), the water part of the land cover classification of GlobCover 2009 (Bontemps et al. 2013) was used, but the resolution was coarse at about 300-m, and blank areas due to clouds were unavoidable in the independent peaks of remote islands because the data were sourced from satellite imagery. In this study, HydroLAKES (Messager et al. 2016) was used to mask lakes. HydroLAKES is a database of global polygon data of lakes larger than 10 ha. Some parts that required modification were manually modified and used, although only within Japan. HydroLAKES does not include brackish lakes or large rivers that should be NoData parts in this study, so supplementary data were necessary. Initially, we considered using land cover classification based on satellite imagery, but it was difficult to identify reclaimed land in shallow waters and islands in coral reefs. Therefore, we used the Code2 major rivers and water sections extracted from the OSM Water Layer (Yamazaki et al. 2017). The OSM Water Layer is a published dataset which was created by extracting water-related data from OpenStreetMap.
These masking data were used as NoData masks to minimize errors in areas with rapid changes in slope. These data do not necessarily reflect the actual extent of land water areas, because the criteria for acquisition of Page 4 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022)  the water-related data of OpenStreetMap is considered to be disparate. Except in areas where the data of mapping agencies are used as the original data, the data of different areas often exhibit large differences in accuracy or description, and the treatment of the intertidal zone is probably not consistent.

Supplementary data
MERIT-Basins' vector hydrography (based on MERIT Hydro v1.0) unit catchment shapefiles (Lin et al. 2019) was used as auxiliary thematic data for the segmentations. MERIT-Basins, which comprises unit catchment data calculated from MERIT DEM, was combined with polygon data which was segmented as uniform slopes and designed so that the uniform slopes are divided by the ridge lines of unit catchments. The unit catchments of MERIT-Basins, as shown in Fig. 3, are not detailed and are rather broad, which is an advantage for the purpose of this use. For the mountainous areas, they follow ridges well, while terraces and plains-whose geomorphologic boundaries do not follow watershed boundaries-are not sufficiently detailed to adversely affect their classification. Thus, this was performed to alleviate the problem of region segmentation in the previous studies (Iwahashi et al. 2018(Iwahashi et al. , 2021 and to allow users to combine the locations of upstream and downstream catchments in MERIT-Basins as polygon attributes. This process is expected to facilitate the extraction of data for each catchment area and assessments of the connection between upstream and downstream.

Terrain measurements
The terrain measurements used in this study basically followed the geometric signatures of Iwahashi et al. (2021). The 90-m DEMs (DEM1, DEM2) were used to calculate the terrain measurements as attributes shown in Table 1. Following the method of Iwahashi et al. (2018), logarithmic values (ln) of the slope gradient (lnSLOPE) were calculated. This follows a common geomorphological map style in which mountainous steep slopes tend to be aggregated together, whereas plains are divided into details. We also calculated the logarithmic values (ln) of HAND to emphasize the topography of gentle slopes (lnHAND).
For the HAND calculation, Iwahashi et al. (2021) used TauDEM (Tarboton 1997(Tarboton , 2005. However, in this study, due to the system cost associated with the increase in data volume, we applied the HAND calculation method of MERIT Hydro ), a product that has already been hydrologically corrected, and created the data from MERIT DEM. MERIT Hydro uses a unique correction algorithm (Yamazaki et al. 2012) to perform hydrological elevation correction. This method calculates the flow direction from global elevation and water body data, and then minimizes the amount of topographic  The surface texture of Iwahashi and Pike (2007). TEXTURE was calculated by the "terrain surface texture" tool of SAGA (Conrad 2012a) in QGIS 3.4 (Threshold: 5-m, radius: 10 cells, no distance weighting) CONVEXITY Density within a 10-cell radius of convex points obtained by processing the DEM with a 3 × 3 Laplacian filter.
Original DEM: the 90 m DEM interpolated by the nearest neighbor option The local convexity of Iwahashi and Pike (2007). CONVEXITY was calculated by the "terrain surface convexity" tool of SAGA (Conrad 2012b)  Page 7 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 correction while maintaining the condition that the downstream elevation is not higher than the upstream elevation. Unlike HAND calculated by TauDEM as used in Iwahashi et al. (2021), these data did not undergo the fill surface depressions (sinks) process first, so it cannot extract lowlands by itself. The fill sinks tool of Wang and Liu (2006) was used separately to extract the number and areas of depressions, denoted as the variable Sinks (Table 1), and place attributes on the polygon data. The method of Wang and Liu (2006) can fill surface depressions relatively quickly, even for the large data of this study. Sinks were extracted by the difference between the filled DEM and the original DEM. Surface texture and local convexity are very noisesensitive parameters. Although the MERIT DEM largely mitigates the unevenness of the source DEMs, the effect of the difference of baseline data or the stripe noise in MERIT DEM became apparent, especially in highlatitude regions and deserts (Iwahashi et al. 2018). This problem also remains in the 90-m-spatial-resolution MERIT DEM. The intensity with which such heterogeneity becomes apparent varies with the resolution of DEMs. The raster images from the terrain measurements in this study showed different patterns compared to the results from Iwahashi et al. (2018) at a 280 m resolution.
In the case of calculating TEXTURE (Table 1), by setting the threshold for extracting ridges and valleys during the calculation to 5-m, we were able to eliminate the effects of unevenness except in a few areas centered on deserts, while retaining useful information such as the distinction between volcanic and non-volcanic slopes. For local convexity, the noise could not be eliminated by increasing the threshold for the extraction of convex points, and expanding the window size for density calculation was an option (Iwahashi et al. 2021). However, in such a case, the overall disadvantage was high, considering the balance between the loss of information commensurate with the 90 m resolution and the fact that local convexity is an auxiliary parameter. Therefore, CON-VEXITY (Table 1) was finally determined to calculate the density with the default radius of 10 cells.

Detection of areas containing significant noise
The areas that contain significant noise in the DEM must be identified. This is required to allow them to be removed from the analysis in advance, to prevent them from affecting the terrain classification results. As noted above, in some areas of CONVEXITY and a few areas of TEXTURE, the unevenness of the source DEMs were reflected. Many of these "noise" regions (hereafter Noise) are along orbital stripes, and the regions which are distributed in TEXTURE alone are in deserts. In addition, noise or low values in lnSLOPE (Table 1) were also observed in the region of deserts or in parts of ice sheets. Within the CONVEXITY, TEXTURE, and lnSLOPE images used for classification in this study, Noise polygons were created as the portion of the image that can be significantly identified by visual inspection of the maximum-minimum stretch. The ice sheet areas were also identified as Noise using lnSLOPE images and World Imagery, which is provided as a base map in ArcGIS Online (ESRI). The forms of Noise that were omitted from the terrain classification is as follows (1) Apparent noise on CONVEXITY.
Types 1 and 2 are found in a wide range of areas. Type 3 is found in the polar regions. Types 4 to 6 are found only in desert areas.

Segmentation
Using the raster data of terrain measurements and supplementary thematic data, the regions were divided into uniform slopes, and polygon data were created. The segmentation method followed Iwahashi et al. (2021). The multi-resolution segmentation (Baatz and Schäpe 2000) method implemented in eCognition (Trimble) was used to segment the data under the following conditions. TEXTURE and CONVEXITY were not used to perform segmentation.
The unit catchment of MERIT-Basins (Lin et al. 2019) and the Noise polygons (2.2.2) were used as thematic layers. Therefore, the uniform slope polygons were divided by the catchment areas as well as by terrain features (Fig. 3). In areas where there was noise, the type is noted in the field attributes. The uniform slope polygons were output to shapefile format data with the representative values for each terrain measurement contained in the polygons. For Australia, which has a very large amount of data, this work was divided into two portions due to the capacity of the system in use. The boundaries of unit catchments of MERIT-Basins were used for the division. The field attributes were edited to add the polygon number (PolyID) and polygon area (area), as well as to copy the ID (COMID) of the unit catchment from Lin et al. (2019).

Regional and global k-means classification
Edited uniform polygons ( Fig. 1) were used to store the representative values of topographic quantities (Table 1).
Using the field attributes of these data, we created a grouping of similarly shaped slopes using the same method for the entire globe, including remote islands and continents. We did so to publish rough, standardized classification data and to help users find unique applications in their own regions. The idea behind the classification is as follows. The method should be generic and not a black box. At least the seed of classification should be unsupervised. This is necessary for two reasons: in practice, generalize training data for a specific region to a wide area with different climatic and topographic formation processes at 90 m resolution is difficult, and we aimed to find unknown features in topographic data rather than simply imitate imitating existing maps imperfectly. Specifically, we used k-means clustering (Mac-Queen 1967), following Iwahashi et al. (2021). Previous studies (Iwahashi et al. 2018(Iwahashi et al. , 2021 have confirmed that k-means clustering can be used to classify landforms into mountainous areas, hills, terraces, fans, and plains, as well as to subdivide them into several categories related to ground strength and geological hazards. In addition, a general method such as clustering makes it easier for users to identify and classify unique regions independently. Each of 67 regions was classified by k-means clustering (MacQueen 1967). Standardized values of the field attributes, lnSLOPE, lnHAND, and TEXTURE, were used to perform the calculations, following Iwahashi et al. (2021). Only the areas in which there were no noise regions in the raster data of these three terrain measurements were used to perform clustering. The area excluded from clustering (Noise codes 0 and 1) comprised about 1.8% of the global land area. The calculations were performed using SPSS v26 (IBM), with the number of clusters set to 15 and 40, the convergence criterion set to 0, and the maximum number of iterations set to 999. Although automatically setting the number of divisions (k) for clustering is difficult, in this study, we set k to 15 for a quick overview and then to 40 so that users can group data samples according to their needs. 40 clusters may be excessively fine for a domain with thousands of cases such as the Pacific countries, but later grouping is possible. The clustering was also performed in batches for the areas where polygons were created by dividing the area (Fig. 2, Basin 56-1 and 56-2). The data with 15 clusters are intended to be used as a simple terrain classification map of the region. The data with 40 clusters are intended to be grouped by users according to local conditions; therefore, the number of clusters was set to a high value.
In the same process as the regional data, we also created a batch of clustering data for the entire globe. This is supplementary data for the observation of maps connecting adjacent Basins and for use in problems that require a unified analysis of the entire globe, such as global environmental problems.
For the global data (43,965,174 cases) and some Basins with many cases (Fig. 2, Basins 62 and 26), the calculation did not converge at the maximum number of iterations (999) for the 40 clusters. However, the maximum absolute coordinate change for any given center was small, almost zero (7.434E-5) for the 40 global clusters, and 0.001 for the 40 clusters of Amazon River basin (Fig. 2, Basin 62), which was the largest.

Dataset to be released
The datasets of this article can be downloaded from the Geospatial Information Authority of Japan (GSI Japan) website [https:// gisst ar. gsi. go. jp/ terra in2021/]. As explained in the previous section, the polygon data are stored in the shape file format, and the attributes include the ID number (polyID) of the segmented polygons, the average values of terrain measurements in the polygon (lnSLOPE, lnHAND, TEXTURE, CON-VEXITY), the mode value of Sinks, the code number of Noise, and the COMID number of MERIT-Basins.  (2) Properties of the cluster file (regional or global cluster; dBASE IV, Cluster_XX or GlobalCluster_XX).

Usage of the dataset and notes
The clustering result (dBASE IV) can be combined with the shapefile (Poly_XX) using GIS software with polyID as the key. The result of the combination is itself a simple terrain classification map. As shown in Iwahashi et al. (2021), the 40 clusters can be grouped by comparing them with existing thematic data and scatter plots of clustering convergence values to construct a terrain classification map. CONVEXITY was not used for cluster classification in this study and is provided only as an auxiliary parameter. In regions where CONVEXITY does not contain noise, it may be used as a supplement to classify valleys and hills in intermediate slopes such as hilly terrain. However, the necessary threshold may vary from region to region. In continental regions, Sinks (Table 1) extracted by the "Fill Sinks (Wang and Liu)" tool using the 90-m DEM cover a wide area of plains. We provide users with both Sinks and lnHAND (Table 1). The lnHAND (= ln (HAND + 1)) can be converted to HAND.
Because the shapefile dataset contains COMID (Lin et al. 2019) as a field attribute, it is considered possible to combine attributes of information on the location of upstream and downstream of each catchment area contained in the MERIT-Basins' river data (Lin et al., 2019) using COMID as a key. The relationship between upstream and downstream in the same unit catchment can probably be determined by the value of lnHAND.
The notes for using the dataset are as follows. The DEM reflects the topography at the time of measurement. Therefore, the classification results are not likely to be the expected ground proxies for artificially altered areas such as reclaimed land or cut-and-fill areas. Because the Noise areas were designated visually, there may be some oversight or overestimation. Terrestrial lands described in OpenStreetMap that were formed after the measurement period of the source DEM of MERIT have an elevation of 0. Therefore, they were classified into the same cluster as low plains. We changed the cluster number to 0 where we noticed the issue, but some may still exist in the dataset.
In some areas of brackish lakes and large rivers, the centerline may not be NoData, but rather narrow polygons due to the OpenStreetMap used to extract these areas. Small lakes that are not included in Hydro LAKES (Messager et al. 2016) and OSM Water Layer (Yamazaki et al. 2017) are classified in the same cluster as low plains. These narrow polygons need to be removed before use. There are often small polygons at the confluences of valleys. These polygons coincide with the blank parts of MERIT-Basins and were caused by combining the segmented polygons and MERIT-Basins. Because the areas were very small, they did not exhibit obvious influence on the clustering results.

General characteristics of the clusters 3.3.1 The global clusters
The data in this study were generated via unsupervised classification by topography from DEM only. The polygon boundaries indicate break lines where the lnSLOPE and lnHAND vary. The clustering results indicate groups of similar shape. In all Basins, slopes belonging to clusters with small HAND, TEXURE, and SLOPE form Page 10 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 plains, while slopes belonging to clusters with large HAND and SLOPE form mountains and hills. Clusters with large TEXTURE and SLOPE but small HAND correspond to the valleys of mountains and hills, and some deeply eroded low hills. For mountains and hills, areas with relatively small SLOPE but neither summits nor valleys correspond to lower hilly mountains, hills, and dissected cliff slopes, and were considered as areas of active sediment production (Iwahashi et al. 2021). Clusters with small TEXTURE and large HAND correspond to plateaus, terraces, and sand dunes.  Figure 5 shows color-coded maps of the 15 global clusters (Gcluster15) as polygon attributes around the Amazon River basin (a) located in a humid tropical plateau (Summerfield 1991). From the comparison with the shaded relief (c) of an enlarged view (b), an overview of the landform patterns was described. The mountains and hills were mainly divided into relatively fine textured terrains, large highland slopes, and their associated valleys (Fig. 5b). The nature of the large highland slopes showed regional characteristics, with some regions capturing table mountains as shown in Fig. 5b, whereas others, e.g., in orogenic regions like Japan, are mainly Quaternary volcanic slopes, as will be shown below. In any case, the classification of mountain slopes would be useful for mountain assessment. The data in this study cover not only major terrestrial areas but also remote islands, if the MERIT DEM is present. The 15 classification of global clusters is expected to prove useful for understanding the schematic topography of regions that are difficult to survey in the field, such as tropical rainforests, remote islands, arctic regions, and deserts. If users would like to obtain a more detailed categories of landforms for a wide area, the 40 categories of global clusters should be useful as material.

The regional clusters
It should be noted that the model with a 90 m grid resolution is appropriate to the boundary between the landform pattern and the landform element (Speight 1984). Therefore, the classification results presented in this study should capture even more landform elements than previous global terrain classification of the same type using 280-m DEMs (Iwahashi et al. 2018). This also Page 12 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 means that exogenic regional characteristics, such as differences in landform formation processes due to climate, are reflected more clearly. Therefore, in order to use these data to solve regional problems, regional analysis is considered necessary. This is the reason why we also provide regional clusters clustered in the Basins. In Basins where the number of cases was very large, the terrain classification (spatial pattern for each cluster) by the global and regional clusters did not different substantially. However, it is thought that more regionally specific clusters were generated. One of the important findings for regional application is that the spatial pattern for each cluster in this study may reflect the boundaries of surficial geology or soil in some areas, but not in others. This applies equally to both the global clusters and the regional clusters. However, because the regional clusters are likely to be used more locally, we focus on these. Figure 6 shows heat maps of the regional 15 clusters in the alluvial, aeolian, and erosional regions shown in the digital data of the Regolith Map of Australia (Craig 2013), which covers central and northeastern Australia. This region is in a largely arid environment, except along the northern and eastern coasts. The cluster number is specific to Basin 56 (Fig. 1). Figure 7 is a scatter plot of cluster convergence values (ZlnHAND and Ztexture) of the regional 15 clusters. Figure 8 shows a dendrogram of the hierarchical clustering results using the convergence values of ZlnHAND, ZlnSLOPE, and Ztexture for the clusters. The cluster linkage method is the Ward method, and the distance is the Euclidean distance. Terrain groups grouped by Distance 2 are bracketed to the left of the cluster number in Fig. 8. Slopes of 4-12-13-14 and 1-3-11 branches are frequently distributed in Aeolian and Alluvial environments (Fig. 6). They are coarse in TEXTURE and have relatively low HAND and gentle SLOPE (Fig. 7), suggesting that they are probably poorly consolidated sediments. The slopes of branches 1-3-11 are steeper than those of 4-12-13-14, and are relatively less common on the alluvial and more common on the aeolian slopes, suggesting that they are formed of more coarse-grained sediments. Slopes of 7-9-5, 10-15, 6-8, and 2 branches are common on slopes in Erosional environments but rare in Alluvial and Aeolian, and they are plateaus, hills, and mountains. Fig. 6 Heat maps of the regional 15 clusters in alluvial, aeolian, erosional regions of Australia Page 13 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 Figure 9a shows a color-coded terrain classification map of the area near Rockhampton, which is in the Eastern Uplands Division (Pain et al. 2011) and has a relatively wet annual precipitation of approximately 1000 mm (Geoscience Australia 2022). The map is a superimposition of the 15 clusters of this study and the 1:5,000,000 Regolith Map (Craig 2013). In Fig. 9, there is a clear contrast between bedrock mountains (mainly formed by branches 10-15 and 6-8-2, including large slopes of 7 and 9 on the cliffs), regions where residual soil rests on the bedrock (mainly formed by branches 1-3-11 and 7-9), and regions of alluvial sediments (mainly formed by branches 4-12-13-14). The region of swamp sediments is not captured in the clustering results, but in the attached Sinks (Fig. 9b), polygons with predominant Sinks can be observed in similar locations. Thus, there is a possibility of meaningful reclassification by overlapping with clustering results. The distribution of Sinks is generally limited in mountainous areas such as this region, with a distribution that strongly suggests an association with low wetlands. The coastal zone sediments vary in slope type, from the 4-12-13-14 branch with Sinks, which is similar to the alluvial sediments, to the 7-9-5 and 10-15 branches, which are equivalent to the large slope of the hills, suggesting the diversity of the coastal zone sediments. Figure 10 shows a similar map of the region around Alice Springs, which is located between the Western Plateau Division and the Interior Lowlands Division (Pain et al. 2011) and has a dry annual precipitation of about 200 mm. Compared to the Regolith Map, the terrain classification of this study shows a similar relationship with the boundary of the erosional landform as in Fig. 9a, but less relationship with the dune fields formed by the aeolian sediments. In the dune fields, the relationship is less pronounced, and the slopes of the 4-12-13-14 and 1-3-11 branches are arranged according to the slopes of individual linear dunes. It is unclear to what extent Sinks are related to water resources due to differences in climate and geology in different regions, but when combined with areas where HAND is small (blue areas in Fig. 11), they may be useful for environmental assessment even in arid areas.
Thus, the terrain type to which each cluster corresponds differs from region to region. The use of these data to solve regional problems requires combined analysis and reclassification using regional information. Page 14 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 It is probably necessary to separate the interpretation of clusters by rough physiography and climate zones (e.g., morphoclimatic zones; Summerfield 1991). Regions with abundant data can be interpreted and reclassified considering existing soil and geological information, while hills and mountains, where existing data are generally scarce, can be grouped using scatter plots of convergence values and hierarchical clustering results.

Small rises in the plains and Vs30
The expected benefit of higher-resolution DEMs and the use of HAND is the improved capture of small rises in the plains, such as terraces and sandy rises, which have different soil types from floodplains and coastal plains. In regions where the clusters reflect characteristics of surficial geology and soils, this benefit may be reasonable. However, quantitatively evaluating existing thematic maps is difficult as the correct answer for unsupervised clustering results such as those in this study. This is because the clustering results using DEMs are unsupervised classifications based on topography alone and they are based on a different concept from manual classification maps that consider geology and causation. The difference is even more clear in urban areas, where artificial changes have progressed and information on micro-elevations is often lost (but is still included in thematic maps) in DEMs. Despite these limitations, it is possible to measure the degree of improvement from the previous study by quantitatively examining "how well the target terrain species are separated as a cluster." We compared the result of this study with those of Iwahashi et al. (2018), a global classification map using a 280-m MERIT DEM, for the Kanto region (Tokyo metropolitan area), which includes the world's largest metropolitan area in terms of population within the region and is mostly composed of alluvial plains and terraces. In addition, we compared the result with the terrain classification data of Japan (Iwahashi et al. 2021), which was classified using a Japanese 30-m DEM generated from accurate elevation data including measurements performed by airborne light detection and ranging (LiDAR) systems. Because terrain classification data have applications for estimating the shaking susceptibility proxy (Vs30), the results of this study are compared with Vs30 data (Senna et al. 2013(Senna et al. , 2019 for the Kanto Plain to confirm their characteristics. The plains of the Kanto region, which is the metropolitan area of Japan, are surrounded by mountains and volcanos, and there are wide terraces, silty and clay floodplains, and coastal lowlands dotted with natural levees Fig. 8 Dendrogram of the hierarchical clustering using convergence values for the regional 15 clusters in Basin 56 (Australia) Page 15 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022)  Page 16 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 and elevated sand bar deposits deposited by sandy soil from past floods. There are also artificial embankments and sand bar deposits scattered throughout the floodplains and coastal lowlands. In the Kanto Plain, 1:25,000 to 1:50,000 scale geomorphological maps based on soil sampler data and aerial photographic interpretation have been produced since the 1960s, including the 1:50,000 Fundamental Land Classification Survey by National Land Agency, and the 1:25,000, Land Condition Map and Landform Classification Map for Flood Control by Page 17 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 GSI (available on the GSI Maps; Geospatial Information Authority of Japan 2022), and many geological and geographic surveys have been conducted. Wakamatsu and Matsuoka (2013) compiled a large amount of map information and created an engineering geomorphologic classification map of Japan (here after JEGM), which is a 7.5 by 11.25 arc-second grid map of Japan. In this paper, the 2020 version (V4) included in the Japan Seismic Hazard Information Station (J-SHIS) Map (National Research Institute for Earth Science and Disaster Resilience (NIED) 2020) is used as a reference. The grouping of the regional 40 clusters of Basin 41 (including Japan) is shown in Figs. 12, 13. In this case, we did not cross-tabulate the results with existing geological maps, but simply classified them based on the hierarchical clustering results in Fig. 12. The interpretation of the branches in the Kanto region, which was inferred from JEGM and the seamless digital geological map of Japan (Geological Survey of Japan, AIST 2019), and the color of the legend were added. In addition, the clusters with more than 10% Sinks (three clusters in the plains) were reclassified. This simple grouping using hierarchical clustering of cluster convergence values should be possible for other, less well-documented basins. Figure 13 shows the grouping result on a scatterplot, and Fig. 14 shows JEGM (a) and the three terrain classifications using DEMs for the Kanto region. The terrain group which was created based on the regional clusters in this study ( Fig. 14c) represents the major divisions of mountains, volcanos, hills, terraces, and plains well.
The terraces, i.e., the areas corresponding to the three legends of rocky strath terrace (rarely distributed in the Kanto area), gravelly terrace, and terrace covered with volcanic ash soil in the JEGM, are included in Legends 9, 10, and 11 in this study, Legends 6 and 8 in Iwahashi et al. (2018), andLegends 9, 10, 11a, and11b in Iwahashi et al. (2021). The areas corresponding to natural levees, and marine sand and gravel bars in the JEGM are considered to be included in Legends 12 and 16 in this study and Legend 13 in Iwahashi et al. (2021). In Iwahashi et al. (2018), they are integrated with the floodplain/coastal plain and cannot be contrasted.
It is difficult to distinguish between the eroded part of a terrace (e.g., Legend 10 in Iwahashi et al. (2021)) vs. hills, low terraces and dissected fans, because the DEM terrain classification only classifies slopes with similar topography. In the global data (Fig. 14bc), the high fill along the coast is not classified as NoData, and therefore  Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 is classified in the same category as terraces in the figure. At least in Japan, low plains such as floodplains and coastal plains have undergone urbanization and paddy field development and have been totally artificially modified for residential construction and flood control. And in many cases, micro-elevations have already been lost from the DEM. On the contrary, in some cases, the lowlands have been added with the small rises of fill due to land consolidation and housing construction. Table 2 shows the precision, recall, and F-measure of the legends of the terraces, natural levees, and marine sand and gravel bars in JEGM and the legends of the Fig. 12 Dendrogram of the hierarchical clustering using convergence values for the regional 40 clusters in Basin 41 Page 19 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 maps that can be compared. The legend of the maximum area in the grid of JEGM was checked and calculated. High-fill areas along the coast were excluded from the calculation. The baseline value was calculated by randomly deriving the same number of positives as grids in each category of JEGM by SPSS v26 (IBM). The baseline of F-value for terraces was 0.27. The baseline of F-value for natural levees and marine sand and gravel bars was 0.05 because the data exhibited a large bias in the number of positives and negatives. As may be observed in Table 2, the capture of terraces in this study was significantly improved over that of Iwahashi et al. (2018), which used a MERIT DEM coarser by a factor of three, and did not use HAND. In contrast, the F-measure was not much different from that of Iwahashi et al. (2021), which used an accurate DEM that was three times finer and included LiDAR, even though the accuracy of the DEM was significantly different. This may be because this study used unit catchment polygons for the segmentations and there were differences in the calculation method of HAND. Recall was high in this study and Iwahashi et al. (2021), which means that the use of HAND reduces the oversight of terraces. In contrast, a phenomenon of confusion between sand dunes (for example, along the northwest coast of Choshi) and terraces was evident. However, the overall distribution of the terraces is clear, and the data of this study can be considered to provide practical results for the terraces.
As for natural levees and marine sand and gravel bars, they are integrated with the floodplains and coastal plains in Iwahashi et al. (2018) and not detected at all. In this study and in Iwahashi et al. (2021), they were partially detected, indicating an improvement due to the use of HAND. Relatively clear natural levees in the upper reaches and marine sand and gravel bars were detected to some extent. However, the capture status of natural levees in downstream urban centers where man-made alterations are predominant was not good. Although improvements have been made from scratch, additional data on local conditions are considered necessary for urban areas. Figure 15 shows a comparison with Vs30 data (in J-SHIS Map, NIED 2020), which is a site amplification factor for seismic shaking in the Kanto region. Vs30 data are derived from borehole core data by spatial interpolation using geological and geomorphological divisions as proxies (Senna et al. 2013(Senna et al. , 2019. In the Kanto region, high-density miniature microtremor array surveys were Page 20 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 conducted in terraces and plains, and accurate data were produced by correction using a deep ground model. A box-and-whisker diagram with averaged Vs30 (AVS30) was created using the terrain type with the largest area as the representative type for each grid (Fig. 15). The points corresponding to high-fill areas along the coast were excluded. In the box-and-whisker diagram, the range of inter-quantile range (IQR) for mountains and hills is similar for Groups 1, 2, and 7, Groups 3 and 4, and Groups 5, 6, and 8, respectively. Therefore, from the comparison with Fig. 13, it appears that Vs30 in mountainous or hilly areas is not related to HAND, i.e., height from erosion reference surface, but to TEXTURE or SLOPE. However, this interpretation should be considered with caution because borehole core investigations in mountains are usually conducted near the valley floor. From the comparison with Fig. 12, among mountains, there is a distinct difference in the IQR of Vs30 between bedrock mountains (Groups 1, 2) and fragile mountains (Groups 3, 4, 5).
In the Kanto region, large highland slopes (Groups 4,5) in the fragile mountains are clearly concerned with mafic Quaternary volcanos (Geological Survey of Japan, AIST 2019). For terraces and plains, the higher the HAND, the higher the AVS30 in most classes, with the exception of Group 14 (hilly valleys). We will wait for further research to determine whether this trend may be observed in other parts of the world.

Conclusions
In this study, the combination of terrain measurements calculated from the 90-m MERIT DEM interpolated from the 3-arc-second grid MERIT DEM v1.0.3 (based on Yamazaki et al. 2017) and MERIT-Basins (Lin et al. 2019) have been used to perform segmentations, and shapefiles containing attributes for each unit slope were created and published from [https:// gisst ar. gsi. go. jp/ terra in2021/]. The shapefiles were created for each of the 67 Basins, and the attributes were polygon ID, lnSLOPE, lnHAND, TEXTURE, CONVEXITY, Sinks, Noise, and COMID of MERIT-Basins. Data discretized by k-means clustering using several terrain measurements was also presented. The clustering results can be linked to the shapefiles by polygon ID, and two types were provided, including one for the entire globe, and another for each Basin.
The 15 clusters can be used for observing an overview of topography such as mountainous areas, plateaus, Page 21 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 terraces, plains. The 15 classifications of global clusters will be useful for understanding the schematic topography of regions that are difficult to survey in the field. The 40 clusters can be grouped by statistics, such as the grouping with reference to the hierarchical clustering results of the convergence values of the clusters and the reclassification using Sinks (Fig. 14c), or machine learning for themes related to topography according to the needs of each region to construct meaningful maps. Based on the observations in Australia, the relationship between slope materials and the cluster is clear in some areas and less so in others, so the usage is expected to vary from region to region. In the experiment conducted for the Kanto region of Japan, the terraces in the map constructed from the 40 regional clusters were picked up clearly compared to the reference map, and the data of this study can be considered to provide practical results for the terraces. We were able to obtain the relationship between the Vs30 data and each terrain group on the map of the Kanto region, which shows a clear difference between bedrock mountains and fragile mountains described by the clusters. Of course, Table 2 Precision, recall, and F-measure of the legends of the terraces (a), natural levees and marine sand and gravel bars (b) in JEGM and the legends of the terrain classification maps that can be compared Page 22 of 24 Iwahashi and Yamazaki Progress in Earth and Planetary Science (2022) 9:33 in addition to using the clustering results, users can also make their own classification using the raw terrain measurements stored in the polygons.
The main point to be noted is that the results presented in this study only represent the topography at the time of measurement of the MERIT DEM source data, and any topography that has been rapidly changed since then, for example, by artificial or volcanic activities, is different from the current situation. When using the clustering results as a proxy for slope materials, it is important to note that the results are not as expected in areas that have been cut and filled by human modification. For example, in an analysis conducted in the Kanto region in Japan, natural levees and marine sand and gravel bars, which could not be captured at all in a previous study (Iwahashi et al. 2018), were captured reasonably well in areas with no artificial modifications, but notably less so in heavily modified urban areas. Additional data on local conditions are considered necessary for urban areas.
Recently, a free DEM with a 30 m resolution has been published (Hawker 2022), and the resolution of globallevel DEMs will continue to increase in detail. However, due to scale issues, it is well known that DEM resolution affects the values and spatial patterns of terrain measurements (Zhang and Montgomery 1994;Deng et al. 2007;Mulder et al. 2011), and the relationship between scale, appropriate terrain measurement type, and calculation method is complex. Higher-resolution global data, if it is to go beyond unsupervised classification and to produce a robust geomorphological legend, will most likely need to be undertaken not by individuals, but by international teams with knowledge of geomorphological development in a variety of climatic and physiographic settings.