River discharge prediction for ungauged mountainous river basins during heavy rain events based on seismic noise data

Several mountainous river basins in Japan lack a consistent hydrological record because their complex environment and remoteness make discharge measurements economically unfeasible. However, understanding the flow rate of rivers during extreme events is essential for preventing flood disasters around river basins. In this study, we used the high-sensitivity seismograph network (Hi-net) of Japan to identify the time and peak discharge of heavy rain events. Hi-net seismograph stations are distributed almost uniformly at intervals of approximately 20 km and are available even in mountainous regions. The Mogami River Basin in Northeastern Japan was selected as a target area to compare the seismic noise data of two Hi-net stations with the hydrological response of a nearby river. These stations are not located near hydrological stations, so a direct comparison of seismic noise and observed discharge was not possible. Instead, discharge data simulated using a hydrological model were first validated with gauging station data for two previous rain events (10–23 July 2004 and 7–16 September 2015). Then, the simulated river discharge was compared with Hi-net seismic noise data for three recent events (10–23 July 2004, 7–16 September 2015, and 10–15 October 2019). The seismic noise data exhibited a similar trend to the time series of simulated discharge in a frequency range of 1–2 Hz for the selected events. Discharge values predicted from the noise data effectively replicate the simulated discharge values in many cases, especially the timing and amount of peak discharge. Simulated and predicted discharge near NIED Hi-net seismic stations in the Mogami River Basin for the event of October 2019 (Typhoon Hagibis).


Introduction
River discharge is defined as the total volume of water flowing through a river at any given point; it is an important hydrological parameter for planning and decision making of many water-based development activities. River discharge cannot be measured directly but must be computed by multiplying the area of water in a river cross section with the average velocity of water in that cross section (Herschy 1993). Different measurement methods have been proposed to estimate river discharge, which is generally referred to as the observed discharge. For example, a current meter is typically used to measure the flow velocity at various vertical locations at any point within a cross section. The flow velocity is then multiplied by the corresponding area of each measurement, and the sum of these products gives the river discharge at that selected point. Classical velocity-area methods are based on the principle of the continuity of fluid flow (Buchanan and Somers 1969) and provide instantaneous discharge measurements to establish the stage-discharge relationship (Herschy 1998). This type of discharge measurement is feasible for river basins with accessible topography (Dobriyal et al. 2017), as the discharge at any point along a river should consider the size of the basin, amount of rainfall, topographic features, etc. However, for numerous river basins located in mountainous regions, obtaining discharge measurements is highly challenging and not economically feasible due to their remoteness and complex environments. Therefore, regular monitoring of river discharge is lacking for mountainous river basins, which are also known as ungauged river basins.
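The velocity-area computation described above can be sketched in a few lines. This is a minimal illustration, not a field procedure: the function name and the subsection widths, depths, and velocities are hypothetical values chosen only to show the summation of velocity times area.

```python
def velocity_area_discharge(widths_m, depths_m, velocities_ms):
    """Sum v_i * A_i over the subsections of a cross section.

    Each subsection area is approximated as width * depth (m^2), and
    each velocity is the mean velocity of that subsection (m/s).
    """
    if not (len(widths_m) == len(depths_m) == len(velocities_ms)):
        raise ValueError("all inputs must have the same length")
    return sum(w * d * v for w, d, v in zip(widths_m, depths_m, velocities_ms))


# Hypothetical cross section split into three subsections.
q = velocity_area_discharge(
    widths_m=[2.0, 3.0, 2.0],
    depths_m=[0.5, 1.2, 0.6],
    velocities_ms=[0.4, 0.9, 0.5],
)
print(f"discharge = {q:.2f} m^3/s")
```

In practice the number of verticals and the way subsection areas are delimited follow the gauging standard in use, which this sketch does not attempt to reproduce.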
Several studies have suggested that the frequency of extreme weather events is increasing as the climate continues to change (Kusunoki et al. 2008; Westra et al. 2014). Specifically, extreme rainfall can cause a sudden increase in river discharge, which is extremely dangerous for those living close to rivers (Georgakakos 2006; Oshikawa et al. 2008) and may result in the large-scale loss of human life and infrastructural damage (Hirabayashi and Kanae 2009; Dottori et al. 2018). To predict and prevent water-related disasters in a river basin, regular measurement of river discharge is crucial, particularly during extreme rain events; this is because it can determine the flow capacity of the river section in response to heavy rainfall (Marchi et al. 2010). There are several river systems within ungauged river basins. A sudden increase in the flow of any tributary of a basin may lead to a severe disaster downstream of the basin in the form of flash flooding, which may be more common in mountainous regions. In some cases, the time lag between rainfall and flooding is short, especially in small river basins. In such a situation, high temporal estimation of river discharge is necessary for estimating the flood disaster level in a downstream section, and appropriate measures should be taken in a timely fashion for disaster prevention and management. Furthermore, information on the amount and timing of peak discharge is also important for assembling comprehensive flood data; however, this is hindered by a lack of actual observations for ungauged river basins (Saharia et al. 2017). Hence, various indirect methods are available for quantifying and monitoring river discharge at given points (Bjerklie et al. 2005; Marchi et al. 2010; Dobriyal et al. 2017; Anthony et al. 2018; Kebede et al. 2020; Shi et al. 2020).
The most common indirect approach is rainfall-runoff modeling, which has been employed to simulate river discharge in river basins around the world. Multiple types of hydrological models are available globally, and there is continuous debate regarding which is the most suitable model for hydrological analysis of a river basin; however, this depends on the specific aims of the study. Currently available hydrological models include VIC (Liang et al. 1994), TOPMODEL (Beven et al. 1995), SWAT (Arnold and Fohrer 2005), RRI (Sayama et al. 2012), and HEC-HMS (Scharffenberg 2016). However, the simulated data from such models are difficult to validate, especially over ungauged river basins (P.C. et al. 2018a, 2018b). Another concern is the near real-time performance of simulation results. That is, hydrological simulations of river basins should be performed with high spatial and temporal resolution to obtain discharge at high temporal resolution within river basins (Ochoa-Rodriguez et al. 2015; Huang et al. 2019; P.C. et al. 2019).
In addition to discharge modeling, various approaches have been developed to estimate river discharge in river basins, particularly during flood disasters. For example, non-contact measurements based on remote sensing data are commonly adopted to estimate river discharge (Dobriyal et al. 2017; Anh and Aires 2019; Kebede et al. 2020; Shi et al. 2020). For instance, Bjerklie et al. (2005) extracted river geometries and water velocity from topographic maps and synthetic-aperture radar images, respectively, to estimate the discharge of large rivers. Although satellite observations can be useful for discharge estimates in large rivers, it is generally agreed that additional analysis is required for small rivers in complex topographies (Schumann et al. 2011; Huang et al. 2018). Moreover, the presence of clouds and the temporal resolution of satellite observations can hinder remote sensing observations, particularly during flood events (Clement et al. 2018; P.C. et al. 2020a).
In some cases, hydrological measurements such as those performed using gauging stations may either destroy the channel course or alter the physical channel during river flooding. Therefore, it may not be possible to produce a comprehensive record using in situ measurement of flow under such conditions. To overcome these issues, high-temporal-resolution seismic monitoring instruments have been used to infer hydrological data and understand the timing of peak flow, thereby enabling early flood warnings to be issued (Burtin et al. 2008; Sawazaki et al. 2016; Anthony et al. 2018; Eibl et al. 2020). For example, Burtin et al. (2008) and Sawazaki et al. (2016) reported good correlations between seismic noise data and the water level or discharge of mountainous rivers in Nepal and Japan, respectively, despite a substantial distance between the seismic and hydrological stations. There may be several independent river systems within a relatively small area of a given mountainous river basin. Hence, the correlation may be uncertain at locations where there is a substantial distance between the seismic and hydrological stations. Moreover, most of these studies focused on mountainous regions characterized by complex topography and river networks. Owing to the steep and rugged topography in mountainous regions, small streams with high flow velocity are common. In such situations, the relationship between seismic and hydrological data can only be determined accurately when observation points are close to each other. Anthony et al. (2018) showed that seismic noise data can even be used to determine the discharge data of small rivers when the seismological instrument is located very close to a hydrological station. However, such co-located measurements are usually extremely difficult to perform in mountainous regions. As hydrological (Mishra and Coulibaly 2009) and seismological (e.g., Aoi et al. 2020) station networks are designed for very different purposes, they are not typically located in the same places. Indeed, seismic stations may often be located close to small and large rivers in mountainous regions where hydrological observations are rarely available. This is the typical ground truth situation in mountainous regions.
P.C. and Sawazaki Prog Earth Planet Sci (2021) 8:58
The National Research Institute for Earth Science and Disaster Resilience (NIED) deployed a dense and high-sensitivity seismograph network (Hi-net) (Obara et al. 2005; Aoi et al. 2020) that covers several ungauged mountainous river basins. This study employs Hi-net to address the relationship between seismic noise data and river discharge for hydrological and seismic stations located far from each other, where a direct comparison of the two types of data is not typically possible. Seismological stations have been operating in the mountainous regions of Japan for some time. It may not be possible to fix or relocate these stations due to economic or technical issues. At the same time, discharge observations near existing Hi-net stations are rarely available. Hence, there is a need to establish the relationship between the seismic noise data of installed stations and the discharge of nearby rivers. To do so, we first perform hydrological simulations of a river basin and subsequently validate the simulation results using gauging station discharge data at the catchment scale. Then, the relationship between seismic noise data and simulated discharge data is analyzed on the ungauged sub-basin scale. This research selects the Mogami River Basin (Fig. 1), which is located in Yamagata prefecture, Japan, as a case study. The aim of this study is to demonstrate the application of seismic data for predicting the discharge in an ungauged river basin during a heavy rain event.

Data and methods
Approximately 80% of Japan is composed of hills and mountains. Most rivers in Japan originate in mountainous terrain and have served as a backbone of Japanese life, culture, and agriculture for centuries. Many urban areas in Japan, as well as forested nature reserves, are located near rivers that flow from the mountains, where performing hydrological observations may not be possible.
The Mogami River Basin is located in the northeastern region of Japan (Fig. 1) and is predominantly mountainous and forested, although there is also a small area of urban and built-up land. The elevation varies from approximately sea level to more than 1500 m above sea level. There are some hydrological stations in the main channel of the basin but not in the upper part of the basin. Hence, several sub-basins of the Mogami River Basin do not have any hydrological stations for monitoring river discharge. Fortunately, there are some seismic stations located in the mountainous region of the basin. In this study, we selected a sub-basin of the Mogami River Basin (Fig. 1) where two seismic stations (N.FGTH and N.MGMH) are located. This sub-basin was selected based on its location (i.e., within a mountainous region), its natural flow (i.e., a lack of hydraulic structures such as dams or reservoirs), the presence of at least one Hi-net station, and its distance from human settlements.
We selected three previous significant heavy rain events in the region: 10-23 July 2004, 7-16 September 2015, and 10-15 October 2019; the third event corresponds to Typhoon Hagibis, which caused heavy rainfall and resulted in severe flooding, landslides, and inundation in several parts of Northern Japan (P.C. et al. 2020a). The Ministry of Land, Infrastructure, Transport and Tourism (MLIT) updates the river stage-discharge relationships each year and publishes discharge data on the MLIT website, where they can be downloaded freely. However, the availability and continuity of discharge data vary among stations. In the case of the Mogami River Basin, hourly discharge data for the first two events were collected from three stations within the basin (Fig. 1) that stored long and continuous discharge data records: Usugasawa (G01), Shimizu (G02), and Koide (G03). G01 represents the downstream boundary of the basin, G02 represents the middle of the basin and is close to the selected sub-basin, and G03 represents the upstream boundary of the basin. The maximum hourly discharge of the first two events according to the G01 hydrological station, which is located close to the outlet of the basin (Fig. 1), was approximately 5234 m³/s and 2950 m³/s, respectively. However, for the third event, discharge data were not available as of June 2021. Hourly radar rainfall data were collected from the Japan Meteorological Agency (JMA) for all three events. The spatial and temporal resolutions of the rainfall data were 1 km and 1 h, respectively. An explanation of the other data required for hydrological modeling, such as topographic data, is provided in "Hydrological model" and "Seismogram data processing" sections.

Hydrological model
Various types of hydrological models are available globally. Some models need to be purchased, while others are open access for research purposes. There has been continuous debate as to which model is most suitable for the hydrological analysis of a river basin. However, the selection of a hydrological model depends on the specific aims of the study. We considered several requirements when choosing the model. The selected model should be able to use radar rainfall data with very high temporal and spatial resolutions. The computation time should be as short as possible. The simulated discharge should possess a high temporal resolution in gridded format, and the model should be freely available for research purposes. To meet these requirements, we adopted the Rainfall-Runoff-Inundation (RRI) model for the hydrological modeling of the Mogami River Basin.
The RRI is a two-dimensional model that has the advantage of simultaneously modeling runoff and flood inundation (Sayama et al. 2012, 2020; P.C. et al. 2020a, 2020b; Nguyen et al. 2021). This model is applicable to rainfall-runoff analysis and inundation profiling in each grid of a river basin. The flow in each grid cell is calculated with a 2D diffusive wave model, whereas channel flow is calculated with a 1D diffusive wave model. Topography and meteorological data are the main data inputs to the RRI model, and the outputs include discharge, river water level, and inundation level. Detailed mathematical explanations of the RRI model have been reported in several studies (Sayama et al. 2012; P.C. et al. 2020a; Nguyen et al. 2021).
Topographical features such as flow accumulation (ACC), flow direction (DIR), and digital elevation model (DEM) data are essential for implementing the hydrological model in a given river basin. Different types of global topographic data are available free of charge, e.g., the multi-error-removed improved-terrain (MERIT) DEM (Yamazaki et al. 2019) and HydroSHEDS (Lehner et al. 2013). However, each dataset contains uncertainties regarding the topographic features. In this study, we used MERIT Hydro, a global hydrography dataset developed based on the MERIT DEM and multiple inland water maps. It contains DIR, ACC, and hydrologically adjusted DEM data (Yamazaki et al. 2019). These data have a spatial resolution of 3 arcsec (~90 m) and exhibit fewer uncertainties than other global topographic datasets. The RRI model was set up to simulate discharge in each grid of the Mogami River Basin. It should be noted that the total area of the river basin depends upon the selection of the outlet point. The total area of the Mogami River Basin according to the selected outlet is approximately 6930 km². The RRI model is based on a grid system, and a grid with many cells can result in a time-consuming simulation. Therefore, to ensure rapid hydrological simulations of the selected events in such a large river basin, the grid size was coarsened by a factor of two relative to the original topographic data during model setup. Hence, the grid size after adjustment for ACC, DIR, and DEM was approximately 180 m.

Seismogram data processing
NIED routinely examines the mechanical properties of the Hi-net instruments and performs quality checks on Hi-net data (Obara et al. 2005; Aoi et al. 2020). The objective of Hi-net is to detect the weak ground shaking caused by micro-earthquakes and survey seismic activity, elucidate the mechanisms of earthquake generation, and determine the subsurface structure of Japan. For this purpose, Hi-net seismic stations are distributed almost uniformly in both plain and mountainous regions, with station-to-station distance intervals of approximately 20 km. Furthermore, to avoid ambient seismic noise due to social activity (e.g., traffic noise), Hi-net seismometers are installed at the bottom of boreholes at depths exceeding 100 m. Hi-net ground motion data are continuously recorded at a sampling rate of 100 Hz, transferred to the data center, and made openly available online in nearly real time. Hi-net has been operating since 2002; thus, seismic noise data during the three target events are freely available for research purposes.
We first collected the vertical component of seismograms (oscillation along the up-down direction) at two Hi-net stations (N.FGTH and N.MGMH, Fig. 1) during the target events. The power spectrum of the seismogram was computed using the Fast Fourier Transform algorithm every 1 min, and the average power in the frequency range of 1-2 Hz was computed. The selection of this frequency range is discussed in detail in "Comparison of simulated discharge with seismic noise data" section. Then, we selected the minimum 1-min power among continuous 10-min records from 5 min before every hour to 5 min after every hour. Since the typical duration of the signal from an earthquake or a landslide is shorter than 10 min (e.g., Okuwaki et al. 2021), this "clipping" procedure effectively removed the transient signals that would otherwise interfere with the purpose of this research. In this way, the hourly noise power record was obtained for the target time period. This record was then compared with nearby discharge records simulated in an ungauged sub-basin.
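The processing chain described above (1-min spectra, 1-2 Hz band average, and minimum over a 10-min window around each hour) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the periodogram scaling, and the synthetic trace are hypothetical, and the record is assumed to start exactly on the hour with no gaps.

```python
import numpy as np

FS = 100  # Hz, the Hi-net sampling rate stated in the text


def band_power_per_minute(trace, fs=FS, fmin=1.0, fmax=2.0):
    """Average spectral power in [fmin, fmax] for each consecutive 1-min segment."""
    n = int(60 * fs)                      # samples per minute
    n_seg = len(trace) // n
    segs = np.asarray(trace[: n_seg * n], dtype=float).reshape(n_seg, n)
    segs = segs - segs.mean(axis=1, keepdims=True)      # remove the DC offset
    spec = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / n   # periodogram, arbitrary units
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return spec[:, band].mean(axis=1)


def hourly_clipped_power(minute_power, half_window=5):
    """Minimum 1-min power from 5 min before to 5 min after each full hour."""
    out = []
    for k in range(1, len(minute_power) // 60 + 1):
        lo, hi = 60 * k - half_window, 60 * k + half_window
        if hi > len(minute_power):
            break
        out.append(minute_power[lo:hi].min())
    return np.array(out)


# Synthetic example: 130 min of white noise plus a 1-min transient at 1.5 Hz
# that falls inside the window around the first full hour.
rng = np.random.default_rng(0)
trace = rng.normal(size=FS * 60 * 130)
t = np.arange(60 * FS) / FS
trace[58 * 60 * FS : 59 * 60 * FS] += 50.0 * np.sin(2 * np.pi * 1.5 * t)

minute_power = band_power_per_minute(trace)
hourly = hourly_clipped_power(minute_power)
print(hourly)
```

Taking the minimum rather than the mean is what suppresses the transient: the contaminated minute has a much higher band power than its neighbors, so it never becomes the hourly value.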

Assessment tool
Assessment tools are very important for understanding the difference between simulated and observed data and are often selected based on specific objectives and goals. The Nash-Sutcliffe efficiency (NSE, defined by Nash and Sutcliffe 1970) is the most widely used tool for evaluating hydrological models using observed data. Mathematically, the NSE is given as
$$\mathrm{NSE} = 1 - \frac{\sum_{i}\left(Q^{i}_{\mathrm{sim}} - Q^{i}_{\mathrm{obs}}\right)^{2}}{\sum_{i}\left(Q^{i}_{\mathrm{obs}} - \overline{Q}_{\mathrm{obs}}\right)^{2}} \qquad (1)$$
where $Q^{i}_{\mathrm{sim}}$, $Q^{i}_{\mathrm{obs}}$, and $\overline{Q}_{\mathrm{obs}}$ are the simulated discharge at time $i$, observed discharge at time $i$, and average of the observed discharge over time, respectively. NSE varies from −∞ to 1, with NSE = 1 being the optimal value for the evaluation test.
We also used the Kling-Gupta efficiency (KGE, Gupta et al. 2009), which is based on the mean, standard deviation, and correlations of observed and simulated data; KGE is being increasingly used for evaluations.
$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + \left(\frac{\sigma_{\mathrm{sim}}}{\sigma_{\mathrm{obs}}} - 1\right)^{2} + \left(\frac{\overline{Q}_{\mathrm{sim}}}{\overline{Q}_{\mathrm{obs}}} - 1\right)^{2}} \qquad (2)$$
Here, $r$ is the correlation between the observation (obs) and simulation (sim), $\sigma_{\mathrm{obs}}$ and $\sigma_{\mathrm{sim}}$ are their respective standard deviations, and $\overline{Q}_{\mathrm{sim}}$ is the mean simulated discharge for the given period. Like NSE, KGE = 1 indicates a perfect agreement between the simulations and observations. The minimum thresholds of NSE and KGE for good performance depend upon the cases and research fields in question. From a hydrological perspective, if NSE and KGE values exceed 0.6, the modeling result is considered to be 'good' (Moriasi et al. 2007; Towner et al. 2019).
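Both scores are straightforward to compute from paired time series. The sketch below uses the standard definitions of NSE (Nash and Sutcliffe 1970) and KGE (Gupta et al. 2009); the function names and the sample discharge values are hypothetical and serve only as a sanity check.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]       # linear correlation
    alpha = sim.std() / obs.std()         # ratio of standard deviations
    beta = sim.mean() / obs.mean()        # ratio of means
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical hourly discharge values (m^3/s).
obs = np.array([10.0, 50.0, 200.0, 120.0, 40.0])
sim = np.array([12.0, 45.0, 180.0, 130.0, 50.0])
print(f"NSE = {nse(sim, obs):.3f}, KGE = {kge(sim, obs):.3f}")
```

A simulation equal to the observed mean at every time step gives NSE = 0, which is why NSE is often read as skill relative to that baseline.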

Results
The first step of this study involved implementing the hydrological model for the entire Mogami River Basin. Then, the simulated discharge was compared with the observed discharge at several stations in the basin for previous heavy rain events. After this comparison, we focused on an ungauged sub-basin of the study area, where simulated discharge data and seismic data were compared and analyzed. The results of each step are summarized in the "Hydrological simulation of Mogami River Basin", "Comparison of simulated discharge with seismic noise data", and "Discharge prediction from noise data" sections.

Hydrological simulation of Mogami River Basin
First, hydrological simulations of Mogami River Basin were performed for the selected events. Then, the simulated discharge values were extracted from nearby grid points of the selected hydrological stations (Fig. 1) of the Mogami River. Figure 2 shows the simulated and observed discharge at gauging station points G01, G02, and G03 along the Mogami River for the period of 10-23 July 2004 (first heavy rain event). The NSE values for the observed and simulated data of all three stations were 0.71, 0.65, and 0.69, respectively. Similarly, the KGE values for the same stations were 0.82, 0.82, and 0.66, respectively. Overall, a good correlation was observed between simulated and observed discharge. However, certain biases were observed at some points throughout the time period of each event, especially for the lower discharge rate, which could have been caused by physical properties and model uncertainties (Moges et al. 2021). These uncertainties are not discussed in detail in this study, as our target was to correlate the peak simulated discharge results with the observed discharge data. The NSE and KGE values were greater than 0.6 in all three stations for the first event, indicating that the model performed well and that the simulated results can be considered acceptable for the Mogami River Basin.
To further confirm the model performance, we performed hydrological modeling for the event of 7-16 September 2015 (second heavy rain event). The time series of simulated and observed discharge at the hydrological stations is shown in Fig. 3. The average values of NSE and KGE for the three stations were 0.76 and 0.64, respectively, which implies that the performance of the simulation is as good as that observed for the first heavy rain event. In terms of the timing and value of peak discharge, the simulated discharge data were comparable to the observed discharge data. These results confirm that the simulated discharge values for the Mogami River Basin are reliable. One of the benefits of the RRI model is that simulated discharge can be extracted for any grid within the ungauged sub-basin and used for further analysis.

Comparison of simulated discharge with seismic noise data
First, we selected points on the main river channel of the sub-basin that are close to each seismic station (Fig. 1). The distance from the selected center points of the river grid to N.FGTH and N.MGMH is approximately 240 m and 280 m, respectively. The simulated discharge extracted at these two points was then compared with noise data from the seismic stations. For the N.MGMH station, another small tributary is located closer to the seismic station (Fig. 1), which may also excite seismic noise. However, in this study, we considered that the contribution from the main channel was more significant than that from the small tributary because of the trade-off between the amount of discharge and the distance to the seismic station.
Figures 4 and 5 depict the raw and clipped power spectra of the seismograms recorded at N.FGTH and N.MGMH, respectively, for the event of September 2015, where the clipped spectra were calculated using the procedure explained in "Seismogram data processing" section. Transient signals, which can be identified as vertical lines in Figs. 4a and 5a, were effectively removed in the clipped running spectra (Figs. 4c, 5c). The spectral shape shows clear differences among the time periods of daytime, midnight, and peak streamflow at both stations (Figs. 4b, d, 5b, d). For all the selected periods, the noise power has its strongest peak around 0.2 to 0.3 Hz at both stations, which corresponds to the secondary microseism excited by ocean waves (e.g., Ardhuin et al. 2019). Although the energy of the secondary microseism tends to decrease as the frequency increases, this power is still high at frequencies lower than approximately 1 Hz. In the 1-2 Hz range, an increase in noise power is identified only for the period of peak streamflow (spectra shown in red), especially at N.MGMH. At frequencies higher than 2 Hz, although the noise power during peak streamflow is higher than that in other periods, the difference between daytime (spectra drawn in black) and midnight (spectra drawn in blue) becomes significant. This difference arises from cultural noise that reflects social activities such as traffic and machinery, which is typically high in the daytime and low at night (e.g., McNamara and Boaz 2019). Considering these characteristics of the observed spectra, we consider that the 1-2 Hz range is dominated by streamflow. Therefore, this frequency range is best suited to our purpose, which is the main reason why its average power was selected for further analysis.
Fig. 3 Time-series profile of simulated discharge by RRI and observed discharge at three hydrological stations (G01, G02, and G03) for the event of 7-16 September 2015
To elucidate the application of noise data to flood events, the results for the events of July 2004, September 2015, and October 2019 are shown in Fig. 6, where the last event was caused by Typhoon Hagibis. In these events, the high-discharge peaks were highly synchronized with the 1-2 Hz noise peaks at both stations. However, the noise power was not highly synchronized with the discharge peaks when the peak was relatively low, especially for the 2004 event. One reason for this could be that the simulated discharge did not always closely fit the observed discharge. As shown in Fig. 2, the simulated lower peaks tended to exceed the observed lower peaks during 13-15 July 2004. It should be noted that the noise power was compared to the simulated discharge in this study, and not the observed discharge. Contamination by the secondary microseism and cultural noise in the 1-2 Hz range may be another reason for this poor synchronization.

Discharge prediction from noise data
A correlation was confirmed between the simulated discharge and noise power, especially for the maximum peak discharge during the event period. As mentioned in the previous section, this trend was not necessarily observed for cases of lower discharge. However, as this study aims to understand the timing and peak discharge in an ungauged river basin during heavy rain events, this reduced correlation at lower discharge values is not a serious problem. Considering the observed correlation, we modeled the relationship between the predicted discharge, $Q_{\mathrm{prd}}$, and the noise power, $E$, as follows:
$$Q_{\mathrm{prd}} = A\,(E - E_{0})^{b} \qquad (3)$$
where $A$, $b$, and $E_{0}$ are the control parameters. By fitting Eq. (3) to the simulated discharge, we estimated the parameter set for each event and seismic station. We note that Eq. (3) is derived empirically and is not based on a solid physical model of noise excitation. The physical background of the noise excitation process is discussed in "Seismic noise data for hydrological applications" section. We also note that $E_{0}$ was fixed to the average noise power during the last two days before each flood event. According to Eq. (3), $E_{0}$ indicates the background noise power derived from sources other than the streamflow. Since this value may vary with time, a best-fit $E_{0}$ value obtained for one event is not necessarily applicable to another event. Therefore, we optimized only the $A$ and $b$ values for each event and station by maximizing the NSE value given by Eq. (1). Moreover, it should be noted that the simulated discharge was considered as the reference data for the comparison; that is, $Q^{i}_{\mathrm{sim}}$ and $Q^{i}_{\mathrm{obs}}$ in Eq. (1) were replaced by the predicted discharge ($Q^{i}_{\mathrm{prd}}$) and simulated discharge ($Q^{i}_{\mathrm{sim}}$), respectively, in the optimization procedure.
Figure caption (partially recovered): b Spectra at the three time periods shown in (a), where the black, blue, and red spectra are obtained in the daytime, midnight, and peak streamflow periods, respectively. c Running spectra of the seismogram after "clipping" transient peaks (e.g., earthquakes). d Spectra at the three time periods shown in (c). The gray dashed lines in each panel represent the spectral window used for our analysis (1-2 Hz)
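The optimization described above can be sketched as a plain grid search. Everything in this example is an assumption made for illustration: the power-law form Q_prd = A (E − E0)^b is our reading of Eq. (3), the grid ranges are arbitrary, the function names are hypothetical, and the data are synthetic rather than taken from the study.

```python
import numpy as np


def nse(pred, ref):
    """NSE with the simulated discharge taken as the reference series."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return 1.0 - np.sum((pred - ref) ** 2) / np.sum((ref - ref.mean()) ** 2)


def predict_discharge(E, A, b, E0):
    """Assumed power-law form: Q_prd = A * (E - E0)^b, with E - E0 floored at 0."""
    excess = np.clip(np.asarray(E, dtype=float) - E0, 0.0, None)
    return A * excess ** b


def fit_A_b(E, Q_sim, E0, A_grid, b_grid):
    """Return (best NSE, A, b) over the grid, with E0 held fixed."""
    best = (-np.inf, None, None)
    for A in A_grid:
        for b in b_grid:
            score = nse(predict_discharge(E, A, b, E0), Q_sim)
            if score > best[0]:
                best = (score, A, b)
    return best


# Synthetic example: E0 from a quiet pre-event period (48 hourly values),
# then an event whose "simulated" discharge follows the assumed law exactly.
E_before = np.full(48, 1.0)
E0 = E_before.mean()
E_event = E0 + np.linspace(0.1, 5.0, 50)
Q_sim = predict_discharge(E_event, A=20.0, b=0.5, E0=E0)

best_nse, best_A, best_b = fit_A_b(
    E_event, Q_sim, E0,
    A_grid=np.linspace(5.0, 40.0, 36),
    b_grid=np.linspace(0.1, 1.5, 15),
)
print(best_nse, best_A, best_b)
```

Evaluating the NSE on the full grid, rather than only keeping the maximum, also yields the kind of A-b score map that can be used to inspect parameter trade-offs.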
Time-series data of observed noise density (1-2 Hz), simulated discharge, and predicted discharge at/near the N.FGTH and N.MGMH stations during the events of 10-23 July 2004, 7-16 September 2015, and 10-15 October 2019 are given in Additional file 1. In Fig. 7, we show the results of fitting for each event and station. To assess the variation in the control parameters for different events, distinct curves were fitted to each event using different colors. The estimated parameters are summarized in Table 1. Overall, although the fitting curves capture the general trend of the noise power-discharge relationship, significant event dependence was observed for the fitting curves and the estimated A and b values due to the large scatter of the simulated discharge.
To understand the uncertainty of the estimated parameters in this optimization, we visualized areas with relatively high NSE values (Fig. 8). The areas with high NSE values form sharp, elongated ridges, and a strong correlation between A and b values is observed. Even though the best-fit parameters differ between events, these values seem to be indistinguishable when considering their uncertainty ranges, especially for N.FGTH. Conversely, the difference between stations is distinguishable because the background seismic noise level, topography of river cross sections, and various other factors differ by location.
To further examine the validity and predictability of discharge values, we computed $Q^{\mathrm{prd}}$ using various combinations of estimated parameter sets and observed noise energies. As there were three parameter sets and three noise energies from the 2004, 2015, and 2019 events, there were a total of nine combinations of $Q^{\mathrm{prd}}$ for each seismic station. Table 2 presents the NSE values for each case. Figure 9 provides a comparison of simulated and predicted discharge using the parameter sets from all three events.
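The nine-combination evaluation can be sketched as a small cross-event table. The code below is a hedged illustration, not the procedure used in the study: it again assumes the power-law form Q = A * P**b in place of Eq. (3), the parameter values and noise series are synthetic placeholders, and `cross_event_nse` is a hypothetical helper name.

```python
import numpy as np

def nse(q_sim, q_prd):
    """Nash-Sutcliffe efficiency of predicted vs. simulated discharge."""
    q_sim = np.asarray(q_sim, dtype=float)
    q_prd = np.asarray(q_prd, dtype=float)
    return 1.0 - np.sum((q_sim - q_prd) ** 2) / np.sum((q_sim - q_sim.mean()) ** 2)

def cross_event_nse(params, noise, q_sim):
    """NSE for every (parameter-set event, data event) pair.

    params: dict event -> (A, b); noise, q_sim: dict event -> 1-D array.
    Assumes Q = A * P**b (a stand-in for Eq. 3).
    """
    table = {}
    for p_event, (A, b) in params.items():
        for d_event in noise:
            q_prd = A * np.asarray(noise[d_event], dtype=float) ** b
            table[(p_event, d_event)] = nse(q_sim[d_event], q_prd)
    return table

# Synthetic placeholders (NOT the values of Table 1 or Table 2).
rng = np.random.default_rng(1)
params = {"2004": (40.0, 0.90), "2015": (50.0, 0.80), "2019": (55.0, 0.75)}
noise = {e: np.sort(rng.uniform(1.0, 10.0, 100)) for e in params}
q_sim = {e: params[e][0] * noise[e] ** params[e][1] for e in params}

table = cross_event_nse(params, noise, q_sim)
for (p_event, d_event), value in table.items():
    print(f"parameters {p_event} -> data {d_event}: NSE = {value:.2f}")
```

The diagonal entries (parameters and data from the same event) correspond to the fitted cases, while the off-diagonal entries indicate how well a parameter set transfers to another event.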
The NSE values were generally high (>0.70) for the 2015 and 2019 events at the N.FGTH station. Notably, some predicted discharges exhibited high NSE values even when calculated using a parameter set estimated for a different event. For example, the 2019 discharge data were predicted quite well using the 2004 and 2015 parameter sets (NSE of 0.86 and 0.84, respectively). Regardless of the parameter set used, the discharge values predicted for the 2015 and 2019 events at N.FGTH matched the simulated discharge closely, particularly the timing and amount of peak discharge. This is because the three parameter sets obtained for different events are indistinguishable when considering the uncertainty range (Fig. 8) for N.FGTH. These results confirm that the model can predict discharge well using parameter sets estimated for other events. An acceptable prediction performance was also observed for N.MGMH when the parameter set estimated for the 2015 event was used to predict discharge during the 2019 event (NSE = 0.61). However, the prediction performance was much poorer in some cases. For the 2004 event, even the discharge values predicted with the parameter set of the same event showed poor agreement (NSE of 0.52 and 0.51 for N.FGTH and N.MGMH, respectively). Nevertheless, the timing and amount of peak discharge were predicted well by all parameter sets for the 2004 event at both stations (Fig. 9a) despite the low NSE values, which is useful for disaster response. It should be noted that the simulated and observed discharge values do not match perfectly (Figs. 2, 3). Therefore, it is possible that the actual discharge near the stations differed significantly from the simulated discharge for the 2004 event.
For the 2015 event at the N.MGMH station, the timing of the peak was predicted well; however, the height of the peak was considerably overestimated when using the 2004 and 2019 parameter sets, which resulted in very low NSE values (−1.1 and −3.1, respectively). The cause of this overestimation was the spiky increase in noise power around 00:00 on 11 September (Fig. 6). We checked the raw seismogram around this time and confirmed that it shows a gradual increase and decrease within a few hours. Its appearance is very different from that of earthquakes or landslides. The signal occurred around midnight, which rules out contamination from cultural noise because such noise is expected to be strong only in the daytime. Therefore, the most plausible remaining explanation is human encroachment near the N.MGMH station (e.g., a release of water from a dam). In fact, the peak at N.FGTH occurred later than that at N.MGMH, and its shape is smoother. Although we have no direct evidence of human encroachment, the apparent behavior of the noise power hints at this possibility. Human encroachment is always challenging to address when simulating entire periods using simplified hydrological models. This issue is discussed further in the "Discussion" section. Debris flow is also a potential cause, but we did not find any evidence that supports this possibility.

Discussion
Several previous studies have reported a good correlation between observed discharge and seismic noise data (Burtin et al. 2008; Sawazaki et al. 2016; Eibl et al. 2020). However, hydrological and seismic stations are normally located far from each other. Therefore, this study compared simulated river discharge data with the noise data of nearby seismic stations in the frequency range of 1–2 Hz in the Mogami sub-basin. Here, we discuss the reliability of simulated discharge data and seismic noise data, while elucidating the relationship between the two.

Reliability of simulated discharge data
Although globally renowned hydrological models such as HEC-HMS and SWAT have been applied to various types of river basins, discharge is predominantly simulated at the basin or sub-basin scale (Arnold and Fohrer 2005; Beven et al. 1995; Liang et al. 1994; P.C. et al. 2018a).
The most important feature of the RRI model is that it is based on a grid system, so simulated discharge can be extracted at any grid point in the Mogami River Basin. Hence, simulated discharge could be extracted for rivers close to seismic stations in this study. Fully distributed hydrological models may also be applied to river basins where various types of hydrometeorological data are available; however, the performance of these models can be poor in ungauged river basins due to a lack of input variables and physical parameters (Tegegne et al. 2017; P.C. et al. 2018a; Moges et al. 2021). Therefore, simplified hydrological models are more attractive as they are easy to implement in a short time and require fewer input data. However, each type of hydrological model has its advantages and disadvantages, so observed and simulated data may not always agree well. For example, the spiky noise peak observed at N.MGMH for the 2015 event may reflect such a discrepancy. Human encroachment on river courses and undefined hydraulic properties of the river may also cause discrepancies between observed and simulated discharge values. Some river dams, especially in the middle and upstream parts of the basin, were not included during the model setup and may introduce uncertainties in the simulated discharge data. Observed discharge data are extremely limited for remote and complex river basins; therefore, it may not be possible to compare model results with observations for several sub-tributaries within a basin. In this study, we consider the discharge values simulated by the RRI model for sub-tributaries within the Mogami River Basin to be reliable. Uncertainties in discharge values could have arisen from uncertainties in the physiographic data (Yamazaki et al. 2019) or model properties (Gupta et al. 2012).
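The grid-based extraction mentioned at the start of this subsection can be illustrated with a short sketch. It assumes that the simulated discharge is available as an array of shape (time, latitude, longitude) on a regular grid, which is an assumption about data layout and not a description of the RRI output format. The coordinates are hypothetical, and `nearest_cell` and `extract_hydrograph` are illustrative names. In practice, one would likely restrict the search to river-channel cells rather than take the geometrically nearest cell.

```python
import numpy as np

def nearest_cell(lat_grid, lon_grid, lat0, lon0):
    """Indices (i, j) of the regular-grid cell centre closest to (lat0, lon0)."""
    i = int(np.argmin(np.abs(np.asarray(lat_grid, dtype=float) - lat0)))
    j = int(np.argmin(np.abs(np.asarray(lon_grid, dtype=float) - lon0)))
    return i, j

def extract_hydrograph(discharge, lat_grid, lon_grid, lat0, lon0):
    """Discharge time series at the cell nearest to (lat0, lon0).

    discharge is assumed to have shape (time, lat, lon).
    """
    i, j = nearest_cell(lat_grid, lon_grid, lat0, lon0)
    return discharge[:, i, j]

# Hypothetical grid and station position (not actual Hi-net coordinates).
lat_grid = np.linspace(38.0, 39.0, 11)
lon_grid = np.linspace(140.0, 141.0, 11)
discharge = np.arange(4 * 11 * 11, dtype=float).reshape(4, 11, 11)

cell = nearest_cell(lat_grid, lon_grid, 38.52, 140.28)
series = extract_hydrograph(discharge, lat_grid, lon_grid, 38.52, 140.28)
print(cell, series)
```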
In this study, JMA radar rainfall was used as the main input in the simulation. To minimize the error associated with estimating rainfall, the JMA constantly updates the radar data with data available from Automated Meteorological Data Acquisition System (AMeDAS) rain gauge stations. Therefore, we did not check the quality of radar rainfall data in this study. The greatest advantage of radar rainfall data is the high spatial resolution (P.C. et al. 2019). However, the quality of radar rainfall data may be reduced over mountainous regions (P.C. et al. 2016), which may increase the bias in simulated hydrological data. Therefore, for proper flood disaster prevention and management, the use of hydrological data with a high temporal resolution may be the best option, particularly during extreme events. However, hydrological simulation of ungauged river basins has been neglected for various reasons, such as the unavailability of observed hydrological data and the time lag between rainfall and flooding, which is very short in small river basins. Meanwhile, simulated output data depend on the input data and model type. Physically based distributed models require various types of input data and a long computation time to produce output data. Even with a simplified hydrological model, the computation time may increase if high-resolution temporal and spatial inputs are used or if output data of the same resolution are required. Therefore, most hydrological models may not be suitable for handling input and output data with a high temporal resolution over long timespans or for near real-time simulation.
The remote sensing approach may also be used to estimate river discharge. This is one of the more challenging tasks, especially in mountainous regions during heavy rain events. The effect of dense vegetation along the river course, shadow and layover effects on the area of interest, the presence of cloud cover, and the size of the river course itself are extremely common phenomena that can easily cause gaps in hydrological data collection during regular monitoring, particularly in heavy rain events (Schumann et al. 2011; Huang et al. 2018; Clement et al. 2018). To overcome such gaps, seismic noise data can serve as a bridge not only for the regular monitoring of river discharge, but also for validating discharge data obtained from remote sensing.

Seismic noise data for hydrological applications
According to our analysis of seismic noise and river discharge data for a mountainous region of Japan, river discharge prediction using seismic noise data is a promising technique. By estimating the parameter set used for predicting peak discharge from noise power, we may be able to monitor real-time river discharge using seismic noise recorded at nearby stations, even in locations where performing direct discharge observations is practically impossible.
In this study, the 1–2 Hz frequency band was confirmed as the optimal band for comparing recorded seismic noise with the flow of a nearby river. This band contains relatively little power from secondary microseisms and cultural noise. These background noise properties are important for improving the prediction performance of river discharge in mountainous regions, where direct discharge data are extremely limited.
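A band-limited noise power of this kind can be estimated from a seismogram with an averaged periodogram. The sketch below is a simplified Welch-style estimate written with NumPy only; it is not the processing chain used in the study. The 100 Hz sampling rate, the segment length, and the function names `band_power` and `band_power_series` are assumptions made for illustration, and the input is a synthetic sinusoid rather than Hi-net data.

```python
import numpy as np

def band_power(x, fs, f_lo=1.0, f_hi=2.0, nperseg=1024):
    """Mean power spectral density of x within [f_lo, f_hi] Hz.

    Averages Hann-windowed periodograms over non-overlapping segments.
    """
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // nperseg
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)            # density normalisation
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    psd = np.zeros(len(freqs))
    for k in range(n_seg):
        seg = x[k * nperseg:(k + 1) * nperseg]
        seg = (seg - seg.mean()) * window       # remove offset, then taper
        psd += np.abs(np.fft.rfft(seg)) ** 2 / scale
    psd /= n_seg
    # Convert to a one-sided spectrum (DC and, for even nperseg, Nyquist are not doubled).
    if nperseg % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[band].mean()

def band_power_series(x, fs, window_s, **kwargs):
    """Band power in consecutive, non-overlapping windows of window_s seconds."""
    n = int(window_s * fs)
    return np.array([band_power(x[k:k + n], fs, **kwargs)
                     for k in range(0, len(x) - n + 1, n)])

# Synthetic check: a 1.5 Hz tone lies inside the band, a 5 Hz tone does not.
fs = 100.0                                      # assumed sampling rate in Hz
t = np.arange(0, 120, 1 / fs)
in_band = np.sin(2 * np.pi * 1.5 * t)
out_band = np.sin(2 * np.pi * 5.0 * t)
p_in = band_power(in_band, fs)
p_out = band_power(out_band, fs)
series = band_power_series(in_band, fs, window_s=30.0)
print(p_in > p_out, series.shape)
```

Applying `band_power_series` to a continuous record yields a noise power time series that can then be compared with a discharge time series, subject to the caveat that transient signals such as earthquakes would need to be handled separately.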
However, we should note that the appropriate frequency range may depend on time and location. For example, noise from secondary microseisms is particularly high during typhoons. Additionally, cultural noise is very strong over a wider frequency range if the seismometer is installed close to traffic. In fact, Anthony et al. (2018) deployed a small array of seismometers close to a river, with sensors emplaced at depths of less than 1 m from the surface, and observed that seismograms at frequencies of less than 1 Hz closely matched the discharge trends for a small river. Thus, the type of seismometer and installation conditions (Hi-net sensors are installed at depths of more than 100 m) also influence the dominant frequency of seismic noise.
It is also important to consider the physical background of the streamflow noise excitation process. Thus far, bedload transport and turbulent flow have been considered influential causes and have been modeled by Tsai et al. (2012) and Gimbert et al. (2014), respectively; these models indicate different frequency dependencies in noise excitation. Burtin et al. (2008) noted that the frequency dependence of the noise excitation process may depend on the season or month of the year. However, in our view, there is no decisive consensus regarding the dominant mechanism of the noise excitation process, and it seems premature to apply the existing models to our observations without considering the diversity of river flow conditions. This is why we adopted the simple empirical relationship given by Eq. (3) in this study. The range of conditions under which this empirical relationship applies will be examined in future work.
To improve the proposed noise-based discharge prediction technique, two major factors should be considered. One is the discrepancy between the simulated and observed discharge values, as discussed in the "Reliability of simulated discharge data" section. Unless the discharge record is obtained close to the seismic station, hydrological simulations are required to obtain discharge values at a nearby point. As the simulated and real discharge values do not always match sufficiently, it is necessary to develop more accurate and precise simulation methods. The other factor is the validity of the discharge-noise relationship given by Eq. (3). This simple empirical model assumes that the A and b values are constant and specific to the location. This assumption fails if there are changes in factors such as the riverbed topography, sediment conditions, and the installation environment of the seismometer. If the model parameters vary in this way, parameters estimated from past events cannot be applied for prediction purposes.
Currently, our tentative conclusion is that a good prediction performance can be achieved on a case-by-case basis, as shown in Fig. 9. Nevertheless, it is promising that the timing and size of high-discharge peaks could be adequately predicted from seismic noise in many cases, even though the parameters have not been robustly estimated at this stage. Thus, further research should continue to improve the prediction performance of this technique by examining multiple events in mountainous ungauged river basins worldwide.

Conclusion
Discharge data for mountainous rivers are particularly important for water resource management and related scientific research. Moreover, recent flood events in mountainous river basins in Japan have resulted in significant loss of life and damage to infrastructure. Therefore, estimating the peak streamflow of rivers during flood events is crucial for flood disaster prevention and management. Although performing direct observations is the preferred method for estimating discharge data, it is not practical due to the complex topography and remote environment of mountainous regions. Therefore, the use of indirect approaches such as hydrological simulations and remote sensing technology has been considered to estimate river discharge, especially during extreme rain events. However, each approach has advantages and disadvantages.
In this study, we proposed the use of Hi-net seismic observations to predict the timing and amount of peak discharge in a mountainous sub-basin of the Mogami River Basin during heavy rain events. Hi-net has good network coverage throughout the mountainous regions of Japan. Discharge data simulated by the RRI model were validated using gauging station data for two independent rain events, and then compared with seismic noise data from nearby stations for three heavy rain events. The results indicated that seismic noise data in the frequency range of 1–2 Hz exhibited a similar trend to the simulated discharge values.
We conclude that our study makes a significant contribution to the literature because performing direct observations of river discharge is often impossible in ungauged,