### 3.1 Maximum likelihood estimation of the model parameters

For a sequence of \(n + 1\) tremor events that occurred at times \(t_{0} , \ldots ,t_{n}\), the *n* inter-event times \({\Delta }t_{i} = t_{i} - t_{i - 1} (i = 1, \ldots ,n)\) are assumed to obey the mixture distribution. The likelihood of the five parameters, \(\mu_{{\text{L}}} ,\alpha ,\mu_{{\text{S}}} ,\sigma ,\;{\text{and}}\;\phi\), is therefore given as either:

$$\begin{aligned} L\left( {\mu_{{\text{L}}} ,\alpha ,\mu_{{\text{S}}} ,\sigma ,\phi } \right) & = \prod\limits_{i = 1}^{n} {\left[ {\frac{\phi }{{\sqrt {2\pi } \sigma {\Delta }t_{i} }}\exp \left\{ { - \frac{{\left( {\ln {\Delta }t_{i} - \ln \mu_{{\text{S}}} } \right)^{2} }}{{2\sigma^{2} }}} \right\}} \right.} \\ & \quad \left. { + \left( {1 - \phi } \right)\sqrt {\frac{{\mu_{{\text{L}}} }}{{2\pi \alpha^{2} {\Delta }t_{i}^{3} }}} \exp \left\{ { - \frac{{\left( {{\Delta }t_{i} - \mu_{{\text{L}}} } \right)^{2} }}{{2\mu_{{\text{L}}} \alpha^{2} {\Delta }t_{i} }}} \right\}} \right]{,} \\ \end{aligned}$$

(4)

or the log-likelihood:

$$\begin{aligned} \ln \;L\left( {\mu_{{\text{L}}} ,\alpha ,\mu_{{\text{S}}} ,\sigma ,\phi } \right) & = \sum\limits_{i = 1}^{n} {\ln \left[ {\frac{\phi }{{\sqrt {2\pi } \sigma {\Delta }t_{i} }}\exp \left\{ { - \frac{{\left( {\ln {\Delta }t_{i} - \ln \mu_{{\text{S}}} } \right)^{2} }}{{2\sigma^{2} }}} \right\}} \right.} \\ & \quad \left. { + \left( {1 - \phi } \right)\sqrt {\frac{{\mu_{{\text{L}}} }}{{2\pi \alpha^{2} {\Delta }t_{i}^{3} }}} \exp \left\{ { - \frac{{\left( {{\Delta }t_{i} - \mu_{{\text{L}}} } \right)^{2} }}{{2\mu_{{\text{L}}} \alpha^{2} {\Delta }t_{i} }}} \right\}} \right]. \\ \end{aligned}$$

(5)
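As a minimal sketch of how Eq. (5) can be evaluated numerically, the following Python functions code the two terms of the mixture exactly as written above. The function names and the example intervals and parameter values are hypothetical and chosen for illustration only; they are not taken from the tremor catalog.

```python
import math

def lognormal_pdf(dt, mu_s, sigma):
    """First term of Eq. (4) without phi: log-normal density with median mu_s."""
    z = (math.log(dt) - math.log(mu_s)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * dt)

def bpt_pdf(dt, mu_l, alpha):
    """Second term of Eq. (4) without (1 - phi): BPT density with mean mu_l."""
    norm = math.sqrt(mu_l / (2.0 * math.pi * alpha**2 * dt**3))
    return norm * math.exp(-((dt - mu_l) ** 2) / (2.0 * mu_l * alpha**2 * dt))

def mixture_pdf(dt, mu_l, alpha, mu_s, sigma, phi):
    """Mixture density of a single inter-event time dt."""
    return phi * lognormal_pdf(dt, mu_s, sigma) + (1.0 - phi) * bpt_pdf(dt, mu_l, alpha)

def log_likelihood(dts, mu_l, alpha, mu_s, sigma, phi):
    """Eq. (5): sum of the log mixture density over all inter-event times.

    Note: math.log raises ValueError if the density underflows to zero.
    """
    return sum(math.log(mixture_pdf(dt, mu_l, alpha, mu_s, sigma, phi)) for dt in dts)

# Hypothetical inter-event times (days) and parameter values, for illustration only.
example_dts = [0.01, 0.02, 0.5, 8.0, 12.0]
example_ll = log_likelihood(example_dts, mu_l=10.0, alpha=0.5, mu_s=0.02, sigma=1.0, phi=0.6)
```

All five parameters share the time unit of the inter-event times, except for the dimensionless \(\alpha\), \(\sigma\), and \(\phi\).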

The maximum likelihood estimate based on this likelihood was obtained via the expectation–maximization (EM) algorithm (Dempster et al. 1977), and the estimation error was evaluated using the bootstrap method (e.g., Efron and Tibshirani 1993). One thousand bootstrap samples were drawn with replacement from the \(n\) inter-event times \({\Delta }t_{i}\), and the covariance of the parameter estimates was calculated from these samples.
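The EM iteration and the bootstrap resampling can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: the update equations are not given in the text, so the sketch assumes the standard closed-form weighted estimates for a log-normal term and for an inverse Gaussian (BPT) term with \(\alpha^{2} = \mu_{{\text{L}}} /\lambda\). All function names, the iteration count, and the seed are hypothetical.

```python
import math
import random

def lognormal_pdf(dt, mu_s, sigma):
    z = (math.log(dt) - math.log(mu_s)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * dt)

def bpt_pdf(dt, mu_l, alpha):
    norm = math.sqrt(mu_l / (2.0 * math.pi * alpha**2 * dt**3))
    return norm * math.exp(-((dt - mu_l) ** 2) / (2.0 * mu_l * alpha**2 * dt))

def mixture_loglik(dts, params):
    """Eq. (5) for params = (mu_l, alpha, mu_s, sigma, phi)."""
    mu_l, alpha, mu_s, sigma, phi = params
    return sum(math.log(phi * lognormal_pdf(dt, mu_s, sigma)
                        + (1.0 - phi) * bpt_pdf(dt, mu_l, alpha)) for dt in dts)

def em_step(dts, params):
    """One EM iteration (assumed closed-form updates, see lead-in)."""
    mu_l, alpha, mu_s, sigma, phi = params
    # E-step: probability that each interval belongs to the log-normal term.
    w = []
    for dt in dts:
        a = phi * lognormal_pdf(dt, mu_s, sigma)
        b = (1.0 - phi) * bpt_pdf(dt, mu_l, alpha)
        w.append(a / (a + b))
    w_short = sum(w)
    w_long = len(dts) - w_short
    # M-step, log-normal term: weighted mean and variance of ln(dt).
    log_mu_s = sum(wi * math.log(dt) for wi, dt in zip(w, dts)) / w_short
    var = sum(wi * (math.log(dt) - log_mu_s) ** 2 for wi, dt in zip(w, dts)) / w_short
    # M-step, BPT term: weighted inverse Gaussian estimates of the mean and 1/lambda.
    new_mu_l = sum((1.0 - wi) * dt for wi, dt in zip(w, dts)) / w_long
    inv_lam = sum((1.0 - wi) * (1.0 / dt - 1.0 / new_mu_l) for wi, dt in zip(w, dts)) / w_long
    new_alpha = math.sqrt(new_mu_l * max(inv_lam, 1e-12))  # guard against rounding
    return (new_mu_l, new_alpha, math.exp(log_mu_s), math.sqrt(var), w_short / len(dts))

def fit_em(dts, init, n_iter=100):
    """Run a fixed number of EM iterations from the initial guess init."""
    params = init
    for _ in range(n_iter):
        params = em_step(dts, params)
    return params

def bootstrap(dts, init, n_boot=1000, n_iter=100, seed=0):
    """Refit the model to n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    return [fit_em(rng.choices(dts, k=len(dts)), init, n_iter) for _ in range(n_boot)]
```

The list returned by `bootstrap` holds one parameter tuple per resample, from which a covariance matrix or standard errors can be computed. A fixed iteration count is used here for brevity; a convergence test on the log-likelihood would be a natural alternative.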

This estimation method was applied to all groups. Figure 2 shows examples of the results for selected groups, and Additional file 1: Fig. S2 shows the results for all groups. A list of the estimated parameters is given in Additional file 2: Table S1. Reasonable estimates of \(\mu_{{\text{L}}}\) and \(\mu_{{\text{S}}}\) were obtained for most of the groups, as shown in Fig. 2a, b. The standard error of \(\log_{10} \mu_{{\text{L}}}\) is approximately 0.1, and the correlation between parameters is small in many cases (Additional file 1: Fig. S3a, b). However, there are some exceptions. The value of \(\mu_{{\text{L}}}\) was estimated to be close to \(\mu_{{\text{S}}}\) in the westernmost part of the Shikoku region, where many tremors occur (Fig. 2c). The bootstrapping results (Additional file 1: Fig. S3c) indicate that the solution varies among several local optima and that the variation in \(\mu_{{\text{L}}}\) is larger than that in \(\mu_{{\text{S}}}\).

The example in Fig. 2d points to the possibility of more than three peaks in the histogram of inter-event times within a given tremor group; this group has four peaks (at several minutes, one hour, one day, and ten days). The variation in \(\mu_{{\text{S}}}\) is larger than that in \(\mu_{{\text{L}}}\) for this group (Additional file 1: Fig. S3d), and the other parameters are also poorly determined. One possible explanation for the multiple peaks is that coarse spatial grouping places several tremor regions with different characteristic time constants in the same group. Another possibility is that a given group may possess several inherent periodicities. Clarifying the validity and significance of multiple inter-event times within a given tremor group may be important for understanding tremor mechanisms; however, we have simply excluded such groups from our discussion because there are few of them in our analysis.

When the distribution of inter-event times differs greatly from the mixture distribution assumed in this study, the standard error of the parameters is very large. Therefore, the results for groups with a standard error of > 0.2 in either \(\ln \mu_{{\text{L}}}\) or \(\ln \mu_{{\text{S}}}\) were not used in the analysis in Sects. 4 and 5.

### 3.2 Hazard rate in the renewal process

Here, the tremor sequence is modeled as a renewal process, where the probability density of the inter-event time is represented by \(f_{{{\text{total}}}} (t)\). Therefore, the hazard rate of tremor occurrence at lapse time \(t\) since the last tremor event is given by (e.g., Matthews et al. 2002):

$${h(t) = \frac{{f_{{{\text{total}}}} (t)}}{1 - F(t)},}$$

(6)

where

$${F(t) = \int\limits_{0}^{t} {f_{{{\text{total}}}} (t^{\prime} ){\text{d}}t^{\prime} .} }$$

(7)
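Equations (6) and (7) can be evaluated in closed form for the mixture distribution, because both terms have known cumulative distribution functions. The sketch below assumes the standard normal-CDF expressions for the log-normal and inverse Gaussian (BPT) distributions; the function names and parameter values are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mixture_pdf(t, mu_l, alpha, mu_s, sigma, phi):
    """f_total(t): log-normal term weighted by phi plus BPT term weighted by 1 - phi."""
    z = (math.log(t) - math.log(mu_s)) / sigma
    short = math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)
    long_ = math.sqrt(mu_l / (2.0 * math.pi * alpha**2 * t**3)) * math.exp(
        -((t - mu_l) ** 2) / (2.0 * mu_l * alpha**2 * t))
    return phi * short + (1.0 - phi) * long_

def mixture_cdf(t, mu_l, alpha, mu_s, sigma, phi):
    """F(t) of Eq. (7), using closed-form CDFs of the two terms."""
    f_short = norm_cdf((math.log(t) - math.log(mu_s)) / sigma)
    u1 = (math.sqrt(t / mu_l) - math.sqrt(mu_l / t)) / alpha
    u2 = (math.sqrt(t / mu_l) + math.sqrt(mu_l / t)) / alpha
    # exp(2 / alpha^2) overflows for very small alpha; acceptable for a sketch.
    f_long = norm_cdf(u1) + math.exp(2.0 / alpha**2) * norm_cdf(-u2)
    return phi * f_short + (1.0 - phi) * f_long

def hazard(t, mu_l, alpha, mu_s, sigma, phi):
    """h(t) of Eq. (6); loses precision where F(t) is very close to one."""
    survival = 1.0 - mixture_cdf(t, mu_l, alpha, mu_s, sigma, phi)
    return mixture_pdf(t, mu_l, alpha, mu_s, sigma, phi) / survival

# Hypothetical parameters: (mu_l, alpha, mu_s, sigma, phi), times in days.
params = (10.0, 0.5, 0.02, 1.0, 0.6)
rates = [hazard(t, *params) for t in (0.02, 1.0, 10.0)]
```

With these illustrative parameters, the hazard rate is high at short lapse times, drops at intermediate times, and rises again near \(\mu_{{\text{L}}}\).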

The hazard rate that corresponds to the histogram in Fig. 3a is shown in Fig. 3b, c. Immediately after a tremor occurs, a subsequent tremor is likely to occur. However, the rate decreases after this period, such that a subsequent tremor is unlikely to occur; the hazard rate then increases again due to the loading modeled by BPT. Figure 3d, e shows the variations in the hazard rate in this group over long periods. Figure 3d includes 74 events, although only about 15 can be distinguished visually. The hazard rate decreases after each event, as shown in Fig. 3b. When several tremors occur within a short time, all of them occur at a high hazard rate, as shown in Fig. 3e.

### 3.3 Validation using the transformed time

This probabilistic model can standardize spatially diverse tremor activity. The validity of the standardization is confirmed by transforming the actual time series of events into a transformed time series (Ogata 1988). The transformed inter-event time \(\tau (t)\) is calculated using the event occurrence time interval \(t\) and hazard rate \(h(t)\) as follows:

$${\tau (t) = \int\limits_{0}^{t} {h(t^{\prime} ){\text{d}}t^{\prime} .} }$$

(8)

Let \(T(t_{i} ) = \tau ({\Delta }t_{1} ) + \cdots + \tau ({\Delta }t_{i} )\) be the transformed time of the *i*th tremor event. Proper modeling of the tremor sequence allows the transformed time series \(T(t_{1} ), \ldots ,T(t_{n} )\) to be regarded as a sample from a stationary Poisson process with an occurrence rate of one, such that the cumulative number of tremor events increases approximately linearly with the transformed time. More precisely, the validity of the model is determined using the Kolmogorov–Smirnov (KS) test, which states that if the transformed time series satisfies:

$${\mathop {\max }\limits_{i = 1, \ldots ,n} \left| {T(t_{i} ) - i} \right| < 1.36\sqrt n ,}$$

(9)

then the null hypothesis that the transformed time series is sampled from a Poisson process is not rejected at the 5% significance level, and the series is regarded as consistent with a stationary Poisson process.
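The transformation in Eq. (8) and the criterion in Eq. (9) can be sketched in a few lines. Because \(h = f_{{{\text{total}}}} /(1 - F)\), the integral in Eq. (8) has the closed form \(\tau (t) = - \ln [1 - F(t)]\), which the sketch uses. The function names are hypothetical, and a unit-rate exponential distribution stands in for the mixture distribution purely for illustration.

```python
import math

def transformed_times(dts, cdf):
    """Cumulative transformed times T_i from inter-event times dts.

    Uses tau(dt) = -ln(1 - F(dt)), the closed form of Eq. (8).
    """
    total = 0.0
    out = []
    for dt in dts:
        total += -math.log1p(-cdf(dt))
        out.append(total)
    return out

def ks_criterion(T):
    """Eq. (9): return (passed, max deviation) for transformed times T."""
    n = len(T)
    dev = max(abs(t_i - i) for i, t_i in enumerate(T, start=1))
    return dev < 1.36 * math.sqrt(n), dev

def unit_rate_cdf(t):
    """Stand-in model (assumption): exponential distribution with rate one."""
    return 1.0 - math.exp(-t)

# Intervals matching the stand-in model's mean pass; intervals three times longer do not.
passed_good, dev_good = ks_criterion(transformed_times([1.0] * 50, unit_rate_cdf))
passed_bad, dev_bad = ks_criterion(transformed_times([3.0] * 50, unit_rate_cdf))
```

In practice, `cdf` would be the fitted \(F(t)\) of Eq. (7) for the group under test.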

Figure 4 compares the number of events and the transformed times at two locations. The results for the example in Figs. 2a and 3a, where the modeling was generally successful, are shown in Fig. 4a. A total of 364 (83%) of the 437 groups in this study can be regarded as Poisson processes based on this criterion, which indicates that the renewal process modeling approach is generally successful. The number of available groups is reduced to 316 (72%) after applying the above-mentioned criterion for the standard error of \(\ln \mu_{{\text{L}}}\) and \(\ln \mu_{{\text{S}}}\). On the other hand, a large deviation from the straight line is observed at *i* = 300–600 in Fig. 4b, which does not satisfy the condition in Eq. (9). Therefore, this series cannot be regarded as a Poisson process according to the KS test. The modeling was likely unsuccessful for such groups because either the tremor behavior changed during the study period, or the groups included unusual periods. The latter is discussed in the following subsection. The distribution of such unsuccessful groups is shown in Additional file 1: Fig. S4. Many of these groups are located near the Tokai and Bungo Channel regions, where long-term SSEs have been reported.

### 3.4 Quantitative detection of an anomalous period

Ogata (1992) applied the ETAS model to both Japanese and global seismicity and identified a seismically quiescent period before major earthquakes using the deviation of the transformed time series from the Poisson process. Similar deviations have been associated with earthquake swarms and SSEs (Llenos et al. 2009; Okutani and Ide 2011), and volumetric changes around volcanoes (Kumazawa et al. 2016). Positive deviations (i.e., more events than expected) are interpreted to indicate the onset of earthquake swarms and SSEs, whereas negative deviations suggest the influence of stress shadows (e.g., Ogata 2005). It is also possible to detect anomalous periods in the tremor sequences using our renewal process model. We objectively determine an anomalous period from the time series shown in Fig. 4b, which we identified as anomalous in the previous subsection.

The group shown in Fig. 4b is located near the southwestern edge of the study area, where a large SSE and considerable associated tremor activity were observed in 2009–2010 (Fig. 1b) (e.g., Yoshioka et al. 2015). We therefore need to fit a different model because the tremor activity during this period is clearly different from that during the other periods. We consider two cases: Case (1), where the tremor behavior changed at time \(t_{{{\text{START}}}}\), and Case (2), where the behavior only changed during the period \(t_{{{\text{START}}}} < t < t_{{{\text{END}}}}\). The latter period in Case (1) and the anomalous period in Case (2) are described by another set of model parameters \((\mu_{{\text{L}}}^{\prime } ,\alpha^{\prime},\mu_{{\text{S}}}^{\prime } ,\sigma^{\prime} ,\phi^{\prime})\), such that 10 parameters are estimated via the maximum likelihood method. While the number of parameters to be estimated via the maximum likelihood method is the same in these two cases, Case (2) has an additional change point, at the end of the anomalous period, compared with Case (1). Determining the maximum likelihood solution by varying \(t_{{{\text{START}}}}\) and \(t_{{{\text{END}}}}\) requires these two timings to be treated as change points and given an appropriate penalty (Ogata 1992).

The likelihood was calculated by varying \(t_{{{\text{START}}}}\) in Case (1), and both \(t_{{{\text{START}}}}\) and \(t_{{{\text{END}}}}\) in Case (2), as shown in Fig. 5a, b, respectively. It is natural for the likelihood to increase with the number of parameters. However, the log-likelihood increased by 55.0 in Case (1) and 87.0 in Case (2) with the addition of only five parameters, which is statistically significant based on Akaike’s information criterion (AIC). Since the penalties for the change points are calculated to be 9.6 for Case (1) and 32.7 for Case (2) (Ogata 1992; Okutani and Ide 2011), the effective improvements are 45.4 and 54.8, respectively. Therefore, the change-point models are more appropriate than the uniform model, and Case (2) is more appropriate than Case (1) by a small margin.

The maximum likelihood was obtained when the 1348 events in this group were divided into subgroups based on the timings of the 526th event (\(t_{{{\text{START}}}}\) = 08:23:13 on January 29, 2010; all times are given in Japan Standard Time) and 1037th event (\(t_{{{\text{END}}}}\) = 18:56:29 on January 15, 2011). The range over which the log-likelihood is reduced by two roughly corresponds to the 95% confidence interval, which extends from the 424th event (21:39:36 on March 3, 2009) to the 535th event (22:43:59 on January 30, 2010) for \(t_{{{\text{START}}}}\), and from the 957th event (00:47:07 on August 15, 2010) to the 1052nd event (15:14:03 on January 16, 2011) for \(t_{{{\text{END}}}}\). The beginning of the anomalous period is likely more uncertain because this SSE started with a slow moment release (Yoshioka et al. 2015). The parameters obtained for the maximum likelihood estimates can be used to construct the distribution of inter-event times for each period, as shown in Fig. 5c. The values of \(\mu_{{\text{L}}}^{\prime }\) and \(\alpha^{\prime}\) could not be determined precisely due to the short period of available data, whereas the small \(\mu_{{\text{S}}}^{\prime }\) value suggests there were many short-term interactions during this period. The transformed time that was calculated using this distribution yields an approximately straight line (Fig. 5d).
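The change-point scan and the drop-of-two interval can be illustrated with a deliberately simplified sketch. It covers only a single change point, as in Case (1), and replaces the five-parameter mixture with an exponential inter-event model (an assumption made so that each segment has a closed-form maximum likelihood); the change-point penalty is omitted. The function names, the minimum segment length, and the synthetic intervals are hypothetical.

```python
import math

def exp_max_loglik(m, total):
    """Maximized log-likelihood of m exponential intervals summing to total."""
    return m * math.log(m / total) - m

def scan_single_change_point(dts, min_seg=5):
    """Profile log-likelihood over the index k that splits dts into two segments."""
    n = len(dts)
    prefix = [0.0]
    for dt in dts:
        prefix.append(prefix[-1] + dt)
    profile = {}
    for k in range(min_seg, n - min_seg + 1):
        before = exp_max_loglik(k, prefix[k])
        after = exp_max_loglik(n - k, prefix[n] - prefix[k])
        profile[k] = before + after
    return profile

def drop_of_two_interval(profile):
    """Best split index and the index range whose log-likelihood is within 2 of the maximum."""
    best_k = max(profile, key=profile.get)
    inside = [k for k, ll in profile.items() if ll >= profile[best_k] - 2.0]
    return best_k, min(inside), max(inside)

# Synthetic example: 30 long intervals followed by 30 short ones.
synthetic = [10.0] * 30 + [1.0] * 30
best_k, k_low, k_high = drop_of_two_interval(scan_single_change_point(synthetic))
```

Extending the scan to Case (2) would add a second loop over the end index, and the mixture model would require a numerical fit for each candidate segment.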

This result may seem trivial because the anomaly discussed here was known to be associated with an SSE and was even visually detected in the spatiotemporal tremor distribution. Nevertheless, the objectively determined \(t_{{{\text{START}}}}\) and \(t_{{{\text{END}}}}\) values provide insights into the initiation and termination of the SSE process that was associated with this tremor activity. Such a parameterization can also be used for real-time monitoring to detect anomalous tremor activity.