Skip to main content

An automatic peak deconvolution code for Raman spectra of carbonaceous material and a revised geothermometer for intermediate- to moderately high-grade metamorphism


Carbonaceous material (CM) undergoes progressive changes that reflect its thermal history. These changes are in general irreversible and provide valuable information for understanding diagenetic and metamorphic processes of crustal rocks. Among various approaches to quantify these changes, the R2 ratio, area ratio of specific peaks in CM Raman spectra, is widely used to estimate the maximum temperature of intermediate- to moderately high-grade metamorphism. The calculation of the R2 ratio requires peak deconvolution of the original spectrum, and the results depend on the details of how this is carried out. However, a clear protocol for selecting appropriate initial conditions has not been established and obtaining a reliable temperature estimate depends at least in part on the experience and skill of the operator. In this study, we developed a Python code that automatically calculates the R2 ratio from CM Raman spectra. Our code produces R2 ratios that are generally in good agreement with those of Aoya et al. (J Metamorph Geol 28:895–914, 2010, for the same Raman data, with much less time and effort than was the case in the previous studies. We have confirmed that the code is also applicable to other previous datasets from both contact and regional metamorphic regions. The overall trend of the recalculated data indicates that samples with R2 greater than ~ 0.7 are not sensitive to the changes in CM maturity and thus should not be used for the calibration of an R2-based geothermometer. We propose a modified geothermometer for contact metamorphism that is strictly applicable to samples with R2 from 0.023 to 0.516, with the proviso that a laser with a wavelength of 532 nm should be used. A slight extrapolation of the newly proposed geothermometer up to R2 of 0.57 provides a temperature estimate that is consistent with the geothermometer of Kaneki and Kouketsu (Island Arc 31:e12467, 2022;; the boundary between the two geothermometers corresponds to a temperature of 391 °C.

1 Introduction

The thermal history of crustal rocks provides important insights into the diagenetic and metamorphic processes of the Earth’s interior. Carbonaceous material (CM) derived from organic matter is a common constituent of sediments and metasedimentary rocks. CM undergoes irreversible changes in its organochemical and crystallographic features—referred to collectively as its maturity—as temperature increases. Quantification of these features of CM has been widely used as a proxy for the thermal history experienced by the host rock. The maturity of CM has been investigated using numerous different analytical techniques such as X-ray diffraction analysis, Raman spectrometry, and elemental analysis (e.g., Buseck and Huang 1985; French 1964; Ganz and Kalkreuth 1987; McKirby and Powell 1974; Seifert 1978; Sweeney and Burnham 1990; Tuinstra and Koenig 1970; van Krevelen 1993). The changes in CM maturity are generally thought to depend on time and temperature alone and be irreversible. However, deformation and in particular brittle deformation may also influence the results (e.g., Nakamura et al. 2015).

Over the past three decades, the use of Raman spectroscopy has dramatically increased in popularity among researchers in the geosciences (see Fig. 1 in Henry et al. 2019 and Kouketsu 2023). A Raman spectrum of CM typically exhibits two bands at wavenumbers of 1355 cm–1 (disordered band) and 1575 cm–1 (graphite band), which are attributed to the disordered and ordered graphitic structures, respectively (Tuinstra and Koenig 1970). Some Raman parameters associated with these two peaks, such as width, position, intensity, and area, have been reported to show close correlation with the maturity of CM (e.g., Beyssac et al. 2002; Kouketsu et al. 2014; Schito et al. 2017; Roberts et al. 1995), as reviewed by Henry et al. (2019). Beyssac et al. (2002) proposed the first empirical geothermometer to estimate the maximum temperature \(T\) during regional metamorphism from the area ratio of specific peaks, i.e., the R2 ratio, as follows:

$$T \left[ {^\circ {\text{C}}} \right] = - 445\left( {{\text{R}}2} \right) + 641.$$

\(\text{R}2\) was defined as

$${\text{R}}2 = \frac{{A_{{{\text{D}}1}} }}{{A_{{{\text{D}}1}} + A_{{{\text{D}}2}} + A_{{\text{G}}} }},$$

where \({A}_{\text{D}1}\), \({A}_{\text{D}2}\), and \({A}_{\text{G}}\) are the areas of the D1-, D2-, and G-bands. Following their pioneering work, various empirical geothermometers based on Raman spectra of CM have been developed for terrestrial rocks (e.g., Aoya et al. 2010; Kouketsu et al. 2014; Lahfid et al. 2010; Lünsdorf et al. 2017; Rahl et al. 2005). Mori et al. (2015a, 2017) performed thermal modeling of contact metamorphism and concluded that the heating duration required for CM maturation to attain steady state for a given temperature is greater than one hundred years. Throughout the present paper, we assume that such steady state conditions have been achieved for all samples. The idea of estimating \(T\) from CM Raman spectra has been extended to fault rocks (e.g., Hirono et al. 2015; Kaneki et al. 2016; Mukoyoshi et al. 2018) and extraterrestrial rocks (e.g., Busemann et al. 2007; Cody et al. 2008; Homma et al. 2015), although the direct application of the geothermometers to these types of rocks may not be appropriate, due to large differences in the environments where CM matures (e.g., Homma et al. 2015; Kouketsu et al. 2017).

Fig. 1
figure 1

Data processing procedures of our code for calculating \(\text{R}2\) from a raw Raman spectrum. One of the spectra of the N29 sample is shown here as an example. a Raw spectrum, linear baseline, and baseline-corrected spectrum; b normalized baseline-corrected spectrum (referred to as ‘measured spectrum’ in this paper); c initial conditions of calculated spectrum; and d calculated spectrum after nonlinear least-squares fitting. Gray areas in a indicate the spectral ranges used to determine a linear baseline. The coefficients of determination \({R}^{2}\) in c and d were calculated using Eq. (8)

Of the CM-based geothermometers that have been proposed (e.g., Aoya et al. 2010; Beyssac et al. 2002; Kouketsu et al. 2014; Lahfid et al. 2010; Lünsdorf et al. 2017; Rahl et al. 2005), those of Kouketsu et al. (2014) and Aoya et al. (2010) (hereafter, referred to as K2014 and A2010, respectively) are among the most widely used. Numerous studies have used one of these geothermometers, or a combination, to estimate \(T\) of the samples of interest (e.g., Bonneville et al. 2020; Cavalazzi et al. 2021; Hickman-Lewis et al. 2020; Mori et al. 2015a, b; Nakamura et al. 2019; Shimura et al. 2021; Yamaoka et al. 2022). The geothermometers of K2014 and A2010 are based on the widths of specific peaks and \(\text{R}2\), respectively, and have distinct but complementary applicable temperature ranges: for K2014 it is 150–400 °C and for A2010 it is 340–655 °C. Although both approaches require peak deconvolution of the measured Raman spectra, the criteria for setting the initial conditions for nonlinear least-squares fitting are not clearly presented in the original papers, meaning temperature estimates could vary depending on the skill and experience of the analysts. Kaneki and Kouketsu (2022) (hereafter, referred to as KK2022) addressed this issue by developing a Python code that automatically performs peak deconvolution to calculate the widths of the D1- and D2-bands. The main objective of the present study is to develop a similar code to calculate \(\text{R}2\).

We first describe the fitting procedures implemented in our code to reproduce \(\text{R}2\) of A2010 for the same dataset. The results obtained are then compared with those reported by A2010 for the validation of the code. Finally, we assess the utility of our proposed approach including discussion of its limitations based on results of applying it to datasets other than those of A2010.

2 Methods

In the development of our code, we reanalyzed the same Raman spectral dataset as that measured and analyzed in A2010. The basic structure of the code is similar to that of Fitting D in KK2022. Note that we developed our code by trial and error to reproduce the analytical results of A2010 as closely as possible for the same dataset. For this purpose, we tried to follow the analytical procedures of A2010 as closely as possible (e.g., fitting function, order of a baseline, number of peaks, etc.).

2.1 Data

We first briefly describe the samples and spectral data of A2010. Readers can refer to A2010 for more detailed information, including their experimental setups. Since the present study focuses on the development, validation, and application of our new code, we will not examine the procedure and methodology used in the original experiments reported in A2010. A total of ten analyzed samples was studied, of which two, K08 and N33, were excluded from the final geothermometer of the present study (Table 1). The detailed reasons for this selection are discussed in Sect. 4.3. All samples experienced contact metamorphism, and their independently estimated \(T\) range from 340 ± 25 °C to 655 ± 25 °C. A2010 measured at least 50 spectra for each sample and a total of 809 spectra. Most of the spectra show two distinct peaks of the disordered and graphite bands (Fig. 1a), except for highly matured CM.

Table 1 Summary of temperature, measurement condition, and R2 ratio of each sample used for the geothermometer.

A2010 examined the effect of measurement conditions on the analytical results by varying the magnification of the objective lens (50 × or 100 ×), sample type (thin section or polished chip), and incident angle of laser beam (normal or parallel to the c-axis of the crystal structure of graphite). They concluded that these variations in the experimental conditions had a limited effect on the calculated \(\text{R}2\) (Fig. 8 in A2010).

2.2 Code description

2.2.1 Fitting function

A2010 employed a Voigtian, \(V\), in their fitting procedures, which is defined as the convolution of a Gaussian, \(G\), and a Lorentzian, \(L\), as follows:

$$V\left( {x;\omega ,\sigma ,\gamma ,A} \right) = \mathop \smallint \limits_{ - \infty }^{\infty } G\left( {x^{\prime};\omega ,\sigma ,A} \right)L\left( {x - x^{\prime};\omega ,\gamma ,A} \right) {\text{d}}x^{\prime},$$

where \(\omega\) is the center position, \(A\) is the area, and \(\sigma\) and \(\gamma\) represent the widths of \(G\) and \(L\), respectively. \(G\) and \(L\) are expressed as

$$G\left( {x;\omega ,\sigma ,A} \right) = \frac{A}{{\sqrt {2\pi \sigma^{2} } }}\exp \left[ { - \frac{{\left( {x - \omega } \right)^{2} }}{{2\sigma^{2} }}} \right],$$
$$L\left( {x;\omega ,\gamma ,A} \right) = \frac{A\gamma }{{\pi \left[ {\left( {x - \omega } \right)^{2} + \gamma^{2} } \right]}}.$$

Although we need to determine the values of the arguments of \(V\) as an initial condition for nonlinear least-squares fitting, it is difficult to constrain \(\sigma\) and \(\gamma\) from a raw Raman spectrum. Some previous studies (e.g., Kouketsu et al. 2014; Lünsdorf and Lünsdorf 2016) employed a pseudo-Voigtian, \(PV\), for peak deconvolution. One can define a typical \(PV\) as follows:

$$PV\left( {x;\omega ,{\Gamma },A,\eta } \right) = \left( {1 - \eta } \right)G\left( {x;\omega ,\frac{{\Gamma }}{{2\sqrt {2\ln \left( 2 \right)} }},A} \right) + \eta L\left( {x;\omega ,\frac{{\Gamma }}{2},A} \right),$$

where \(\eta\) is the mixing ratio of \(G\) and \(L\), and \(\Gamma\) is full width at half maximum (FWHM). \(PV\) in Eq. (6) takes its maximum intensity \(I\) at \(x=\omega\) as follows:

$$I = \frac{2A}{{\Gamma }}\left[ {\left( {1 - \eta } \right)\sqrt {\frac{\ln \left( 2 \right)}{\pi }} + \eta \frac{1}{\pi }} \right].$$

Note that the first approximation to all arguments of \(PV\) can be easily determined from a raw Raman spectrum. Since we are interested in the area ratio and the difference in \(A\) between \(V\) and \(PV\) is less than 1% (Fig. S1), we concluded that the usage of \(PV\) instead of \(V\) should have negligible effect on the calculated \(\text{R}2\).

2.2.2 Background correction and normalization

Background correction is required for each raw spectrum prior to peak deconvolution. Although it is obvious that A2010 employed a linear baseline, the details of the algorithm used to describe it are not clearly presented. In our code, we determined the slope and intercept of the linear baseline by a linear least-squares method using the raw spectral data in the ranges 1100–1150 cm–1 and 1700–1750 cm–1 (Fig. 1a). However, for samples with a high degree of maturity and thus low effects of fluorescence, a small bulge sometimes appears around 1100 cm–1 (attributable to the overlapping of spectra from quartz or calcite grains with the spectra from CM), which makes this procedure inapplicable (Fig. S2a). In this case, it is better to describe the linear baseline using the data in the ranges 1200–1250 cm–1 and 1700–1750 cm–1 (Fig. S2b). We defined a ‘mature’ spectrum as one with the maximum intensity at 1320–1380 cm–1 (disordered band) less than half that at 1570–1600 cm–1 (graphite band) after the baseline correction. If a given spectrum satisfies this criterion, the linear baseline is described using the spectral data in the ranges 1200–1250 cm–1 and 1700–1750 cm–1, rather than 1100–1150 cm–1 and 1700–1750 cm–1.

The baseline-corrected spectrum was then normalized such that its maximum intensity in the range 1100–1750 cm–1 becomes unity (Fig. 1b). This normalized baseline-corrected spectrum is referred to as the ‘measured spectrum’ in the present study, and we used this spectrum for the peak deconvolution.

2.2.3 Initial conditions and peak deconvolution

To perform the nonlinear least-squares fitting of the measured spectrum, the initial conditions for each peak should be determined. The approach to setting the initial conditions in our code is similar to that of Fitting D in KK2022. The D1-, D2-, D3-, and G-bands of \(PV\) are assumed to be always present in all the measured spectra. The maximum intensity of the measured spectrum in the range 1320–1380 cm–1 and the associated wavenumber are employed as the initial values for \(I\) and \(\omega\) of the D1-band, respectively. The same analysis is applied to the G-band in the range 1570–1600 cm–1. The initial \(\Gamma\) of the D1-band is set as the distance between two wavenumbers where the intensities are half the initial \(I\) in the ranges from 1200 cm–1 to initial \(\omega\) and from initial \(\omega\) to 1500 cm–1. That of the G-band is set to twice the distance between initial \(\omega\) and the wavenumber at the point where the intensity is half the initial \(I\) in the range from 1500 cm–1 to initial \(\omega\). These approaches to determining appropriate values for the initial \(\Gamma\) allow us to analyze Raman spectra of CM with different maturities using a single code, eliminating the subjectivity owing to the selection of the fitting method (a detailed discussion of this point can be found in Sect. 4.3 in KK2022). The D1- and G-bands are initially assumed to be a pure Lorentzian, and thus \(\eta =1\). To determine the initial conditions of the other two bands, we used the residual spectrum after subtracting the initial D1- and G-bands from the measured spectrum (Fig. S3). For the D2-band, 90% of the maximum intensity of the residual spectrum in the range 1610–1630 cm–1 and the associated wavenumber are employed as the initial values for \(I\) and \(\omega\), respectively. The same approach is employed for the D3-band in the range 1500–1540 cm–1. The initial \(\Gamma\) of the D2-band is set to twice the distance between the initial \(\omega\) and the wavenumber where the intensity is half the initial \(I\) in the range from initial \(\omega\) to 1700 cm–1. That of the D3-band is set to 130 cm–1, as in Fitting D of KK2022. The D2- and D3-bands are initially assumed to be half Gaussian and half Lorentzian, and thus \(\eta =0.5\). We calculated the initial \(A\) values of these four bands using Eq. (7).

After determining the initial conditions of the calculated spectrum (Fig. 1c), we performed nonlinear least-squares fitting on the measured spectrum in the spectral range 1100–1750 cm–1 (Fig. 1d). We used scipy.optimize.curve_fit function with the Trust Region Reflective algorithm (SciPy v1.10.1), which is applicable even when the bounds on the model parameters are given. We defined the cost function as half the sum of squares of the residual of the measured spectrum after subtracting the calculated spectrum. The maximum number of iterations was set to 10,000. All values of tolerances during the iterations were set to 10–8. \(\Gamma\) and \(A\) of the four bands can take any positive values. The value of \(\eta\) can move from zero to unity. The \(\omega\) values can vary 1300–1400 cm–1, 1600–1650 cm–1, 1450–1550 cm–1, and 1550–1600 cm–1 for the D1-, D2-, D3-, and G-bands, respectively. The coefficient of determination, \({R}^{2}\), was calculated for each fitting curve:

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i} \left( {I_{i}^{{{\text{meas}}}} - I_{i}^{{{\text{calc}}}} } \right)^{2} }}{{\mathop \sum \nolimits_{i} \left( {I_{i}^{{{\text{calc}}}} - \overline{I}^{{{\text{calc}}}} } \right)^{2} }},$$

where \({I}_{i}^{\text{meas}}\) and \({I}_{i}^{\text{calc}}\) are the intensities of the measured and calculated spectra, and \({\overline{I} }^{\text{calc}}\) is the mean intensity of the calculated spectrum in the range 1100–1750 cm–1.

For each sample, we calculated the values of the mean, standard deviation (SD), and standard error (SE) of \(\text{R}2\) of the calculated spectra after optimization. As in A2010, we then excluded the data with \(\text{R}2\) deviating by more than ± 2SD from the mean value. After this treatment, we recalculated the mean, SD, and SE values.

3 Results

3.1 Performance of the code

Figure 2 shows examples of the initial conditions together with the measured spectra for each sample. The initial fitting curves show a good match with the measured data with \({R}^{2}>0.97\). After optimization, the \({R}^{2}\) of the calculated spectra exceed 0.99 regardless of the sample (Fig. 3). Since the fitting is performed automatically, the same analytical results are obtained for the same spectral data, irrespective of the analyst. Our code typically analyzes one spectrum in less than one second using an ordinary laptop computer and thus greatly reduces the time and effort required to arrive at a result.

Fig. 2
figure 2

Examples of the measured spectra and initial conditions of the calculated spectra for each sample. The coefficients of determination \({R}^{2}\) were calculated using Eq. (8)

Fig. 3
figure 3

Examples of the measured spectra and the calculated spectra after optimization for each sample. The coefficients of determination \({R}^{2}\) were calculated using Eq. (8)

3.2 Comparison with Aoya et al. (2010)

Figure 4 compares \(\text{R}2\) of the present study with those reported by A2010. Error bars indicate the SD. As described in Sect. 2.1, A2010 investigates the effect of measurement conditions on the calculated \(\text{R}2\) by varying the sample type, magnification of an objective lens, and incident angle of laser beam. Figure 4a shows the comparisons of \(\text{R}2\) for each measurement condition. Our code-generated results are generally in good agreement with those of A2010 within the range of SD, regardless of the measurement conditions. When \(\text{R}2\) exceeds ~ 0.6, the D2- and G-bands were not well decomposed for some samples (e.g., K08 in Figs. 2 and 3), and the results of the present study showed significantly larger \(\text{R}2\) than those of A2010 with differences exceeding one SD. These trends also hold even when the data measured under different conditions are summed and then analyzed to obtain representative \(\text{R}2\) for each sample (Fig. 4b). These results are consistent with the negligible effects on \(\text{R}2\) by varying the measurement conditions as examined by A2010. All data are summarized in Table 1.

Fig. 4
figure 4

Comparison of \(\text{R}2\) of the present study with those of Aoya et al. (2010). The results for a each measurement condition and b each sample are shown. All plotted data are summarized in Table 1. When \(\text{R}2\) of both studies are identical, the data are plotted on the gray line. Error bars indicate the standard deviation

3.3 Modified geothermometer

Instead of a linear model proposed by Beyssac et al. (2002) (Eq. (1)), A2010 presents a quadratic equation to express the relationship between \(\text{R}2\) and \(T\):

$$T \left[ {^\circ {\text{C}}} \right] = 221.0\left( {{\text{R}}2} \right)^{2} - 637.1\left( {{\text{R}}2} \right) + 672.3.$$

Ideally, \(\text{R}2\) calculated using our code should be suitable to use Eq. (9) for estimating \(T\). However, there are differences in methodology of the present study that make this simple approach inappropriate. (1) The present study uses a more limited sample set than A2010 with the data not only for N33 but also for K08 removed when defining the calibration curve. (2) Our code-generated \(\text{R}2\) were not identical to those of A2010 even for the same spectra (Fig. 4 and Table 1). (3) The uncertainties in both \(\text{R}2\) and \(T\) should be taken into consideration when formulating the calibration curve. Due to these differences, we concluded that the previous geothermometer (Eq. 9) needs to be modified when employing our code, which is the main issue addressed in this subsection.

Since the calibration data have uncertainties in both \(\text{R}2\) and \(T\) (Table 1), the coefficients of the quadratic function cannot be calculated as a unique solution to a linear equation. Therefore, we performed the Deming regression (Text S1), which is applicable to a regression problem that requires iterative treatments to optimize the model parameters. Samples K08 and N33 were excluded from our regression analysis, owing to their inappropriateness as the calibration samples (see Sect. 4.3 for the detailed discussion). Following KK2022, we assume that the errors in \(T\) of the calibration samples can be represented by the SE. We employed SE rather than SD as the uncertainties of each data point, since SE is required to perform the Deming regression (Text S1). Finally, the regression curve obtained is expressed as follows:

$$T \left[ {^\circ {\text{C}}} \right] = 393.6\left( {{\text{R}}2} \right)^{2} - 716.9\left( {{\text{R}}2} \right) + 671.8.$$

The geothermometer of Eq. (10) is strictly only applicable to samples with \(\text{R}2\) from 0.023 to 0.516 (Table 1). Figure 5 shows the regression curve (Eq. (10)), its 95% prediction interval (Text S2), and the data points used to calculate them (Table 1). Although A2010 defined the uncertainty of Eq. (9) as the maximum difference between the estimated and known \(T\) of the calibration samples, we employed the prediction interval for the uncertainty of Eq. (10) owing to its clear statistical definition. Therefore, a direct comparison of the uncertainties between Eqs. (9) and (10) is inappropriate. The regression curve of Eq. (10) shows a very good fit to the data with \({R}^{2}=0.998\), which is similar to that of A2010 (\({R}^{2}=0.995\)). The optimization of the quadratic function by the iterative treatment affected the results of temperature estimation only modestly, with ~ 1 °C difference in \(T\) between the regression curves after 0th and 10th iterations (Fig. S4). At a given \(\text{R}2\), the calculated \(T\) using Eq. (10) differs by a maximum of ~ 10 °C from the results of using Eq. (9) (Fig. 6), which is similar to the 95% prediction interval of Eq. (10) (~ 15 °C). Note that Eq. (10) was established using the samples which underwent contact metamorphism, and its direct application to the samples which experienced other processes including regional metamorphism may not be appropriate, although our code may be useful to establish a new R2-based geothermometer for these situations.

Fig. 5
figure 5

Geothermometer modified from that of Aoya et al. (2010). R2 calculated using our code are plotted against \(T\) experienced by each of the samples examined by Aoya et al. (2010), together with the obtained geothermometer (Eq. 10) and its 95% prediction interval. All plotted data are summarized in Table 1. The black points indicate the results for the samples K08 and N33, which were excluded from the regression analysis (see Sect. 4.3 for the detailed reasons). Error bars for \(\text{R}2\) are the standard error, which are almost hidden by the data points owing to their small values. Error bars in \(T\) were also taken as the standard errors. The coefficient of determination \({R}^{2}\) was calculated using Eq. (8)

Fig. 6
figure 6

Comparison of the present geothermometer with that of Aoya et al. (2010). a Geothermometers proposed by Aoya et al. (2010) (Eq. 9) and the present study (Eq. 10), and b their difference

4 Discussion

4.1 Deviation from Aoya et al. (2010)

\(\text{R}2\) of the K08 sample in the present study was larger than that of A2010 by an amount that exceeds one SD (Fig. 4 and Table 1). Let us assume that a measured spectrum is perfectly explained by a calculated spectrum, i.e., \({R}^{2}\) in Eq. (8) is strictly unity. It then follows from Eq. (2) that

$${\text{R}}2 = \frac{{A_{{{\text{D}}1}} }}{{A_{{{\text{all}}}} - A_{{{\text{D}}3}} }},$$

where \({A}_{\text{all}}\) represents the entire area of the measured spectrum in the spectral range 1100–1750 cm–1 and thus takes a constant value. The Raman spectra of the K08 sample typically show a clear D4-band at around 1200 cm–1 and a strong ‘saddle’ intensity at around 1500 cm–1 (e.g., Figs. 2 and 3). This indicates that the optimized \({A}_{\text{D}1}\) and \({A}_{\text{D}3}\) values may be strongly influenced by which algorithm was employed to perform the peak deconvolution. Therefore, the observed difference in \(\text{R}2\) between these two studies may be qualitatively explained if the optimization algorithm of the present study yields larger \({A}_{\text{D}1}\) and \({A}_{\text{D}3}\) than A2010.

4.2 Applicability of the code to other datasets

Since our code was developed by trial and error to reproduce \(\text{R}2\) of A2010 as closely as possible, its applicability to datasets other than those of A2010 should be examined. We reanalyzed the raw Raman spectral data of the natural samples measured and analyzed by Kouketsu et al. (2019, 2021), Shimura et al. (2021), and Yamaoka et al. (2022). These studies measured the Raman spectra of natural CM under the same experimental conditions as A2010, performed peak deconvolution by manually setting the initial conditions, and calculated \(\text{R}2\). \(\text{R}2\) of the samples to be reanalyzed were reported to range from 0.066 ± 0.041 to 0.655 ± 0.021 (Table S1).

The comparison of \(\text{R}2\) obtained in the previous and present studies is shown in Fig. 7. Error bars represent the SD. All the data obtained are summarized in Table S1. \(\text{R}2\) of the present study generally agree with those of the previous studies within the margin of SD. However, we note that when \(\text{R}2\) is greater than ~ 0.6, our code yielded significantly larger \(\text{R}2\) than the previous studies exceeding one SD, something which was also recognized for the samples used in A2010 (Fig. 4). These results clearly indicate that our code is applicable to datasets other than those of A2010, although the code-generated results significantly differ from the previous analysis when \(\text{R}2>\,\sim 0.6\).

Fig. 7
figure 7

Applicability of our code to datasets other than those of Aoya et al. (2010). \(\text{R}2\) calculated using our code are plotted against those reported in the previous studies. All plotted data are summarized in Table S1. The gray line indicates exact agreement between the values calculated in the previous and present studies. Error bars indicate the standard deviation

4.3 Sensitivity of R2 as a proxy for CM maturity and its implications for use as a geothermometer

K2014 reported that FWHM of the D1-band, \({\Gamma }_{\text{D}1}\), linearly decreases with increasing \(T\) while it shows almost no change after reaching ~ 40 cm–1. Therefore, we can consider \({\Gamma }_{\text{D}1}>40 \, {\text{cm}}^{-1}\) as a necessary condition that \({\Gamma }_{\text{D}1}\) has its sensitivity to changes in CM maturity. This subsection focuses on the similar sensitivity test for \(\text{R}2\) and discusses its implications for practical temperature estimation.

We reexamined the raw spectral data of Aoya et al. (2010), Kouketsu et al. (2019, 2021), Nakamura et al. (2019), Shimura et al. (2021), and Yamaoka et al. (2022). \(\text{R}2\) was calculated using our code, while \({\Gamma }_{\text{D}1}\) was determined using the code for Fitting E of KK2022. Figure 8 compares \(\text{R}2\) and \({\Gamma }_{\text{D}1}\) for each sample. Figure 8b shows an enlarged view of the gray area in Fig. 8a. Error bars indicate the SD. The large SD of \({\Gamma }_{\text{D}1}\) of highly matured CM with \(\text{R}2\) of ~ 0.1 were attributed to the disappearance of the D1-band (e.g., K04 in Figs. 2 and 3). All the data obtained are summarized in Table S2. The results obtained show the same trend regardless of datasets (Fig. 8a). Within the region for low-grade material (upper right part in Fig. 8a), \({\Gamma }_{\text{D}1}\) decreases significantly with increasing CM maturity while \(\text{R}2\) shows almost constant values of 0.6–0.8. \(\text{R}2\) starts to decrease from ~ 0.7 toward zero when \({\Gamma }_{\text{D}1}\) decreases to ~ 55 cm–1 (Fig. 8b). After \({\Gamma }_{\text{D}1}\) reaches ~ 40 cm–1 at \(\text{R}2\) of ~ 0.55, only \(\text{R}2\) shows a systematic decrease with CM maturity. Based on these results, we suggest that \(\text{R}2<\,\sim 0.7\) is a necessary condition for \(\text{R}2\) to be sensitive to changes in CM maturity. In other words, samples with \(\text{R}2\ge \,\sim 0.7\) should not be employed in the calibration of an R2-based geothermometer. This is the reason why our regression curve (Eq. 10) did not include the K08 sample (\(\text{R}2=0.737\pm 0.029\)), which enables our geothermometer to yield better results around \(\text{R}2=0.516\) than that of A2010 (Eq. 9) who included the K08 sample (Fig. 6). Since very low-grade CM sometimes exhibits \(\text{R}2\) of ~ 0.7 (e.g., dataset of Nakamura et al. 2019 in Fig. 8a), the condition \(\text{R}2<\,\sim 0.7\) alone can be misleading for the assessment of CM maturity. It is important to check the consistency of the results obtained with other observations, including the strong influence of fluorescence (e.g., Schito et al. 2017) and the existence of the D4-band (e.g., Kouketsu et al. 2014), both of which are absent in the spectra of intermediate-grade CM (Fig. S5).

Fig. 8
figure 8

Comparison of \(\text{R}2\) and \({\Gamma }_{\text{D}1}\) for various natural samples. The results for the samples of Aoya et al. (2010), Kouketsu et al. (2019, 2021), Nakamura et al. (2019), Shimura et al. (2021), and Yamaoka et al. (2022) are shown. All plotted data are summarized in Table S2. \(\text{R}2\) were calculated using our code, while \({\Gamma }_{\text{D}1}\) were determined using the code for Fitting E of Kaneki and Kouketsu (2022). A close-up of the gray area in a is shown in b. The black-dashed line in b indicates the proposed upper limit \(\text{R}2=0.57\) for the extrapolation of our geothermometer (Eq. 10). When the calculated \(\left(\text{R}2,{\Gamma }_{\text{D}1}\right)\) satisfy Eq. (13) (indicated by the gray line in b), the geothermometers of Kaneki and Kouketsu (2022) (Eq. (12)) and the present study (Eq. (10)) yield the same \(T\). Star symbol in b represents the point \(\left(\text{R}2,{\Gamma }_{\text{D}1}\right)=\left(\text{0.57},\text{41.3} \,{\text{cm}}^{-1}\right)\) where the two geothermometers yield T = 391 °C. Error bars indicate the standard deviation

As described in Sect. 3.3, our modified geothermometer is determined in the range \(0.023\le \text{R}2\le 0.516\) (Fig. 5 and Table 1). However, the geothermometers based on \({\Gamma }_{\text{D}1}\), including that of KK2022, are applicable only when \(\text{R}2\) exceeds ~ 0.55, owing to the insensitivity of \({\Gamma }_{\text{D}1}\) to changes in CM maturity for lower \(\text{R}2\) (Fig. 8). Therefore, in a strict sense, a combination of the code-based geothermometers of KK2022 and the present study is incapable of estimating \(T\) of samples with \(\text{R}2\) from 0.516 to ~ 0.55 and \({\Gamma }_{\text{D}1}\) of ~ 40 cm–1. A straightforward solution to this problem would be to refine our modified geothermometer further by considering additional calibration data of samples whose \(T\) are known, \({\Gamma }_{\text{D}1}\) are ~ 40 cm–1, and \(\text{R}2\) are larger than ~ 0.55 while smaller than ~ 0.7. Although the N33 sample almost satisfies this requirement (Table 1), the bimodal distribution in its \(\text{R}2\) histogram (Fig. S6) indicates its inappropriateness as a calibration sample. An alternative approach is to simply extrapolate the applicable range of Eq. (10), as implicitly done in some previous studies (e.g., Kaneki and Kouketsu 2022; Kouketsu et al. 2014; Rahl et al. 2005). In this case, we need to consider an appropriate extent for the extrapolation of our geothermometer. One potential clue can be obtained when we try to smoothly connect the geothermometers of KK2022 and the present study on the data trend shown in Fig. 8. The \({\Gamma }_{\text{D}1}\)-based geothermometer of KK2022 is

$$T{ }\left[ {^\circ {\text{C}}} \right] = - 2.30\left( {{\Gamma }_{{{\text{D}}1}} \left[ {{\text{cm}}^{ - 1} } \right]} \right) + 486.$$

By combining Eqs. (10) and (12), we obtain

$${\Gamma }_{{{\text{D}}1}} \left[ {{\text{cm}}^{ - 1} } \right] = - 171.1\left( {{\text{R}}2} \right)^{2} + 311.7\left( {{\text{R}}2} \right) - 80.8.$$

When the calculated \(\left(\text{R}2,{\Gamma }_{\text{D}1}\right)\) satisfy Eq. (13), the two geothermometers yield the same \(T\). For example, if we assume the connecting \(T\) is 391 °C, the corresponding \(\text{R}2\) and \({\Gamma }_{\text{D}1}\) are 0.57 and 41.3 cm–1, respectively (Eqs. (10) and (12)). It is noteworthy that the point \(\left(\text{R}2,{\Gamma }_{\text{D}1}\right)=\left(\text{0.57},\text{41.3} \,{\text{cm}}^{-1}\right)\) (indicated by star symbol in Fig. 8b) plots well within the data trend, which is drawn by a number of recalculated data both from contact and regional metamorphisms. Therefore, the slight extrapolation of our geothermometer up to \(\text{R}2=0.57\) (within the standard deviation of the lowest-\(T\) sample for our calibration curve, N30) provides a consistent switching to the lower-\(T\) geothermometer of KK2022, with a boundary corresponding to \(T=391 \, ^\circ \text{C}\). In addition, setting \(\text{R}2=0.57\) as a boundary between the two geothermometers enables an unambiguous classification of all the data in Fig. 8 into one of two groups: using our geothermometer (Eq. (10)) if \(\text{R}2\le 0.57\) and using KK2022 (Eq. (12)) if \(\text{R}2>0.57\).

4.4 Effect of laser wavelength

Differences in the laser wavelength employed in the Raman system can significantly affect the spectral intensity and subsequent analytical results (e.g., Wang et al. 1990; Ferrari and Robertson 2001). K2014 and KK2022 demonstrated that for the Raman data of a specific sample measured under different experimental conditions (i.e., spectrometer, laser wavelength, pinhole size, and grating), some representative Raman parameters (i.e., \(\Gamma\), \(\omega\), intensity ratio, and area ratio) showed consistent values when the laser wavelength was the same, while the parameters other than \(\Gamma\) differed significantly for different laser wavelengths. These results clearly suggest that the effect of the laser wavelength on the spectral parameters dominates over the other variable experimental conditions. Since these two studies analyzed the data of the sample with \(T=165 \, ^\circ \text{C}\) and \(330 \, ^\circ \text{C}\), which is outside the temperature range of the calibration samples used in our regression analysis, this subsection investigates the dependence of the analytical results using our code on the laser wavelength for samples with higher \(T\).

We measured the Raman spectra of the N30 sample (\(T=410 \, \pm \,30 \, ^\circ \text{C}\)) under the same experimental conditions as for A2010, except that we used a laser with a wavelength of 633 nm instead of the original 532 nm. Using our code, we calculated the following six Raman parameters: intensity ratio of the D1-band to the G-band (i.e., R1 ratio), \(\text{R}2\), \(\Gamma\) of the D1- and G-bands, and \(\omega\) of the D1- and G-bands. The results obtained are compared with those of a 532 nm laser in Fig. 9. Error bars indicate the SD. All the data obtained are summarized in Table S3. Of the six parameters examined, only \(\Gamma\) of the D1- and G-bands were consistent between two laser wavelengths within the margin of SD (Fig. 9b), similar to the results reported in K2014 and KK2022. The other four parameters, including \(\text{R}2\), showed significant differences exceeding one SD when the laser wavelength was changed from 532 to 633 nm (Fig. 9a, c). The observed larger \(\text{R}2\) for the longer laser wavelength (Fig. 9a) may be attributed to (i) use of the longer laser wavelength resulting in the larger R1 ratio (e.g., Ferrari and Robertson 2001), and (ii) the positive correlation between the R1 and R2 ratios (e.g., Aoya et al. 2010). The mean \(T\) calculated using Eq. (10) were 407 and 364 °C for a 532 and 633 nm laser, respectively, and their difference exceeded 95% prediction intervals. Considering these facts, geothermometers based on \(\text{R}2\) cannot be reliably applied to spectra that were measured using a different laser wavelength from the one used to establish it. Therefore, the usage of our geothermometer (Eq. (10)) should be limited to data measured using a 532 nm laser.

Fig. 9
figure 9

Effect of laser wavelength on some Raman parameters. Comparisons of some Raman parameters of the N30 sample between the laser wavelengths of 532 and 633 nm are shown. a R1 and R2 ratios, b FWHM of the D1- and G-bands, and c center positions of the D1- and G-bands are shown. Error bars indicate the standard deviation. Note that the error bars in c are hidden by the data points owing to their small values

5 Conclusions

In this study, we have developed a Python code that automatically performs peak deconvolution of the Raman spectra of carbonaceous material and calculates the R2 ratio. A close comparison with the results of Aoya et al. (2010) shows very good agreement for the same original data unless \(\text{R}2\) exceeds ~ 0.6. Based on a Deming regression analysis, we propose a modified set of coefficients for the quadratic calibration curve first proposed by Aoya et al. (2010) to model the relationship between \(\text{R}2\) and the maximum metamorphic temperature \(T\). This modified curve is used in our code to calculate the mean \(T\) and its 95% prediction interval, although its direct application to samples experiencing the processes other than the contact metamorphism may not be appropriate. Our code is applicable to datasets other than those of Aoya et al. (2010), as long as the Raman spectral data are obtained using a 532 nm laser. We document that \(\text{R}2<\,\sim 0.7\) is a necessary condition for \(\text{R}2\) to be sensitive to changes in the sample maturity. For this reason, our modified geothermometer was calibrated without the K08 sample (mean \(\text{R}2\) of 0.737) and should be preferred over the geothermometer of Aoya et al. (2010). Since a combination of the geothermometers of Kaneki and Kouketsu (2022) and the present study is incapable of estimating \(T\) of samples with maturity in a particular range, further refinement or a slight extrapolation of the present geothermometer is required. In the latter case, we propose \(\text{R}2=0.57\) as a possible boundary where users should switch from the geothermometer presented here to the one proposed by Kaneki and Kouketsu (2022).

Availability of data and materials

Readers can download the code developed in the present study from In the development, validation, and application of the code, we reanalyzed the raw Raman spectral data originally measured and analyzed by Aoya et al. (2010), Kouketsu et al. (2019, 2021), Nakamura et al. (2019), Shimura et al. (2021), and Yamaoka et al. (2022). The raw Raman spectral data newly measured in the present study are used only to investigate the dependence of the analytical results on laser wavelength. Requests for access to the new data should be sent to YK. All analytical results obtained are listed in tables either in the main text or in Supporting Information.



Aoya et al. (2010)


Carbonaceous material


Full width at half maximum


Kaneki and Kouketsu (2022)


Kouketsu et al. (2014)


Standard deviation


Standard error


Download references


We thank Seira Katagiri, Kazuki Matsuyama, and Shunsuke Ogino of Nagoya University for their kind help with assessing the utility of the code. SK also thanks Yukitoshi Fukahata of Kyoto University for helpful comments on the Akaike Information Criterion and Hiroyuki Noda of Kyoto University for fruitful discussion on the preliminarily results. The web page created by Toshihide Ihara and Shin Shigemitsu and maintained by Organic Primary Standards Group, AIST, greatly helped our regression analysis ( We thank the associate editor Madhusoodhan Satish-Kumar for careful handling of the manuscript. Comments from two anonymous reviewers helped us improve the paper.


The present study was supported by JSPS KAKENHI Grant nos. JP20J01284 and JP24K17151 to SK, JSPS KAKENHI Grant no. JP21H05202 to SW, and JSPS KAKENHI Grant nos. JP20J12701 and JP22H05314 to YS.

Author information

Authors and Affiliations



SK, YK, and MA proposed the initial concept for the project based on work initiated by SW. YK, MA, YN, YS, and KY provided the raw Raman spectral data published in the previous papers. YK performed additional measurements using a 633 nm laser. SK developed, validated, and applied the code. SK wrote an early version of the first draft, which was subsequently refined by SW. All authors discussed the results, contributed to the writing of the manuscript, and approved its final version.

Corresponding author

Correspondence to Shunya Kaneki.

Ethics declarations

Competing interests

The authors declare that they have no competing interests that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kaneki, S., Kouketsu, Y., Aoya, M. et al. An automatic peak deconvolution code for Raman spectra of carbonaceous material and a revised geothermometer for intermediate- to moderately high-grade metamorphism. Prog Earth Planet Sci 11, 35 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: