1. Abstract

A key step in the production of chocolate is the fermentation of cocoa beans. This importance relies on its role in the development of chocolate’s flavor and aroma. Unlike other food fermentation processes, this specific fermentation is well known because of its lack of control and multiple ways in which it is performed. Here, a quantitative model of cocoa bean fermentation is constructed on previously available data regarding microbiological and metabolites dynamics. The model is formulated as a system of coupled ordinary differential equations (ODEs) with two different types of state variables: (1) Metabolite concentrations of glucose (Glc), fructose (Fru), ethanol (EtOH), lactic acid (LA) and acetic acid (Ac), and (2) population sizes of yeast (Y), lactic acid bacteria (LAB) and acetic acid bacteria (AAB). In total, the model comprehends 25 unknown parameters that were estimated using the Markov chain Monte Carlo No-U-Turn sampler in Rstan. Thereafter, we demonstrate that the model can quantitatively describe existing fermentation series and that the estimated parameters can be used to extract and interpret differences in environmental conditions between two independent fermentation trials [1].

2. Biological background

The cocoa bean fermentation process occurs mostly in the pulp of raw cocoa beans right after opening of cocoa pods. It is called a spontaneous fermentation given that there is little or not control on its conduction. Opposed to other fermentation processes, the microorganisms responsible for driving it come into first contact with the raw material (cocoa beans) by a pletora of ways, including their exposure with the farmers’ hands, with tools used for the harvest and with the surface of fermenting devices. With respect to the latter, there are several methods that vary depending on the region where cocoa trees grow (i.e., heaps, wooden boxes and plastic containers) [2,3]. Despite these disparity of methodologies and inoculation means, it has been widely described in successful fermentations, the occurrence of three overlapping phasis. Each of these are governed by a different microbial group at a time in the same sequential order: (1) an early stage dominated by yeasts (Y), (2) a mid phase dominated by lactic acid bacteria (LAB) and (3) a final phase dominated by acetic acid bacteria (AAB).

Figure 1. (a) Metabolite and (b) microbial dynamics during cocoa bean fermentation [1,2].

Figure 1. (a) Metabolite and (b) microbial dynamics during cocoa bean fermentation [1,2].

3. Material and methods

3.1. Experimental data

The experimental data in this study were reported by Papalexandratou et al.[3] and Camu et al. [4] from two spontaneous fermentations trials conducted in boxes in Brazil and one by the heap method in Ghana, respectively. Papalexandratou et al. trials were done by means of the wooden boxes fermentation method under slightly different conditions. In one instance, the trial known as ‘box 1’, was placed under a metal roof that provided protection from weather conditions. In the other, the trial known as ‘box 2’, was placed within a fermentary room.

3.2. Setting the baseline

By the qualitative description of the process, it is expected that the problem will become one of a coupled nature given the dependencies of all its dynamics on microbial growth. The basic interactions of the state variables can be summarized into the conversion of substrates (glucose (Glc) and fructose (Fru)) into ethanol (EtOH) by Y, Glc into lactic acid (LA) by LAB and EtOH into acetic acid (Ac) by AAB. These relationships can be defined by the use of equations that must take into account the substrate and product interactions. In the first case, several approximations have been developed, among them, the classical Monod and Contois models. For the later, we considered either the lack or precense of interactions between fermentation products and microbial growth by including linear and non-linear relationships upon the death rates responsible for the decrease on their population. Among these baseline decisions, four candidate models were assessed to fit the previously described data sets as shown in Table 1.

Table 1. Summary of model`s iterations.
Model Multiple substrate Product toxicity Population size effect for LA consumption
M1 no no no
M2 yes no no
M3 yes yes no
M4 yes yes yes

Where Model 1 considers as substrate a sum of Glc and Fru and the Population size effect for LA consumption refers to the use of the Contois equation for the consumption of LA by AAB.

For a more detailed description of the conceptualization of the model, check the supplementary material accompanying the original manuscript

3.3 Mathematical formulation

Take for example model 4. Its mathematical formulation it is as follows.

Figure 2. Network diagram of the cocoa bean fermentation process. Growth and death rates are expressed as $v_i$ [1].

Figure 2. Network diagram of the cocoa bean fermentation process. Growth and death rates are expressed as \(v_i\) [1].

The growth and death rates (\(v_i\)) of microbial populations are driving the wires of the network. For the definition of the these rates, we used Monod or Contois equations for the former, and second- and third-order reaction terms for the latter.

Thus, growth rates are defined as:

\[\begin{align} v_1 & = \frac{\mu_\text{max}^{\text{Y}_\text{Glc}}\,\left[\text{Glc}\right]}{\left[\text{Glc}\right]+K_{\text{Glc}}^{\text{Y}}}\,\left[\text{Y}\right]\\[4pt] v_2 &= \frac{\mu_\text{max}^{\text{Y}_\text{Fru}}\,\left[\text{Fru}\right]}{\left[\text{Fru}\right]+K_{\text{Fru}}^{\text{Y}}}\,\left[\text{Y}\right]\\[4pt] v_3 &= \frac{\mu_\text{max}^\text{LAB}\,\left[\text{Glc}\right]}{\left[\text{Glc}\right]+K_{\text{Glc}}^{\text{LAB}}}\,\left[\text{LAB}\right]\\[4pt] v_4 &= \frac{\mu_\text{max}^{\text{AAB}_\text{EtOH}}\,\left[\text{EtOH}\right]}{\left[\text{EtOH}\right]+K_{\text{EtOH}}^{\text{AAB}}}\,\left[\text{AAB}\right]\\[4pt] v_5 &= \frac{\mu_\text{max}^{\text{AAB}_\text{LA}}\,\left[\text{LA}\right]}{\left[\text{LA}\right]+K_{\text{LA}}^{\text{AAB}}\,\left[\text{AAB}\right]}\left[\text{AAB}\right] \end{align}\]

And the death rates as:

\[\begin{align} v_6 &= k_{\text{Y}}\,\left[\text{Y}\right]\,\left[\text{EtOH}\right]\\ v_7 &= k_{\text{LAB}}\,\left[\text{LAB}\right]\,\left[\text{LA}\right]\\ v_8 &= k_{\text{AAB}}\,\left[\text{AAB}\right]\,\left[\text{Ac}\right]^2 \end{align}\]

So far, a sub-total of 13 unknown parameters are considered in these equations. Five maximum specific growth rates \(\mu_\text{max}\) of the form \(\mu_{\text{max}}^{i_n}\), where \(i\) can be either Y, LAB and AAB, and \(n\) refers whether \(\mu\) corresponds to the maximum specific growth of Y on either Glc or Fru, or AAB on either EtOH or LA. Five substrate saturation constants for the growth of Y, LAB and AAB of the form \(K_{m}^{j}\), where \(j\) can be either Y or LAB and \(m\) can be either Glc, Fru, EtOH and LA. Three constant mortality rates of the form \(k_i\), where \(i\) can be either Y, LAB or AAB.

By putting all these rates together, the addition of 11 more parameters is necessary for accounting the comsumption of substrates (i.e., Glc and Fru) and production of metabolites (i.e., EtOH, LA and Ac). Hence, by the formulation of the following system of ODEs, 11 yield coefficients of the form \(Y_{\text{i}|{j}}^{k}\) where \(i\) can be either Glc, Fru, EtOH, LA or Ac, \(j\) can be either Y, LAB or AAB and \(k\) can be either Glc, Fru, EtOH or LA.

\[\begin{align} \frac{d\left[\text{Glc}\right]}{dt} &= -Y_{\text{Glc}|\text{Y}}\,v_1-Y_{\text{Glc}|\text{LAB}}\,v_3 \\[4pt] \frac{d\left[\text{Fru}\right]}{dt} &= -Y_{\text{Fru}|\text{Y}}\,v_2 \\[4pt] \frac{d\left[\text{EtOH}\right]}{dt} &= Y^{\text{Glc}}_{\text{EtOH}|\text{Y}}\,v_1 + Y^{\text{Fru}}_{\text{EtOH}|\text{Y}}\,v_2 - Y_{\text{EtOH}|\text{AAB}}\,v_4 \\[4pt] \frac{d\left[\text{LA}\right]}{dt} &= Y_{\text{LA}|\text{LAB}}\,v_3 - Y_{\text{LA}|\text{AAB}}\,v_5 \\[4pt] \frac{d\left[\text{Ac}\right]}{dt} &= Y_{\text{Ac}|\text{LAB}}\,v_3 + Y^{\text{EtOH}}_{\text{Ac}|\text{AAB}}\,v_4 + Y^{\text{LA}}_{\text{Ac}|\text{AAB}}\,v_5 \label{2.5} \\[4pt] \frac{d\left[\text{Y}\right]}{dt} &= v_1+v_2-v_6 \\[2pt] \frac{d\left[\text{LAB}\right]}{dt} &= v_3-v_7 \\[2pt] \frac{d\left[\text{AAB}\right]}{dt} &= v_4+v_5-v_8 \end{align}\]

A complete interpretation of these parameters is the following:

Table 2. Interpretation of the parameters for model 4 [1].
Parameter Units Interpretation
\(\mu_\text{max}^{\text{Y}_\text{Glc}}\) \(\text{mg Y }\text{h}^{-1}\) Maximum specific growth rate of Y on Glc
\(\mu_\text{max}^{\text{Y}_\text{Fru}}\) \(\text{mg Y }\text{h}^{-1}\) Maximum specific growth rate of Y on Fru
\(\mu_\text{max}^\text{LAB}\) \(\text{mg LAB }\text{h}^{-1}\) Maximum specific growth rate of LAB on Glc
\(\mu_\text{max}^{\text{AAB}_\text{EtOH}}\) \(\text{mg AAB }\text{h}^{-1}\) Maximum specific growth rate of AAB on EtOH
\(\mu_\text{max}^{\text{AAB}_\text{LA}}\) \(\text{mg AAB }\text{h}^{-1}\) Maximum specific growth rate of AAB on LA
\(K_{\text{Glc}}^{\text{Y}}\) \(\text{mg Glc g pulp}^{-1}\) Substrate saturation constant of Y growth on Glc
\(K_{\text{Fru}}^{\text{Y}}\) \(\text{mg Fru g pulp}^{-1}\) Substrate saturation constant of Y growth on Fru
\(K_{\text{Glc}}^{\text{LAB}}\) \(\text{mg Glc g pulp}^{-1}\) Substrate saturation constant of LAB growth on Glc
\(K_{\text{EtOH}}^{\text{AAB}}\) \(\text{mg EtOH g pulp}^{-1}\) Substrate saturation constant of AAB growth on EtOH
\(K_{\text{LA}}^{\text{AAB}}\) \(\text{mg LA g pulp}^{-1}\) Substrate saturation constant of AAB growth on LA
\(k_{\text{Y}}\) \(\text{mg Y mg EtOH}^{-1}\text{. h}^{-1}\) Mortality rate constant of Y
\(k_{\text{LAB}}\) \(\text{mg LAB mg LA}^{-1}\text{. h}^{-1}\) Mortality rate constant of LAB
\(k_{\text{AAB}}\) \(\text{mg AAB mg Ac}^{-2}\text{. h}^{-1}\) Mortality rate constant of AAB
\(Y_{\text{Glc}|\text{Y}}\) \(\text{mg Glc mg Y}^{-1}\) Y-to-Glc yield coefficient
\(Y_{\text{Glc}|\text{LAB}}\) \(\text{mg Glc mg LAB}^{-1}\) LAB-to-Glc yield coefficient
\(Y_{\text{Fru}|\text{Y}}\) \(\text{mg Fru mg Y}^{-1}\) Y-to-Fru yield coefficient
\(Y^{\text{Glc}}_{\text{EtOH}|\text{Y}}\) \(\text{mg EtOH mg Y}^{-1}\) Y-to-EtOH from Glc yield coefficient
\(Y^{\text{Fru}}_{\text{EtOH}|\text{Y}}\) \(\text{mg EtOH mg Y}^{-1}\) Y-to-EtOH from Fru yield coefficient
\(Y_{\text{EtOH}|\text{AAB}}\) \(\text{mg EtOH mg AAB}^{-1}\) AAB-to-EtOH yield coefficient
\(Y_{\text{LA}|\text{LAB}}\) \(\text{mg LA mg LAB}^{-1}\) LAB-to-LA yield coefficient
\(Y_{\text{LA}|\text{AAB}}\) \(\text{mg LA mg AAB}^{-1}\) AAB-to-LA yield coefficient
\(Y_{\text{Ac}|\text{LAB}}\) \(\text{mg Ac mg LAB}^{-1}\) LAB-to-Ac yield coefficient
\(Y^{\text{EtOH}}_{\text{Ac}|\text{AAB}}\) \(\text{mg Ac mg AAB}^{-1}\) AAB-to-Ac from EtOH yield coefficient
\(Y^{\text{LA}}_{\text{Ac}|\text{AAB}}\) \(\text{mg Ac mg AAB}^{-1}\) AAB-to-Ac from LA yield coefficient

3.4. Bayesian parameter estimation

The aforementioned parameters can be considered elements of a vector \(\theta\) of the form \([\theta_1, \theta_2,\dots,\theta_k]\). By assuming that \(\theta\) has originated the data \(\mathcal{D}\), the parameter estimation problem can be solved by inferring the posterior probability of \(\theta\) given \(\mathcal{D}\), \(P(\theta|\mathcal{D})\). This \(P(\theta|\mathcal{D})\) can be expressed as the product over all series and each data point within time series as:

\[\begin{equation} P\left(\theta\mid \mathcal{D}\right) \propto \prod_{i=1}^{T} \prod_{j=1}^{N} P\left(\mathcal{D}_{i,j}\mid\theta \right)\,P\left(\theta \right), \end{equation}\]

where \(\mathcal{D}_{i,j}\) is the data point from time series \(i\) measured at time \(j\).

By assuming that each observation \(\mathcal{D}_{i,j}\) is drawn from a sampling distribution of the form:

\[\begin{equation} \mathcal{D}_{i,j} \sim \mathcal{N}\left(f\left(x_{i,j},\theta\right),\sigma\right), \end{equation}\]

Our \(P(\theta|\mathcal{D})\) takes the following form:

\[\begin{equation} P\left(\theta\mid \mathcal{D}\right) \propto \prod_{i=1}^{T} \prod_{j=1}^{N} \mathcal{N}\left(f\left(x_{i,j},\theta\right),\sigma\right)\,P\left(\theta \right), \end{equation}\]

where an extra parameter \(\sigma\) is needed for accounting the total standard deviation of the model.

3.4.1. Dimensionality problem

Two problems arise when handling these data. On the one hand, the units in which microbial growth is measured constitutes the counting of Colony Forming Units (CFU). CFUs cannot be easily tossed in model with concentration measurements of metabolites (usually in milligrams per gram of pulp). For tackling this inconvenient, we make use of conversion factors from CFU to milligrams. These conversion factors were 15 picograms per CFU (\(pg\,CFU^{-1}\)), 1.25 \(pg\,CFU^{-1}\) and 0.28 \(pg\,CFU^{-1}\) for Y, LAB and AAB respectively (for a complete explanation of these conversion factors, see [1]).

On the other hand, after the conversion of microbial counts into mass units, the time series corresponding to Y, LAB and AAB showed accute differences in orders of magnitude with respect to the units of the metabolites. Such differences, as it is well known, are likely to lead to computational problems in any optimization scheme. With the aim of overcoming such issues, a variable scaling was performed. This scaling consisted on dividing each time series by its corresponding maximum. In other words, all time series are constrained to account with maximum values equal to unity.

From this point, the parameter estimation scheme is done over the scaled time series and once the posterior distribution of the parameters were obtained, these were subject to a re-scaling process to its real units through conversion factors derived mathematically from the ODE system (to see in further detail the re-scaling process, see [1]).

3.4.2. Priors

By the scaling of the time series, we assumed that the differences of the parameters on their real scales that can be away in orders of magnitude, are then regularized. This means that priors for each single parameter can lie between 0 and 1. Thus, an independent normal distrubution with mean 0.5 and standard deviation equal to 0.3 can be impossed to each \(k\) element of \(\theta\). In the case of the standard deviation of the model \(\sigma\), a heavy tailed distribution such a Cauchy with scale parameters of 0 and 1 will allow the sampling of extreme values that will cover possible outlying observations of the original data.

\[\begin{equation} \begin{aligned} \theta_k &\sim \mathcal{N}(0.5,\,0.3), && \theta_k>0 \\ \sigma &\sim \mathcal{C}(0,\,1), && \sigma>0. \end{aligned} \end{equation}\]

3.5. Stan code

The aforementioned system of ODEs translated into Stan language is solved as an initial value problem, where the initial concentration of the 8 state variables are provided as they were reported in the original work of Papalexandratou et al. for boxes 1 and 2. In both cases, the ODEs were specified and solved by the mechanism rk45. The datasets were fitted by running 4 parallel Markov Chains of 3000 iterations each, 1000 of which were used as warm-up.

functions {
    real[]  cbf(real t,
                real[] x,
                real[] theta,
                real[] y_r,
                int[] y_i) {
        real dxdt[8];
        
        dxdt[1] = - theta[1]*theta[12]*x[1]*x[6]/(theta[17]+x[1]) - theta[2]*theta[14]*x[1]*x[7]/(theta[19]+x[1]);
        dxdt[2] = - theta[3]*theta[13]*x[2]*x[6]/(theta[18]+x[2]);
        dxdt[3] = theta[4]*theta[12]*x[1]*x[6]/(theta[17]+x[1]) + theta[5]*theta[13]*x[2]*x[6]/(theta[18]+x[2]) - theta[6]*theta[15]*x[3]*x[8]/(theta[20]+x[3]);
        dxdt[4] = theta[7]*theta[14]*x[1]*x[7]/(theta[19]+x[1]) - theta[8]*theta[16]*x[4]*x[8]/(theta[21]*x[8]+x[4]);
        dxdt[5] = theta[9]*theta[14]*x[1]*x[7]/(theta[19]+x[1]) + theta[10]*theta[15]*x[3]*x[8]/(theta[20]+x[3]) +theta[11]*theta[16]*x[4]*x[8]/(theta[21]*x[8]+x[4]);
        dxdt[6] = theta[12]*x[1]*x[6]/(theta[17]+x[1]) + theta[13]*x[2]*x[6]/(theta[18]+x[2]) - theta[22]*x[6]*x[3];
        dxdt[7] = theta[14]*x[1]*x[7]/(theta[19]+x[1]) - theta[23]*x[7]*x[4];
        dxdt[8] = theta[15]*x[3]*x[8]/(theta[20]+x[3]) + theta[16]*x[4]*x[8]/(theta[21]*x[8]+x[4]) - theta[24]*x[8]*x[5]^2;
        
        return dxdt;
    }
}

data {
    int<lower=1> T;
    real<lower=0> x[T,8];
    real t0;
    real ts[T];
    real x0[8];
    real scl[8];

}

transformed data {

    real y_r[0];
    int y_i[0];
    real<lower=0> x0_1[8];
    real<lower=0> xn[T,8];
    for (t in 1:T)
        for (n in 1:8)
            xn[t,n]=x[t,n]/scl[n];
  
  for (n in 1:8)
    x0_1[n] = x0[n]/scl[n]; 
}
parameters {
    real <lower=0> sigma;
    real <lower=0> mu1;
    real <lower=0> mu2;
    real <lower=0> mu3;
    real <lower=0> mu4;
    real <lower=0> mu5;
    real <lower=0> ks1;
    real <lower=0> ks2;
    real <lower=0> ks3;
    real <lower=0> ks4;
    real <lower=0> ks5;
    real <lower=0> k1;
    real <lower=0> k2;
    real <lower=0> k3;
    real <lower=0> yc1;
    real <lower=0> yc2;
    real <lower=0> yc3;
    real <lower=0> yc4;
    real <lower=0> yc5;
    real <lower=0> yc6;
    real <lower=0> yc7;
    real <lower=0> yc8; 
    real <lower=0> yc9;
    real <lower=0> yc10;
    real <lower=0> yc11; 
}

transformed parameters {
    real x_hat[T,8];
    {
        real theta[24];
        theta[1] = yc1;
        theta[2] = yc2;
        theta[3] = yc3;
        theta[4] = yc4;
        theta[5] = yc5;
        theta[6] = yc6;
        theta[7] = yc7;
        theta[8] = yc8;
        theta[9] = yc9;
        theta[10] = yc10;
        theta[11] = yc11;
        theta[12] = mu1;
        theta[13] = mu2;
        theta[14] = mu3;
        theta[15] = mu4;
        theta[16] = mu5;
        theta[17] = ks1;
        theta[18] = ks2;
        theta[19] = ks3;
        theta[20] = ks4;
        theta[21] = ks5;
        theta[22] = k1;
        theta[23] = k2;
        theta[24] = k3;

        x_hat = integrate_ode_rk45(cbf, x0_1, t0, ts, theta, y_r, y_i,1.0E-6, 1.0E-6, 1.0E8);
    }
}
model{
    mu1~normal(0.5,0.3);
    mu2~normal(0.5,0.3);
    mu3~normal(0.5,0.3);
    mu4~normal(0.5,0.3);
    mu5~normal(0.5,0.3);
    ks1~normal(0.5,0.3);
    ks2~normal(0.5,0.3);
    ks3~normal(0.5,0.3);
    ks4~normal(0.5,0.3);
    ks5~normal(0.5,0.3);
    k1~normal(0.5,0.3);
    k2~normal(0.5,0.3);
    k3~normal(0.5,0.3);
    yc1~normal(0.5,0.3);
    yc2~normal(0.5,0.3);
    yc3~normal(0.5,0.3);
    yc4~normal(0.5,0.3);
    yc5~normal(0.5,0.3);
    yc6~normal(0.5,0.3);
    yc7~normal(0.5,0.3);
    yc8~normal(0.5,0.3);
    yc9~normal(0.5,0.3);
    yc10~normal(0.5,0.3);
    yc11~normal(0.5,0.3);
    sigma~cauchy(0,1);
    for (t in 1:T)
        xn[t]~normal(x_hat[t],sigma);
}

generated quantities{
    real log_lik[T,8];
    for (t in 1:T)
        for (n in 1:8)
            log_lik[t,n]=normal_lpdf(xn[t,n]|x_hat[t,n],sigma);
}

Both, Stan model and running scripts can be found in model/.

4. Results

The results for Camu et al., box 1 and box 2 of Papalexandratou et al. can be found in output/. The R scripts for re-producing the following results can be found in scripts/.

4.1. Model selection

By conduction Leave-one-out cross-validation and widely applicable information criterion for the four models on all the data sets, the results suggest that models 3 and 4 have similar performances, at leas for the data corresponding to Camu and Box 2.

Figure 3. LOO and WAIC for the models (a) Camu, (b) box 1 and (c) box 2

Figure 3. LOO and WAIC for the models (a) Camu, (b) box 1 and (c) box 2

However, we decided to take model 4 as the most plausible one given that the maximum growth rate of AAB on EtOH for model 3 showed a bimodal distribution.

Figure 4. Posterior distribution of the maximum especific growth rate of AAB on EtOH for model 3

Figure 4. Posterior distribution of the maximum especific growth rate of AAB on EtOH for model 3

4.2. Diagnostics for model 4

For Camu, box 1 and box 2, the traceplots did not show evidence of problems in the mixing of the chains.

Figure 5. Traceplots of the scaled 24 parameters of the model for Camu (excluding sigma).

Figure 5. Traceplots of the scaled 24 parameters of the model for Camu (excluding sigma).

Figure 6. Traceplots of the scaled 24 parameters of the model for Box 1 (excluding sigma).

Figure 6. Traceplots of the scaled 24 parameters of the model for Box 1 (excluding sigma).

Figure 7. Traceplots of the scaled 24 parameters of the model for Box 2 (excluding sigma).

Figure 7. Traceplots of the scaled 24 parameters of the model for Box 2 (excluding sigma).

This is confirmed by the obtained \(\hat{R}\) for Camu:

##        mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff  Rhat valid
## mu1   0.248   0.002 0.094 0.098 0.179 0.237 0.304 0.464  3678 1.001     1
## mu2   0.363   0.002 0.107 0.178 0.286 0.358 0.433 0.593  4299 1.001     1
## mu3   0.358   0.001 0.068 0.237 0.311 0.354 0.402 0.501  4696 1.001     1
## mu4   0.378   0.002 0.095 0.200 0.317 0.376 0.437 0.570  2152 1.001     1
## mu5   0.009   0.001 0.016 0.000 0.002 0.005 0.010 0.048   603 1.001     1
## ks1   0.677   0.004 0.265 0.181 0.490 0.667 0.859 1.209  5366 1.001     1
## ks2   0.620   0.003 0.256 0.141 0.441 0.611 0.788 1.145  6310 1.000     1
## ks3   0.731   0.003 0.239 0.298 0.560 0.721 0.885 1.226  4829 1.001     1
## ks4   0.724   0.003 0.250 0.258 0.547 0.716 0.887 1.234  6795 1.000     1
## ks5   0.567   0.004 0.265 0.091 0.373 0.558 0.744 1.115  5476 1.001     1
## k1    0.743   0.002 0.111 0.551 0.665 0.734 0.814 0.978  5023 1.000     1
## k2    0.047   0.000 0.012 0.028 0.039 0.046 0.054 0.076  3945 1.001     1
## k3    0.257   0.001 0.053 0.163 0.220 0.254 0.290 0.373  5779 1.000     1
## yc1   0.626   0.003 0.208 0.283 0.470 0.605 0.759 1.079  5558 1.000     1
## yc2   0.317   0.002 0.119 0.125 0.229 0.301 0.386 0.595  3891 1.000     1
## yc3   0.677   0.003 0.185 0.388 0.541 0.653 0.788 1.103  4758 1.001     1
## yc4   0.325   0.002 0.192 0.026 0.178 0.306 0.452 0.748  5892 1.001     1
## yc5   0.248   0.002 0.141 0.021 0.144 0.236 0.336 0.560  5062 1.001     1
## yc6   0.116   0.002 0.071 0.040 0.076 0.103 0.137 0.266   969 1.001     1
## yc7   0.696   0.002 0.102 0.513 0.626 0.691 0.761 0.911  4196 1.000     1
## yc8   0.432   0.004 0.263 0.023 0.225 0.408 0.612 0.997  4814 1.001     1
## yc9   0.513   0.001 0.086 0.359 0.454 0.508 0.567 0.695  4584 1.000     1
## yc10  0.035   0.001 0.040 0.001 0.014 0.027 0.045 0.108  1023 1.001     1
## yc11  0.449   0.004 0.266 0.028 0.242 0.429 0.624 1.026  4805 1.001     1
## sigma 0.149   0.000 0.010 0.131 0.142 0.149 0.156 0.171  6549 1.000     1
##          Q5   Q50   Q95 MCSE_Q2.5 MCSE_Q25 MCSE_Q50 MCSE_Q75 MCSE_Q97.5
## mu1   0.114 0.237 0.425     0.002    0.001    0.002    0.003      0.004
## mu2   0.201 0.358 0.549     0.003    0.002    0.002    0.002      0.005
## mu3   0.253 0.354 0.474     0.002    0.001    0.001    0.001      0.003
## mu4   0.234 0.376 0.535     0.013    0.002    0.001    0.002      0.003
## mu5   0.000 0.005 0.029     0.000    0.000    0.000    0.000      0.009
## ks1   0.256 0.667 1.121     0.010    0.006    0.005    0.004      0.008
## ks2   0.207 0.611 1.058     0.012    0.005    0.004    0.004      0.009
## ks3   0.357 0.721 1.143     0.007    0.005    0.004    0.004      0.007
## ks4   0.324 0.716 1.143     0.006    0.005    0.004    0.004      0.009
## ks5   0.145 0.558 1.014     0.008    0.006    0.005    0.004      0.010
## k1    0.576 0.734 0.935     0.003    0.002    0.001    0.002      0.005
## k2    0.031 0.046 0.070     0.000    0.000    0.000    0.000      0.001
## k3    0.176 0.254 0.349     0.002    0.001    0.001    0.001      0.002
## yc1   0.324 0.605 0.995     0.005    0.003    0.004    0.003      0.006
## yc2   0.150 0.301 0.531     0.002    0.001    0.002    0.002      0.008
## yc3   0.420 0.653 1.015     0.003    0.003    0.003    0.004      0.011
## yc4   0.048 0.306 0.670     0.002    0.003    0.003    0.004      0.009
## yc5   0.039 0.236 0.499     0.003    0.003    0.003    0.003      0.009
## yc6   0.047 0.103 0.214     0.001    0.001    0.001    0.001      0.020
## yc7   0.537 0.691 0.875     0.003    0.002    0.002    0.002      0.005
## yc8   0.046 0.408 0.896     0.003    0.005    0.005    0.005      0.011
## yc9   0.382 0.508 0.662     0.003    0.002    0.001    0.002      0.003
## yc10  0.003 0.027 0.086     0.000    0.000    0.000    0.001      0.007
## yc11  0.056 0.429 0.919     0.002    0.006    0.005    0.004      0.009
## sigma 0.133 0.149 0.167     0.000    0.000    0.000    0.000      0.001
##       MCSE_SD Bulk_ESS Tail_ESS
## mu1     0.001     3530     4066
## mu2     0.001     4080     4193
## mu3     0.001     4463     4105
## mu4     0.001     2645     1083
## mu5     0.001     2304     1060
## ks1     0.003     4925     2708
## ks2     0.002     5750     2817
## ks3     0.002     4568     3805
## ks4     0.002     6264     3280
## ks5     0.003     4624     2149
## k1      0.001     5143     5705
## k2      0.000     4024     4848
## k3      0.000     5719     5351
## yc1     0.002     5237     4326
## yc2     0.001     3870     4930
## yc3     0.002     4750     5733
## yc4     0.002     5428     3831
## yc5     0.001     4193     2329
## yc6     0.002     3069     1388
## yc7     0.001     4141     5242
## yc8     0.003     3610     1696
## yc9     0.001     4499     5143
## yc10    0.001     2729     1897
## yc11    0.003     3471     2125
## sigma   0.000     6649     5429

Box 1:

##        mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff  Rhat valid
## mu1   0.063   0.000 0.025 0.023 0.046 0.060 0.077 0.120  4254 1.000     1
## mu2   0.083   0.000 0.030 0.039 0.062 0.079 0.100 0.152  4568 1.000     1
## mu3   0.417   0.001 0.096 0.248 0.348 0.411 0.479 0.626  4883 1.000     1
## mu4   0.149   0.001 0.051 0.058 0.112 0.146 0.181 0.257  4931 1.000     1
## mu5   0.025   0.000 0.017 0.002 0.012 0.022 0.035 0.065  4094 1.000     1
## ks1   0.623   0.004 0.274 0.114 0.426 0.617 0.812 1.162  5698 1.000     1
## ks2   0.502   0.004 0.283 0.038 0.282 0.482 0.694 1.096  4907 1.000     1
## ks3   0.576   0.004 0.251 0.130 0.392 0.564 0.744 1.098  4578 1.001     1
## ks4   0.626   0.003 0.274 0.129 0.430 0.615 0.810 1.193  7072 1.001     1
## ks5   0.552   0.003 0.271 0.071 0.350 0.542 0.738 1.117  6348 1.001     1
## k1    0.311   0.001 0.071 0.200 0.260 0.302 0.350 0.477  4715 1.000     1
## k2    0.310   0.001 0.088 0.158 0.248 0.303 0.364 0.499  3637 1.001     1
## k3    0.061   0.000 0.028 0.022 0.041 0.056 0.075 0.130  4027 1.001     1
## yc1   0.928   0.003 0.223 0.513 0.773 0.921 1.074 1.380  6045 1.000     1
## yc2   0.136   0.002 0.092 0.017 0.073 0.117 0.178 0.367  2869 1.001     1
## yc3   1.042   0.002 0.199 0.684 0.903 1.032 1.171 1.451  6337 1.000     1
## yc4   0.419   0.003 0.209 0.054 0.267 0.407 0.558 0.859  6228 1.000     1
## yc5   0.396   0.003 0.173 0.075 0.273 0.395 0.514 0.736  4465 1.000     1
## yc6   0.541   0.003 0.218 0.172 0.385 0.525 0.678 1.007  5565 1.001     1
## yc7   0.214   0.001 0.051 0.138 0.178 0.206 0.242 0.338  3830 1.001     1
## yc8   0.506   0.003 0.247 0.085 0.325 0.485 0.666 1.048  6324 1.001     1
## yc9   0.103   0.000 0.034 0.048 0.079 0.099 0.122 0.183  5144 1.000     1
## yc10  0.412   0.003 0.193 0.077 0.274 0.398 0.539 0.822  3899 1.001     1
## yc11  0.478   0.003 0.248 0.058 0.292 0.461 0.643 0.998  6122 1.000     1
## sigma 0.167   0.000 0.014 0.144 0.158 0.167 0.176 0.196  6203 1.000     1
##          Q5   Q50   Q95 MCSE_Q2.5 MCSE_Q25 MCSE_Q50 MCSE_Q75 MCSE_Q97.5
## mu1   0.029 0.060 0.109     0.001    0.000    0.000    0.000      0.001
## mu2   0.043 0.079 0.138     0.000    0.000    0.001    0.001      0.002
## mu3   0.270 0.411 0.587     0.002    0.002    0.001    0.002      0.003
## mu4   0.071 0.146 0.237     0.001    0.001    0.001    0.001      0.002
## mu5   0.004 0.022 0.058     0.000    0.000    0.000    0.000      0.001
## ks1   0.180 0.617 1.075     0.009    0.006    0.005    0.004      0.012
## ks2   0.077 0.482 1.000     0.006    0.008    0.006    0.004      0.013
## ks3   0.184 0.564 1.017     0.010    0.006    0.005    0.005      0.007
## ks4   0.194 0.615 1.098     0.011    0.005    0.005    0.004      0.010
## ks5   0.127 0.542 1.015     0.009    0.007    0.005    0.004      0.011
## k1    0.214 0.302 0.445     0.001    0.001    0.001    0.001      0.005
## k2    0.179 0.303 0.463     0.002    0.002    0.001    0.002      0.005
## k3    0.026 0.056 0.114     0.000    0.000    0.000    0.001      0.002
## yc1   0.575 0.921 1.306     0.009    0.004    0.003    0.004      0.006
## yc2   0.027 0.117 0.310     0.001    0.001    0.002    0.002      0.009
## yc3   0.731 1.032 1.385     0.006    0.003    0.003    0.003      0.006
## yc4   0.095 0.407 0.786     0.006    0.003    0.003    0.005      0.008
## yc5   0.114 0.395 0.687     0.008    0.003    0.003    0.003      0.005
## yc6   0.214 0.525 0.929     0.004    0.004    0.003    0.003      0.010
## yc7   0.146 0.206 0.309     0.001    0.001    0.001    0.001      0.003
## yc8   0.131 0.485 0.950     0.006    0.004    0.005    0.004      0.008
## yc9   0.055 0.099 0.163     0.001    0.001    0.000    0.001      0.002
## yc10  0.117 0.398 0.753     0.009    0.004    0.003    0.004      0.006
## yc11  0.094 0.461 0.919     0.005    0.005    0.005    0.005      0.007
## sigma 0.147 0.167 0.191     0.000    0.000    0.000    0.000      0.001
##       MCSE_SD Bulk_ESS Tail_ESS
## mu1     0.000     3708     3104
## mu2     0.000     4045     4078
## mu3     0.001     4415     3091
## mu4     0.001     4547     4417
## mu5     0.000     4049     4292
## ks1     0.003     5004     2310
## ks2     0.003     3926     2351
## ks3     0.003     4143     2578
## ks4     0.002     6142     2867
## ks5     0.002     5374     2558
## k1      0.001     5108     5391
## k2      0.001     3710     4256
## k3      0.000     4325     5382
## yc1     0.002     5898     3931
## yc2     0.001     3294     3595
## yc3     0.002     6164     5292
## yc4     0.002     5516     3248
## yc5     0.002     4120     2888
## yc6     0.002     5039     4222
## yc7     0.001     3998     4746
## yc8     0.002     5382     3182
## yc9     0.000     5302     5313
## yc10    0.002     3439     2123
## yc11    0.002     5208     3077
## sigma   0.000     6383     5661

And for Box 2:

##        mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff  Rhat valid
## mu1   0.366   0.002 0.131 0.157 0.269 0.351 0.446 0.664  4125 1.001     1
## mu2   0.573   0.002 0.151 0.293 0.468 0.568 0.671 0.884  5038 1.000     1
## mu3   0.499   0.002 0.151 0.233 0.388 0.490 0.601 0.814  5123 1.001     1
## mu4   0.168   0.001 0.053 0.078 0.129 0.164 0.202 0.278  4447 1.000     1
## mu5   0.021   0.000 0.016 0.001 0.009 0.018 0.030 0.059  4273 1.001     1
## ks1   0.698   0.003 0.251 0.230 0.523 0.691 0.865 1.218  5731 1.000     1
## ks2   0.614   0.003 0.256 0.126 0.437 0.610 0.787 1.130  6131 1.000     1
## ks3   0.450   0.003 0.247 0.048 0.261 0.432 0.614 0.983  5292 1.001     1
## ks4   0.574   0.004 0.287 0.056 0.364 0.561 0.768 1.164  5280 1.001     1
## ks5   0.525   0.003 0.273 0.053 0.324 0.512 0.710 1.094  6441 1.000     1
## k1    0.652   0.003 0.176 0.333 0.519 0.648 0.775 1.003  4486 1.000     1
## k2    0.283   0.002 0.104 0.106 0.208 0.273 0.348 0.510  3170 1.002     1
## k3    0.127   0.001 0.052 0.053 0.091 0.119 0.153 0.253  5670 1.000     1
## yc1   0.682   0.002 0.201 0.352 0.533 0.662 0.811 1.124  6828 1.000     1
## yc2   0.055   0.001 0.056 0.001 0.016 0.038 0.075 0.208  3922 1.001     1
## yc3   0.837   0.003 0.229 0.442 0.666 0.821 0.986 1.316  7063 1.001     1
## yc4   0.291   0.002 0.155 0.031 0.176 0.281 0.392 0.628  5086 1.000     1
## yc5   0.201   0.002 0.135 0.010 0.092 0.181 0.285 0.511  5601 1.000     1
## yc6   0.650   0.003 0.208 0.295 0.499 0.636 0.778 1.106  6858 1.000     1
## yc7   0.363   0.002 0.131 0.186 0.267 0.337 0.431 0.689  3386 1.001     1
## yc8   0.415   0.003 0.248 0.035 0.222 0.388 0.576 0.963  6229 1.000     1
## yc9   0.108   0.001 0.069 0.012 0.059 0.096 0.141 0.282  4111 1.001     1
## yc10  0.473   0.003 0.192 0.112 0.344 0.466 0.596 0.874  4627 1.000     1
## yc11  0.554   0.003 0.260 0.086 0.367 0.541 0.730 1.081  6431 1.000     1
## sigma 0.167   0.000 0.013 0.144 0.158 0.166 0.176 0.195  6835 1.001     1
##          Q5   Q50   Q95 MCSE_Q2.5 MCSE_Q25 MCSE_Q50 MCSE_Q75 MCSE_Q97.5
## mu1   0.180 0.351 0.606     0.001    0.002    0.003    0.003      0.007
## mu2   0.332 0.568 0.833     0.005    0.003    0.003    0.003      0.007
## mu3   0.263 0.490 0.757     0.005    0.003    0.003    0.003      0.006
## mu4   0.089 0.164 0.261     0.001    0.001    0.001    0.001      0.002
## mu5   0.002 0.018 0.052     0.000    0.000    0.000    0.000      0.001
## ks1   0.301 0.691 1.124     0.013    0.005    0.004    0.004      0.012
## ks2   0.194 0.610 1.045     0.009    0.006    0.003    0.004      0.010
## ks3   0.084 0.432 0.889     0.005    0.005    0.004    0.005      0.006
## ks4   0.118 0.561 1.074     0.008    0.007    0.005    0.005      0.009
## ks5   0.099 0.512 0.991     0.007    0.007    0.005    0.005      0.010
## k1    0.372 0.648 0.949     0.003    0.003    0.004    0.003      0.005
## k2    0.127 0.273 0.466     0.002    0.002    0.002    0.002      0.006
## k3    0.060 0.119 0.225     0.001    0.001    0.001    0.001      0.003
## yc1   0.388 0.662 1.044     0.004    0.003    0.003    0.003      0.007
## yc2   0.003 0.038 0.166     0.000    0.001    0.001    0.001      0.005
## yc3   0.491 0.821 1.236     0.004    0.004    0.004    0.003      0.007
## yc4   0.058 0.281 0.570     0.004    0.003    0.002    0.003      0.007
## yc5   0.020 0.181 0.451     0.001    0.002    0.002    0.003      0.006
## yc6   0.340 0.636 1.029     0.006    0.004    0.003    0.004      0.008
## yc7   0.203 0.337 0.611     0.002    0.002    0.002    0.003      0.011
## yc8   0.063 0.388 0.865     0.004    0.005    0.004    0.004      0.008
## yc9   0.021 0.096 0.240     0.001    0.001    0.001    0.002      0.005
## yc10  0.165 0.466 0.797     0.010    0.004    0.003    0.003      0.007
## yc11  0.140 0.541 1.000     0.008    0.005    0.004    0.004      0.010
## sigma 0.147 0.166 0.190     0.000    0.000    0.000    0.000      0.001
##       MCSE_SD Bulk_ESS Tail_ESS
## mu1     0.001     3989     4877
## mu2     0.002     4934     5002
## mu3     0.001     4631     3335
## mu4     0.001     4085     3940
## mu5     0.000     3716     3126
## ks1     0.002     5178     2831
## ks2     0.002     5504     2626
## ks3     0.002     4436     2939
## ks4     0.003     4466     2468
## ks5     0.002     5130     2316
## k1      0.002     4469     4740
## k2      0.001     3312     4391
## k3      0.001     5891     5217
## yc1     0.002     6660     5821
## yc2     0.001     3893     4222
## yc3     0.002     6895     6028
## yc4     0.002     4552     3173
## yc5     0.001     5264     3324
## yc6     0.002     6518     4945
## yc7     0.002     3527     4882
## yc8     0.002     5004     3470
## yc9     0.001     3486     2586
## yc10    0.002     4256     2431
## yc11    0.002     5632     3017
## sigma   0.000     7171     5189

By looking at the predictions, the model performs remarkably well for all data sets:

Figure 8. Predictions of the model for Camu (a) Glc, (b) Fru, (c) EtOH, (d) LA, (e) Ac, (f) Y, (g) LAB and (h) AAB. Solid red lines represent the median of the posterior distribution at each time point. Dashed red lines represent the 95% credible interval. Black dots represent the real data reported in [4].

Figure 8. Predictions of the model for Camu (a) Glc, (b) Fru, (c) EtOH, (d) LA, (e) Ac, (f) Y, (g) LAB and (h) AAB. Solid red lines represent the median of the posterior distribution at each time point. Dashed red lines represent the 95% credible interval. Black dots represent the real data reported in [4].

Figure 9. Predictions of the model for box 1 (a) Glc, (b) Fru, (c) EtOH, (d) LA, (e) Ac, (f) Y, (g) LAB and (h) AAB. Solid red lines represent the median of the posterior distribution at each time point. Dashed red lines represent the 95% credible interval. Black dots represent the real data reported in [3].

Figure 9. Predictions of the model for box 1 (a) Glc, (b) Fru, (c) EtOH, (d) LA, (e) Ac, (f) Y, (g) LAB and (h) AAB. Solid red lines represent the median of the posterior distribution at each time point. Dashed red lines represent the 95% credible interval. Black dots represent the real data reported in [3].

Figure 10. Predictions of the model for box 2 (a) Glc, (b) Fru, (c) EtOH, (d) LA, (e) Ac, (f) Y, (g) LAB and (h) AAB. Solid red lines represent the median of the posterior distribution at each time point. Dashed red lines represent the 95% credible interval. Black dots represent the real data reported in [3].

Figure 10. Predictions of the model for box 2 (a) Glc, (b) Fru, (c) EtOH, (d) LA, (e) Ac, (f) Y, (g) LAB and (h) AAB. Solid red lines represent the median of the posterior distribution at each time point. Dashed red lines represent the 95% credible interval. Black dots represent the real data reported in [3].

4.2. Differences between parameters from Box 1 and Box 2

At looking at the obtained posterior distributions from box 1 and 2 it is possible to have hints on how subtle differences in how both were conducted might affect kinetic parameters.

Figure 11. Violin plots of the re-scaled posterior distributions of the 24 parameters showing the differences between boxes 1 and 2.

Figure 11. Violin plots of the re-scaled posterior distributions of the 24 parameters showing the differences between boxes 1 and 2.

We hypothesize that three main differences between these two fermentations played a role in the disparity of the parameters: (1) initial concentration of microorganisms, (2) concentration ratios of initial substrates and (3) evaporation rates.

4.2.1. Initial concentration of microorganisms

The biggest difference between the two trials is the initial concentration of Y. This were equal to 0.18 and 0.005 \(mg\,g^{-1}\) for box 1 and 2 respectively. This could be the cause of the differing maximum growth rates estimated for Y between the trials. Similar effect can be seen on parameters directly related to the population of Y such as, the yield coefficient of Y growing on Glc (\(Y_{\text{Glc}|Y}\)) where box 1 showed a higher value than box 2 as a consequence of its higher initial inoculum of Y.

4.2.2. Concentration ratios of initial substrates

It is worth of attention to take into account that the ratio of the initial substrates Glc and Fru differed between both trials. This ratio (\(\frac{Glc}{Fru}\)) were equal to 1.11 and 0.64 for boxes 1 and 2 respectively. Such a dissimilarity would be the explanation of the estimates obtained for the yield coefficient of the growth of LAB on Glc (\(Y_{\text{Glc}|\text{LAB}}\)) suggesting a resource-type competition between Y and LAB.

4.2.3. Evaporation rate

Finally, regarding the yield coefficient of growth of AAB on EtOH (\(Y_{\text{EtOH}|\text{AAB}}\)), the higher value obtained in box 1 with respect to box 2 might be explained for a possibly greater evaporation rate of EtOH in box 1 (remember that box 1 was conducted under a metal roof instead as in a fermentary room as box 2). The evaporation rate might explain the difference of other parameters too, such as the saturation constant for the growth of AAB on LA (\(K_{\text{LA}}^{\text{AAB}}\)) and the yield coefficient of consumption of LA by AAB (\(Y_{\text{LA}|\text{AAB}}\)).

5. Conclusion

The aforementioned model represents the first fully working kinetic model for cocoa bean fermentation capable of predict notably well fermentation time series. It represents also an interesting method to see into the mechanisms of the process that might serve for its optimization in the future.

Acknowledgements

The authors would like to express their acknowledgement to Barry Callebaut for funding this research as part of the COMETA project, driven by Jacobs University Bremen. Finally, but not the least, MM would like to give a deep thank you to all the Stan Development team as well as the active community of the discussion group of Stan for their help.

References

[1] Moreno-Zambrano, M., Grimbs, S., Ullrich M. S. and Hütt, M-T. (2018). A mathematical model of cocoa bean fermantation. Royal Society Open Science, 5(10), 180 964.doi: 10.1098/rsos.180964

[2] De Vuyst, L. and Weckx, S. (2016). The cocoa bean fermentation process: from ecosystem analysis to starter culture development. Journal of Applied Microbiology. 121, 5-17.

[3] Papalexandratou, Z., Vrancken, G., Bruyne, K. D., Vandamme, P. and De Vuyst, L. (2011). Spontaneous organic cocoa bean fermentations in Brazil are characterized by a restricted species diversity of lactic acid bacteria and acetic acid bacteria. Food Microbiology. 28, 1326-1338.

[4] Camu, N., De Winter, T., Verbrugghe, K., Cleenwerck, I., Vandamme, P., Takrama, J. S., Vancanneyt, M. and De Vuyst, L. (2007). Dynamics and biodiversity of populations of lactic acid bacteria and acetic acid bacteria involved in spontaneous heap fermentation of cocoa beans in Ghana. Applied and Environmental Microbiology. 74:6, 1809-1824.