Inverse problem

A starting point in any inverse problem is the selection of parameters to be recovered from the available data. We note that if phase change effects are negligibly small, the heat capacity $C$ and thermal conductivity $\lambda$ cannot be resolved separately by analyzing the observed temperature time series; only the thermal diffusivity $\lambda/C$ can be found (Hinkel, 1997). Therefore, to remove this ambiguity in determining the soil properties, we set the heat capacity $C_f^{(i)}$ equal to a value typical for the soil type in the $i$-th layer. Accordingly, we define the control vector $\mathscr{C}$ as the set consisting of the thermal conductivity $\lambda^{(i)}_f$ and the parameters $\eta^{(i)}, T^{(i)}_*, b^{(i)}$ describing the unfrozen water content for each soil horizon:

\begin{displaymath}
\mathscr{C}=\{\lambda^{(i)}_f, \eta^{(i)}, T^{(i)}_*, b^{(i)}\}_{i=1}^n.
\end{displaymath} (6)
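As a concrete illustration (not part of the original work), the control vector of Eq. (6) can be stored as a flat array with four entries per soil horizon; the helper names `pack` and `unpack` below are hypothetical:

```python
import numpy as np

# Hypothetical helpers illustrating the layout of the control vector in
# Eq. (6): four parameters (lambda_f, eta, T_*, b) per soil horizon,
# flattened into a single array suitable for a numerical optimizer.
N_PARAMS = 4

def pack(layers):
    """Flatten a list of per-horizon parameter dicts into a control vector."""
    return np.array([[h["lambda_f"], h["eta"], h["T_star"], h["b"]]
                     for h in layers]).ravel()

def unpack(C):
    """Recover the per-horizon parameters from a flat control vector."""
    return [{"lambda_f": r[0], "eta": r[1], "T_star": r[2], "b": r[3]}
            for r in C.reshape(-1, N_PARAMS)]

# two-horizon example with made-up parameter values
layers = [{"lambda_f": 1.2, "eta": 0.4, "T_star": -0.03, "b": 0.6},
          {"lambda_f": 1.8, "eta": 0.3, "T_star": -0.05, "b": 0.8}]
C = pack(layers)   # 8 entries: 2 horizons x 4 parameters
```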

 

For each physically realistic control vector $\mathscr{C}$, it is possible to compute the temperature dynamics and compare them to the measured data. The available measurements are organized as two vectors, $\bm{d}_a$ and $\bm{d}_b$, associated with the high-temporal-resolution temperature records and the once-a-year measurements in the borehole, respectively. In this article, we mark quantities related to the active layer and borehole measurements by the subscripts $a$ and $b$, respectively. The data are related to the control vector $\mathscr{C}$ by

\begin{displaymath}
\bm{m}_a(\mathscr{C})-\bm{d}_a=\bm{e}_a, \ \ \ \ \bm{m}_b(\mathscr{C})-\bm{d}_b=\bm{e}_b,
\end{displaymath}

 

where $\bm{m}$ denotes the modeled counterpart of the data and $\bm{e}$ is the misfit vector. In theory, if there are no sampling errors or model inadequacies, the misfit vector $\bm{e}$ can be reduced to zero. If there are errors in the data or in the model, the aim is to minimize $\bm{e}$ by varying the control vector $\mathscr{C}$.

Assuming that the errors $\bm{e}$ have a Gaussian probability distribution, we define the cost function $J$ following Tikhonov and Leonov (1996) and Beck and Arnold (1977) as

\begin{displaymath}
J=J_a+J_b+J_r,
\end{displaymath} (7)

 

where $J_a$, $J_b$ are related to model/data misfit, and $J_r$ is the regularization term:

\begin{displaymath}
J_\alpha(\mathscr{C})=\frac12\big(\bm{m}_\alpha(\mathscr{C})-\bm{d}_\alpha\big)^{t}\bm{R}^{-1}_\alpha\big(\bm{m}_\alpha(\mathscr{C})-\bm{d}_\alpha\big), \ \ \ \ \alpha=a,b,
\end{displaymath} (8)

\begin{displaymath}
J_r(\mathscr{C})=\frac12\big(\mathscr{C}-\mathscr{C}_0\big)^t\bm{R}^{-1}_m\big(\mathscr{C}-\mathscr{C}_0\big).
\end{displaymath} (9)

 

Here, the superscript $t$ denotes transposition, and $\bm{R}_\alpha$ and $\bm{R}_m$ are the covariance matrices of the data $\bm{d}_\alpha$ and of the control vector, respectively. Finally, $\mathscr{C}_0$ is an initial approximation vector containing a priori estimates of the soil properties based on soil texture, mineral composition and porosity (Nicolsky et al., 2007).

Physically, the terms $J_a$ and $J_b$ describe the mean-square deviation of the computed soil temperatures from the measured ones, while the term $J_r$ incorporates information about the soil properties that can be derived from field/laboratory experiments or from existing soil property databases and published data (Kersten, 1949). Typically, $\bm{R}_m$ is a diagonal matrix with the expected variances $\{\sigma_m^2\}$ of the control vector variables down the main diagonal. For example, if the a priori estimates in $\mathscr{C}_0$ are known with large uncertainties, then the expected variances $\{\sigma_m^2\}$ are large, and hence the contribution of the term $J_r$ to the cost function $J$ is small. The opposite is also true: if the initial approximation $\mathscr{C}_0$ is known with a good degree of accuracy, then the contribution of $J_r$ is large and the minimum of $J$ lies close to the a priori estimates in $\mathscr{C}_0$.
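A minimal numerical sketch of Eqs. (7)-(9) follows; the generic forward operators `model_a` and `model_b` are placeholders for the heat model, and the toy check at the end uses identity operators and unit covariances:

```python
import numpy as np

def quad_form(r, R):
    """0.5 * r^T R^{-1} r, without forming the explicit inverse of R."""
    return 0.5 * r @ np.linalg.solve(R, r)

def cost(C, model_a, model_b, d_a, d_b, R_a, R_b, C0, R_m):
    """Cost function J = J_a + J_b + J_r of Eqs. (7)-(9)."""
    J_a = quad_form(model_a(C) - d_a, R_a)   # Eq. (8), alpha = a
    J_b = quad_form(model_b(C) - d_b, R_b)   # Eq. (8), alpha = b
    J_r = quad_form(C - C0, R_m)             # Eq. (9)
    return J_a + J_b + J_r

# toy check: identity forward models, unit covariances, C0 = 0
I2 = np.eye(2)
f = lambda C: C
J = cost(np.array([1.0, 0.0]), f, f, np.zeros(2), np.zeros(2),
         I2, I2, np.zeros(2), I2)   # 0.5 + 0.5 + 0.5
```

With diagonal covariance matrices this reduces to ordinary weighted least squares.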

The statistical interpretation of the least-squares method considers the cost function as an argument of a Gaussian probability distribution (Thacker, 1989; Wunsch, 1996). Under this interpretation, the optimal control vector $\hat{\mathscr{C}}$ defined by

\begin{displaymath}
J(\hat{\mathscr{C}})=\min_{\mathscr{C}}J(\mathscr{C})
\end{displaymath}

 

is the most probable combination of the soil properties for the given temperature realization and prior error statistics. To compute the optimal control vector, we calculate the gradient

\begin{displaymath}
\nabla{J}=\Big\{ \frac{\partial{J}}{\partial{\lambda^{(i)}_f}},
\frac{\partial{J}}{\partial{\eta^{(i)}}},
\frac{\partial{J}}{\partial{T^{(i)}_*}},
\frac{\partial{J}}{\partial{b^{(i)}}}\Big\}_{i=1}^n
\end{displaymath}

of the cost function and minimize $J$ using a quasi-Newton minimization algorithm (Fletcher, 2000). To calculate the gradient $\nabla{J}$, we construct and solve the so-called adjoint model (Jarny et al., 1991; Marchuk, 1995; Wunsch, 1996). The adjoint code is built analytically by transposing the operator of the tangent linear model, which is obtained by direct differentiation of the forward model code, briefly described in Appendix 5.
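For illustration only, a bare-bones quasi-Newton (BFGS) loop of the kind referenced above can be sketched as follows; the toy quadratic cost and its analytic gradient stand in for the forward and adjoint models:

```python
import numpy as np

# Minimal BFGS quasi-Newton sketch (not the authors' code): minimizes a
# cost J given its gradient, which in the article is supplied by the
# adjoint model.

def bfgs(J, gradJ, x0, tol=1e-8, max_iter=100):
    x = np.asarray(x0, float)
    B_inv = np.eye(x.size)              # inverse-Hessian approximation
    g = gradJ(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -B_inv @ g                  # quasi-Newton search direction
        t = 1.0                         # backtracking (Armijo) line search
        while J(x + t * p) > J(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        s = t * p
        x_new = x + s
        g_new = gradJ(x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                  # curvature condition; BFGS update
            rho = 1.0 / sy
            I = np.eye(x.size)
            B_inv = ((I - rho * np.outer(s, y)) @ B_inv
                     @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s))
        x, g = x_new, g_new
    return x

# toy quadratic cost: J(C) = 0.5 (C - c*)^T A (C - c*), minimum at c*
A = np.array([[3.0, 1.0], [1.0, 2.0]])
c_star = np.array([1.0, -2.0])
J = lambda C: 0.5 * (C - c_star) @ A @ (C - c_star)
gradJ = lambda C: A @ (C - c_star)
C_hat = bfgs(J, gradJ, np.zeros(2))
```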

To estimate the thermal properties uniquely, the boundary conditions must satisfy certain requirements, i.e. be monotonic functions of time (Muzylev, 1985). In nature, rapidly changing weather conditions produce a surface temperature that does not fulfill this requirement. Consequently, the cost function can have several minima due to the non-linearities in the heat equation (1). The iterative minimization algorithm can therefore converge to an improper minimum if the iterations are started from an arbitrarily selected initial guess $\mathscr{C}_g$. Nevertheless, if the initial guess $\mathscr{C}_g$ lies within the basin of attraction of the global minimum, the iterative optimization method should converge to the proper minimum even if the model is nonlinear (Thacker, 1989). Usually, the initial guess $\mathscr{C}_g$ is selected in the neighborhood of the initial approximation $\mathscr{C}_0$; thus, the right choice of $\mathscr{C}_0$ is important for efficient convergence. In Nicolsky et al. (2007), we provide a detailed step-by-step algorithm to compute the initial approximation $\mathscr{C}_0$.

To obtain physically meaningful results, the covariance matrix $\bm{R}_\alpha$ has to be defined appropriately. By definition, if all elements of the temperature data vector $\bm{d}_\alpha$ are statistically independent, then $\bm{R}_\alpha$ is a diagonal matrix with the expected variances $\sigma_\alpha^2$ of $\bm{e}_\alpha$ down the main diagonal. Since temperature disturbances at the ground surface decay exponentially with depth, the soil temperature does not change significantly over hourly time intervals at depths greater than 0.10 meters (Kudryavtsev, 1978; Carslaw and Jaeger, 1959; Yershov, 1998). Thus, hourly collected temperature observations are not statistically independent. We note that the typical time scale over which the air temperature changes randomly is approximately ten days (Blackmon et al., 1984). For the Alaska North Slope region, these air temperature fluctuations can penetrate to depths of 0.3-0.5 meters and significantly (and randomly) change the soil temperature in the upper meter on a monthly time scale. Hence, the covariance matrix $\bm{R}_a$ is approximately a sparse matrix with diagonal blocks whose sizes equal the number of temperature measurements collected in one month. For the sake of simplicity, we approximate $\bm{R}_a$ by a diagonal matrix with $\sigma^2_a$ down the main diagonal, multiplied by weights chosen so that the weights corresponding to the measurements within each month sum to one. Here, $\sigma_a$ equals the uncertainty in the temperature measurements due to the limited precision of the sensors.
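One plausible reading of this weighting scheme (a sketch, not the authors' code) assigns each measurement the weight $1/n_k$, where $n_k$ is the number of samples in its month, so that the weights within any month sum to one:

```python
import numpy as np

def monthly_weights(month_index):
    """Weight 1/n_k for each measurement, where n_k is the number of
    samples in its month; the weights within every month sum to one."""
    month_index = np.asarray(month_index)
    _, inverse, counts = np.unique(month_index, return_inverse=True,
                                   return_counts=True)
    return 1.0 / counts[inverse]

# e.g. three hourly samples in month 0, two in month 1, one in month 2
w = monthly_weights([0, 0, 0, 1, 1, 2])
# the diagonal of the approximate inverse covariance is then w / sigma_a**2
```

This keeps a densely sampled month from dominating the misfit term $J_a$.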

Treating the variables $c_i$ of the control vector $\mathscr{C}{=}\{c_i\}_{i=1}^m$ as stochastic and uncorrelated, it is possible to define errors of the optimally estimated parameters $\hat{\mathscr{C}}{=}\{\hat{c}_i\}_{i=1}^m$. According to Thacker (1989), the error covariances of the corresponding variables of the control vector can be calculated as the diagonal elements of the inverse Hessian matrix $\bm{H}^{-1}$, where $\bm{H}{=}\{h_{ij}\}_{i,j=1}^m$ and $h_{ij}{=}\partial^2J/\partial{c_i}\partial{c_j}$. The simplest method to compute $\bm{H}$ is to approximate it via finite differences (Schröter, 1989). Following this approach, we individually perturb the components $c_i$ of the control vector in the vicinity of the optimal values $\hat{c}_i$. The finite-difference approximation of the Hessian can then be defined as

\begin{displaymath}
h_{ij}\approx\frac{1}{2\delta}\Big(\nabla_jJ(\hat{\mathscr{C}}+\delta\mathscr{I}_i)-\nabla_jJ(\hat{\mathscr{C}}-\delta\mathscr{I}_i)\Big),
\end{displaymath}

where $\delta$ is a small positive real number, $\mathscr{I}_i=(0,0,\dots,1,\dots,0)$ is the $m$-dimensional vector whose $i$-th element is one and all other elements are zero, and $\nabla_jJ$ denotes the $j$-th component of the gradient $\nabla{J}$ of the cost function. The inverse of the Hessian $\bm{H}$ is easily computed, since the number $m$ of variables in the control vector $\mathscr{C}$ is typically less than one hundred.
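The central-difference formula above, together with the error variances read off the diagonal of $\bm{H}^{-1}$, can be sketched as follows; the quadratic toy cost with known Hessian `A` is an assumption used only to make the example checkable:

```python
import numpy as np

def fd_hessian(grad, x, delta=1e-5):
    """Central-difference Hessian:
    h_ij ~ (grad_j(x + delta*I_i) - grad_j(x - delta*I_i)) / (2*delta)."""
    m = x.size
    H = np.zeros((m, m))
    for i in range(m):
        e = np.zeros(m)
        e[i] = delta                          # delta * I_i
        H[i, :] = (grad(x + e) - grad(x - e)) / (2.0 * delta)
    return 0.5 * (H + H.T)                    # symmetrize round-off noise

# toy cost J(c) = 0.5 c^T A c has gradient A c and exact Hessian A
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda c: A @ c
H = fd_hessian(grad, np.array([0.2, -0.1]))
variances = np.diag(np.linalg.inv(H))         # error variances of the c_i
```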