Abstract
With the advent of massive data sets, much of the computational science and engineering community has moved toward data-intensive approaches to regression and classification. However, these approaches present significant challenges due to the increasing size, complexity, and dimensionality of the problems. In particular, covariance matrices in many cases are numerically unstable, and numerical analysis shows that ill-conditioned matrices often cannot be inverted accurately on a finite precision computer. A common ad hoc remedy is to stabilize the matrix with a so-called nugget; however, this changes the model and introduces error into the original solution. In this paper, we develop a multilevel computational method that scales well with the number of observations and dimensions. A multilevel basis is constructed that is adapted to a kd-tree partitioning of the observations. Numerically unstable covariance matrices with large condition numbers can be transformed into well-conditioned multilevel ones without compromising accuracy. Moreover, it is shown that the multilevel prediction exactly solves the best linear unbiased predictor (BLUP) and generalized least squares (GLS) model while remaining numerically stable. The multilevel method is tested on numerically unstable problems of up to 25 dimensions. Numerical results show speedups of up to 42,050 times for solving the BLUP problem with the same accuracy as the traditional iterative approach; for very ill-conditioned cases, the speedup is effectively infinite. In addition, decay estimates for the entries of the multilevel covariance matrices are derived based on high dimensional interpolation techniques from the field of numerical analysis. This work lies at the intersection of statistics, uncertainty quantification, high performance computing, and computational applied mathematics.
References
Cressie, N.A., Johannesson, G.: Spatial prediction for massive datasets. Faculty of Engineering and Information Sciences, Papers: Part A, 5976 (2006)
Cressie, N.: Statistics for spatial data. John Wiley & Sons, New York (1993)
Henderson, C.R.: Estimation of genetic parameters. Ann. Math. Stat. 21(2), 309–310 (1950)
Henderson, C.R.: Best linear unbiased estimation and prediction under a selection model. Biometrics 31(2), 423–447 (1975)
Liu, X.: Methods and applications of longitudinal data analysis. Academic Press, Oxford (2016)
Liu, X.-Q., Rong, J.-Y., Liu, X.-Y.: Best linear unbiased prediction for linear combinations in general mixed linear models. J. Multivar. Anal. 99(8), 1503–1517 (2008)
Stein, M.L.: Interpolation of spatial data : some theory for kriging. Springer series in statistics. Springer, New York (1999)
Golub, G.H., Van Loan, C.F.: Matrix computations, 3rd edn. The Johns Hopkins University Press (1996)
Minden, V., Damle, A., Ho, K.L., Ying, L.: Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations. arXiv:1603.08057v3 (2016)
Nowak, W., Litvinenko, A.: Kriging and spatial design accelerated by orders of magnitude: combining low-rank covariance approximations with FFT-techniques. Math. Geosci. 45(4), 411–435 (2013)
Khoromskij, B.N., Litvinenko, A., Matthies, H.G.: Application of hierarchical matrices for computing the Karhunen–Loève expansion. Computing 84(1–2), 49–67 (2009)
Litvinenko, A., Sun, Y., Genton, M.G., Keyes, D.E.: Likelihood approximation with hierarchical matrices for large spatial datasets. Comput. Stat. Data Anal. 137, 115–132 (2019)
Geoga, C.J., Anitescu, M., Stein, M.L.: Scalable Gaussian process computations using hierarchical matrices. J. Comput. Graph. Stat. 29(2), 227–237 (2020)
Schäfer, F., Sullivan, T.J., Owhadi, H.: Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity. arXiv:1706.02205 (2020)
Schäfer, F., Katzfuss, M., Owhadi, H.: Sparse Cholesky factorization by Kullback–Leibler minimization. SIAM J. Sci. Comput. 43(3), 2019–2046 (2021)
Castrillón-Candás, J.E., Genton, M.G., Yokota, R.: Multi-level restricted maximum likelihood covariance estimation and kriging for large non-gridded spatial datasets. Spatial Stat. 18 Part A, 105–124 (2016). Spatial Statistics Avignon: Emerging Patterns
Castrillón-Candás, J.E., Li, J., Eijkhout, V.: A discrete adapted hierarchical basis solver for radial basis function interpolation. BIT Numeric. Math. 53(1), 57–86 (2013)
Harbrecht, H., Multerer, M.: Samplets: construction and scattered data compression. J. Comput. Phys. 471, 111616 (2022)
Beylkin, G., Coifman, R., Rokhlin, V.: Fast wavelet transforms and numerical algorithms I. Commun. Pure Appl. Math. 44(2), 141–183 (1991)
Nobile, F., Tempone, R., Webster, C.: A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numeric. Anal. 46(5), 2309–2345 (2008)
Nobile, F., Tempone, R., Webster, C.: An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numeric. Anal. 46(5), 2411–2442 (2008)
Castrillon-Candas, J.E., Nobile, F., Tempone, R.: Analytic regularity and collocation approximation for pdes with random domain deformations. Comput. Math. App. 71(6), 1173–1197 (2016)
Castrillón-Candás, J.E., Xu, J.: A stochastic collocation approach for parabolic PDEs with random domain deformations. Comput. Math. App. 93, 32–49 (2021)
Castrillón-Candás, J.E., Nobile, F., Tempone, R.F.: A hybrid collocation-perturbation approach for PDEs with random domains. Adv. Comput. Math. 47(3), 40 (2021)
Li, W., Wang, X., Sun, Y., Milanovic, S., Kon, M., Castrillón-Candás, J.E.: Multilevel stochastic optimization for imputation in massive medical data records. IEEE Trans. Big Data (2023). To Appear
Abramowitz, M., Stegun, I.A.: Handbook of mathematical functions: with formulas, graphs, and mathematical tables. Applied mathematics series. Dover Publications, Mineola, New York, USA (1964)
Dasgupta, S., Freund, Y.: Random projection trees and low dimensional manifolds. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing. STOC ’08, pp. 537–546. ACM, New York, NY, USA (2008)
Berg, M.d., Cheong, O., Kreveld, M.v., Overmars, M.: Computational geometry: algorithms and applications, 3rd edn. Springer, Santa Clara, CA, USA (2008)
Ying, L., Biros, G., Zorin, D.: A kernel-independent adaptive fast multipole method in two and three dimensions. J. Comput. Phys. 196(2), 591–626 (2004)
Nielsen, H.B., Lophaven, S.N., Søndergaard, J.: DACE - a Matlab kriging toolbox. Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby (2002)
Chen, Y., Davis, T.A., Hager, W.W., Rajamanickam, S.: Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans. Math. Softw. 35(3), 22–12214 (2008)
Davis, T.A., Hager, W.W.: Dynamic supernodes in sparse Cholesky update/downdate and triangular solves. ACM Trans. Math. Softw. 35(4), 27–12723 (2009)
Davis, T., Hager, W.: Row modifications of a sparse Cholesky factorization. SIAM J. Matrix Anal. App. 26(3), 621–639 (2005)
Davis, T., Hager, W.: Multiple-rank modifications of a sparse Cholesky factorization. SIAM J. Matrix Anal. App. 22(4), 997–1013 (2001)
Davis, T., Hager, W.: Modifying a sparse Cholesky factorization. SIAM J. Matrix Anal. App. 20(3), 606–627 (1999)
Griebel, M., Oettershagen, J.: On tensor product approximation of analytic functions. J. Approx. Theory 207, 348–379 (2016)
MATLAB: R2019a. The MathWorks Inc., Natick, Massachusetts (2019)
Bäck, J., Nobile, F., Tamellini, L., Tempone, R.: Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: a numerical comparison. In: Hesthaven, J.S., Rønquist, E.M. (eds.) Spectral and high order methods for partial differential equations. Lecture Notes in Computational Science and Engineering, vol. 76, pp. 43–62. Springer, Heidelberg, Germany (2011)
Khoromskij, B.N.: Tensor numerical methods in scientific computing. De Gruyter, Berlin/Boston (2018)
Babuska, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Rev. 52(2), 317–355 (2010)
Intel Math Kernel Library. Intel (2018). http://software.intel.com/en-us/articles/intel-mkl/
Baricz, A., Ponnusamy, S., Vuorinen, M.: Functional inequalities for modified Bessel functions. Expo. Math. 29(4), 399–414 (2011)
Acknowledgements
I appreciate the help and advice from George Biros and Lexing Ying in setting up the KIFMM packages. In addition, I am grateful for the intense and very illuminating discussions with Michael Stein and Mihai Anitescu. I also appreciate the support that King Abdullah University of Science and Technology has provided to this project.
Funding
This material is based upon work supported by the National Science Foundation under Grant No. 1736392 and No. 2319011.
Ethics declarations
Conflict of interest
The author declares no competing interests.
Additional information
Communicated by: Anthony Nouy
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Polynomial interpolation
In this section, we provide some background on polynomial interpolation in high dimensions. This will be critical to estimate the decay rates of the entries of the multilevel covariance matrix for high dimensional problems.
The decay of the coefficients depends directly on the analytic properties of the covariance function. Traditional error estimates for polynomial interpolation are based on multivariate \(m^{th}\)-order derivatives. However, in many cases, such as the Matérn covariance function, the derivatives are too complex or expensive to manipulate for even a moderate number of dimensions. This motivates the study of polynomial approximations based on complex analytic extensions, which are much better suited for high dimensions. Much of the discussion that follows has its roots in the field of uncertainty quantification and high dimensional interpolation [20, 22, 36] for partial differential equations.
Consider the problem of approximating a function \(v: \Gamma ^{d} \rightarrow \mathbb {R}\) on the domain \(\Gamma ^{d}\). Without loss of generality, let \(\Gamma \equiv [-1, 1]\) and \(\Gamma ^{d} \equiv \prod _{n = 1}^{d} \Gamma \). Suppose that \(\mathcal {G}\subset \Gamma ^{d}\), then define the following spaces
Let \(\mathcal {P}_{ q}(\Gamma ):= \text {span}\{y^k,\,k=0,\dots ,q\}\) be the space of polynomials of degree at most q. Let \(\mathcal {I}_{m}: C^{0}(\Gamma ) \rightarrow \mathcal {P}_{m-1}(\Gamma )\) be the univariate Lagrange interpolant \(\mathcal {I}_{m}(v)(y):= \sum _{k=1}^{m}v(y^{(k)})l_{m,k}(y)\), where \(y^{(1)}, \dots , y^{(m)}\) is a set of distinct knots on \(\Gamma \) and \(\{ l_{m,k} \}_{k=1}^{m}\) is a Lagrange basis of the space \(\mathcal {P}_{m-1}(\Gamma )\). The variable \(m \in \mathbb {N}_0\) corresponds to the order of approximation of the Lagrange interpolant, with the convention that \(m = 0\) corresponds to \(\mathcal {I}_{0} = 0\).
Remark 1
For high dimensional interpolation, the particular set of points \(y^{(1)}, \dots , y^{(m)}\) that we will use is the Clenshaw-Curtis abscissas, discussed further later in this section. For now, we only assume that the points are distinct.
For \(m \ge 1\), let \(\Delta _{m}:= \mathcal {I}_{m}-\mathcal {I}_{m-1}\). From the difference operator \(\Delta _{m}\), we readily observe the telescoping identity \(\mathcal {I}_{m} = \sum _{k=1}^{m} \Delta _{k}\), which is reminiscent of multiresolution wavelet decompositions. The idea is to represent the multivariate approximation as a summation of difference operators.
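As an illustrative sketch (not part of the paper's implementation), the telescoping identity \(\mathcal {I}_{m} = \sum _{k=1}^{m} \Delta _{k}\) can be verified numerically; equispaced knots are assumed here purely for simplicity, whereas the paper uses Clenshaw-Curtis abscissas:

```python
import numpy as np

def lagrange_interp(knots, vals, y):
    """Evaluate the Lagrange interpolant through (knots, vals) at the points y."""
    out = np.zeros_like(y, dtype=float)
    for k, yk in enumerate(knots):
        lk = np.ones_like(y, dtype=float)   # Lagrange basis polynomial l_{m,k}(y)
        for j, yj in enumerate(knots):
            if j != k:
                lk *= (y - yj) / (yk - yj)
        out += vals[k] * lk
    return out

def I(v, m, y):
    """Order-m Lagrange interpolant I_m of v on m distinct knots, with I_0 := 0."""
    if m == 0:
        return np.zeros_like(y, dtype=float)
    knots = np.linspace(-1.0, 1.0, m) if m > 1 else np.array([0.0])
    return lagrange_interp(knots, v(knots), y)

def Delta(v, m, y):
    """Difference operator Delta_m = I_m - I_{m-1}."""
    return I(v, m, y) - I(v, m - 1, y)

v = lambda y: np.exp(y)
y = np.linspace(-1.0, 1.0, 7)
M = 5
telescoped = sum(Delta(v, k, y) for k in range(1, M + 1))
assert np.allclose(telescoped, I(v, M, y))   # I_M = sum_{k=1}^{M} Delta_k
```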
Consider the multi-index tuple \(\textbf{m}= (m_1,\dots ,m_d)\), where \(\textbf{m}\in \mathbb {N}_0^{d}\), and form the tensor product operator \(\mathcal {S}_{w,d}\), acting on functions on \(\Gamma ^{d}\), as
Note that by \({\Delta ^{n}_{m_n}}(v(\textbf{y}))\), we mean that the difference operator \({\Delta _{m_n}}\) is applied along the \(n^{th}\) dimension in \(\Gamma \).
Let \(C^{0}(\Gamma ^d; \mathbb {R}):= \{ v: \Gamma ^d \rightarrow \mathbb {R}\,\,\) is continuous on \(\Gamma ^d\) and \(\max _{\textbf{y}\in \Gamma ^d} |v(\textbf{y}) |< \infty \}\). Proposition 1 in [38] shows that for any \(v \in C^0(\Gamma ^d;\mathbb {R})\), we have \(\mathcal {S}_{w,d}[v]\in \mathcal {Q}^{d}_{w}\); moreover, \(\mathcal {S}_{w,d}[v] = v\) for all \(v \in \mathcal {Q}^{d}_{w}\). The key observation is that the operator \(\mathcal {S}_{w,d}\) is exact on the space of polynomials \(\mathcal {Q}^{d}_{w}\). This will be useful in connecting the Lagrange interpolant with Chebyshev polynomials.
Let \(T_k:\Gamma \rightarrow \mathbb {R}\), \(k = 0, 1, \dots \), denote the Chebyshev polynomials on \(\Gamma \), defined recursively by \(T_0(y) = 1\), \(T_1(y) = y\), and \(T_{k+1}(y) = 2yT_{k}(y) - T_{k-1}(y)\) for \(y \in \Gamma \). Chebyshev polynomials are well suited for approximating functions with analytic extensions to a complex region bounded by a Bernstein ellipse. They bypass the need for derivative information, and sharp bounds on the error are readily available. Suppose that \(\sigma > 0\) and denote by
as the region bounded by a Bernstein ellipse (see Fig. 5). The following theorem is based on complex analytic extensions on \(\mathcal {E}_{\sigma }\) and provides a control for the Chebyshev polynomial approximation.
Theorem 6
Suppose that \(u:\Gamma \rightarrow \mathbb {R}\) has an analytic extension on \(\mathcal {E}_{\sigma }\). If \(|u |\le M < \infty \) on \(\mathcal {E}_{\sigma }\), then there exists a sequence of coefficients with \(|\alpha _k|\le M e^{-k\sigma }\) such that \(u \equiv \alpha _0 + 2\sum _{k = 1}^{\infty } \alpha _{k} T_{k}\) on \(\mathcal {E}_{\sigma }\). Moreover, if \(y \in \Gamma \), then \(\vert u(y) - \alpha _0 - 2\sum _{k = 1}^{n} \alpha _{k} T_{k}(y) |\le \frac{2\,M}{e^{\sigma } - 1} e^{-n \sigma }\).
Proof
See Theorem 2.25 in [39]. \(\square \)
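The exponential decay of the Chebyshev coefficients guaranteed by the theorem can be observed numerically. The following sketch (not from the paper) uses the assumed test function \(u(y) = 1/(2-y)\), which is analytic on a Bernstein ellipse since its pole lies at \(y = 2\), and approximates the coefficients \(\alpha _k\) by quadrature at Chebyshev-Gauss angles:

```python
import numpy as np

def chebyshev_coeffs(u, n):
    """Approximate alpha_k = (1/pi) * int_0^pi u(cos t) cos(k t) dt by the
    midpoint rule at n Chebyshev-Gauss angles, so u = alpha_0 + 2 sum alpha_k T_k."""
    t = (np.arange(n) + 0.5) * np.pi / n
    vals = u(np.cos(t))
    return np.array([np.mean(vals * np.cos(k * t)) for k in range(n)])

def cheb_partial_sum(alpha, y):
    """Evaluate alpha_0 + 2 * sum_{k>=1} alpha_k T_k(y) with the three-term recursion."""
    T_prev, T_curr = np.ones_like(y), y.copy()
    s = alpha[0] + 2.0 * alpha[1] * T_curr
    for k in range(2, len(alpha)):
        T_prev, T_curr = T_curr, 2.0 * y * T_curr - T_prev
        s += 2.0 * alpha[k] * T_curr
    return s

u = lambda y: 1.0 / (2.0 - y)    # assumed example: analytic, pole at y = 2
alpha = chebyshev_coeffs(u, 30)
y = np.linspace(-1.0, 1.0, 11)
assert np.max(np.abs(cheb_partial_sum(alpha, y) - u(y))) < 1e-8  # spectral accuracy
assert abs(alpha[20]) < abs(alpha[10]) < abs(alpha[2])           # decay of |alpha_k|
```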
We can now connect the error of Lagrange interpolation with Chebyshev expansions. It is known that if \(u \in C^{0}(\Gamma ;\mathbb {R})\), then \(\Vert (I - \mathcal {I}_{m})u\Vert _{L^{\infty }(\Gamma )} \le (1 + \Lambda _{m}) \min _{h \in \mathcal {P}_{m-1}} \Vert u - h \Vert _{L^{\infty }(\Gamma )}\), where \(\Lambda _{m}\) is the Lebesgue constant (see Lemma 7 in [40]). Note that \(I:C^{0}(\Xi ;\mathbb {R}) \rightarrow C^{0}(\Xi ;\mathbb {R})\) refers to the identity operator, where the domain \(\Xi \) is taken from context; for the previous case, \(\Xi = \Gamma \). Bounds on \(\Lambda _{m}\) are known in terms of the location of the knots \(y^{(1)}, \dots , y^{(m)} \in \Gamma \). In this article, we restrict our attention to the Clenshaw-Curtis abscissas \(y^{(j)} = -\cos \left( \frac{\pi (j-1)}{m - 1} \right) ,\,\, j = 1,\dots , m\), for which \(\Lambda _m\) is bounded by \(2\pi ^{-1}(\log {(m-1)} + 1) \le 2m - 1\) (see [40]). Since the interpolation operator \(\mathcal {I}_{m}\) is exact on \(\mathcal {P}_{m - 1}\), if \(u:\Gamma \rightarrow \mathbb {R}\) has an analytic extension in \(\mathcal {E}_{\sigma }\), we have from Theorem 6 (following a similar approach as in [40]) that \(\Vert (I - \mathcal {I}_{m})u\Vert _{L^{\infty }(\Gamma )} \le (1 + \Lambda _{m}) \frac{2\,M}{e^{\sigma } - 1} e^{-\sigma (m-1)} \le 2 C(M,\sigma ) m e^{-\sigma (m-1)}\), where \(C(M,\sigma ):= \frac{2M}{(e^{ \sigma } - 1)}\). We then conclude that for all \(k = 1,\dots , m\)
Let \(\mathcal {E}_{\sigma ,n} \subset \mathbb {C}\) be a complex region bounded by a Bernstein ellipse whose restriction to the \(n^{th}\) dimension contains \(\Gamma \), and form the polyellipse \(\mathcal {E}^{d}_{\sigma }:= \prod _{n=1}^{d} \mathcal {E}_{\sigma ,n}\). Suppose that \(v:\mathcal {E}^{d}_{\sigma } \rightarrow \mathbb {C}\) is analytic on \(\mathcal {E}^{d}_{\sigma }\) and let \(\tilde{M}(v):= \max _{\textbf{z}\in \mathcal {E}^{d}_{\sigma }} \vert v(\textbf{z}) |\).
Note that \(\mathcal {I}^{n}_{m}\) denotes the Lagrange operator of order m along the \(n^{th}\) dimension, and similarly \(\mathcal {P}^{n}_{m-1}\) is the space of univariate polynomials of degree at most \(m-1\) along the \(n^{th}\) dimension. Form the tensor product \(\textbf{I}^{d}_{m}:= \mathcal {I}^{1}_{m} \times \dots \times \mathcal {I}^{d}_{m}\); thus \(\textbf{I}^{d}_{m}:C^{0}(\Gamma ^{d};\mathbb {R}) \rightarrow \mathbb {P}\), where \(\mathbb {P}:= \mathcal {P}^{1}_{m-1} \times \dots \times \mathcal {P}^{d}_{m-1}\). From Theorem 2.27 in [39], we conclude that for finite dimension d, \(\textbf{I}^{d}_{m}[v] \rightarrow v\) as \(m \rightarrow \infty \).
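The Clenshaw-Curtis abscissas and the slow growth of the Lebesgue constant \(\Lambda _m\) used above can be checked numerically; the following is an illustrative sketch (the grid size and the values of m are assumptions), verifying only the crude bound \(\Lambda _m \le 2m - 1\) from the text:

```python
import numpy as np

def clenshaw_curtis(m):
    """Clenshaw-Curtis abscissas y^(j) = -cos(pi (j-1) / (m-1)), j = 1, ..., m."""
    j = np.arange(1, m + 1)
    return -np.cos(np.pi * (j - 1) / (m - 1))

def lebesgue_constant(knots, n_grid=2001):
    """Numerically estimate Lambda_m = max_y sum_k |l_{m,k}(y)| on a fine grid."""
    y = np.linspace(-1.0, 1.0, n_grid)
    L = np.zeros_like(y)
    for k, yk in enumerate(knots):
        lk = np.ones_like(y)
        for j, yj in enumerate(knots):
            if j != k:
                lk *= (y - yj) / (yk - yj)
        L += np.abs(lk)
    return L.max()

for m in (5, 9, 17):
    knots = clenshaw_curtis(m)
    assert np.isclose(knots[0], -1.0) and np.isclose(knots[-1], 1.0)
    assert np.all(np.diff(knots) > 0)              # distinct, increasing knots
    assert lebesgue_constant(knots) <= 2 * m - 1   # crude bound from the text
```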
Applying equation (A2) to equation (A1), we have that
By applying Theorem 2.10 and Corollary 2.11 in [36], if \( w \ge d\) and \(p( d, w) \ge \left( \frac{2 d}{\kappa ( d)}\right) ^{ d}\), where \(\kappa ( d):= \root d \of { d!} > d/e\) (Stirling's approximation), then for any \(\hat{\sigma }\in \mathbb {R}_{+}\)
where \(\textbf{k}\in {\mathbb N}^{d}_{0}\) and \(\textbf{k}:=(k_1,\dots ,k_d)\).
Following the same approach as in [36], observe that for \(0< \delta < 1\), there exists a constant \(c_{n,\delta } \le (e\sigma \delta )^{-1}\) such that \(m_n \exp (-\sigma m_n) \le (e\sigma \delta )^{-1}\) \(\exp (-\sigma m_n (1 - \delta ))\). Setting \(\hat{\sigma }:= \sigma (1 - \delta )\) and combining equations (A3) and (A4), we have proven the following result.
Lemma 10
Suppose that \(0< \delta < 1\), \(\hat{\sigma }:= \sigma (1 - \delta )\), and \(p(d,w) \ge \left( \frac{2 d}{\kappa (d)}\right) ^{d}\) then \(\Vert (I - \mathcal {S}_{w,d}) v(\textbf{y}) \Vert _{L^{\infty }(\Gamma ^{d})} \le \frac{C(\tilde{M},\sigma )^d e^{d - \sigma (1 - \delta ) + 1} \hat{\sigma }d }{ (\sigma \delta )^{d}} \left( \frac{e^{\hat{\sigma }}}{1 - e^{-\hat{\sigma }}} \right) ^{d} \exp \left( -\frac{d}{e} \hat{\sigma }p^{\frac{1}{d}} \right) p^{\frac{d-1}{d}}.\)
Remark 2
The restriction \(p(d,w) \ge \left( \frac{2 d}{\kappa (d)}\right) ^{d}\) is not strict and can be relaxed such that sub-exponential convergence is still obtained. We refer the reader to the bound of the Gamma function in Lemma 2.5 ([36]) and its application in the proofs of Theorem 2.10 and Corollary 2.11.
Appendix B: Experimental setup
-
1.
MATLAB, C/C++, and MKL: The binary tree, multilevel basis construction, formation of the sparse matrix \(\tilde{\textbf{C}}_{\textbf{W}}\), and the estimation and prediction components are written and executed in MATLAB [37]. However, the computational bottlenecks are executed by C/C++ software packages, Intel MKL [41], and the highly optimized BLAS and LAPACK libraries contained in MATLAB. The C/C++ interfaces to MATLAB are constructed as dynamic shared libraries.
-
2.
Direct and fast summation: The MATLAB code estimates the computational cost of the direct and fast summation methods and chooses the more efficient approach. For the direct method, a combination of Graphics Processing Unit (GPU) and Intel MKL libraries is used. For the fast summation method, the KIFMM (\(d = 3\)) C++ code is used. The KIFMM is modified to include a Hermite interpolant approximation of the Matérn covariance function, which is implemented with the Intel MKL package [41] (see [16] for details).
-
3.
Dynamic shared libraries: These are produced with the GNU gcc/g++ compilers. These libraries implement the Hermite interpolant with the Intel MKL package (about 10 times faster than the MATLAB Matérn interpolant) and link the MATLAB code to the KIFMM.
-
4.
Cholesky and determinant computation: The SuiteSparse 4.2.1 package ([31,32,33,34,35]) is used for the determinant computation of the sparse matrix \(\tilde{\textbf{C}}_{\textbf{W}}(\varvec{\theta })\).
The code is tested on a single CPU (4-core Intel i7-3770 CPU @ 3.40 GHz), one Nvidia GTX 970 GPU, with Linux Ubuntu 18.04 and 32 GB memory. In addition, the Boston University Shared Computing Cluster was used to generate test data. To test the effectiveness of the multilevel solver, the following data sets are generated:
-
1.
Random n-sphere data set: The nested sets of random observation locations \(\textbf{S}_{0}^{d} \subset \dots \subset \textbf{S}_{9}^{d}\) contain from 1000, 2000, 4000, up to 256,000 knots, generated on the sphere \(\textbf{S}_{d-1}:= \{\textbf{x}\in \mathbb {R}^{d}:\,\,\Vert \textbf{x}\Vert _{2} = 1 \}\).
-
2.
Random hypercube data set: The sets of random observation locations \(\textbf{C}_{0}^{d},\) \(\dots , \textbf{C}_{10}^{d}\) contain from 1000, 2000, 4000, up to 512,000 knots, generated on the hypercube \([-1,1]^{d}\) in d dimensions. The observation locations are also nested, i.e., \(\textbf{C}_{0}^{d} \subset \dots \subset \textbf{C}_{10}^{d}\).
-
3.
Normal test data set: The sets of observation values \(\textbf{Z}^d_{0}\), \(\textbf{Z}^d_{1}\), ..., \(\textbf{Z}^d_{5}\) are formed from the Gaussian random field model () for 1000, 2000, \(\dots \), 256,000 observation locations. The data set \(\textbf{Z}^d_{n}\) is generated from the set of nodes \(\textbf{S}^{d}_{n}\), with the covariance parameters \((\nu ,\rho )\) and the corresponding set of monomials \(\mathcal {Q}^d_w\).
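The paper does not specify the sampling mechanism for these sets; a minimal sketch (an assumption, not the paper's code) of how such nested random point sets could be generated is shown below, using normalized Gaussian draws for the sphere, uniform draws for the hypercube, and nesting by sharing a common prefix of points:

```python
import numpy as np

def nested_sphere_sets(d, sizes, seed=0):
    """Nested random point sets on the sphere {x : ||x||_2 = 1}: draw Gaussian
    vectors, normalize them, and share the first sizes[0] points across all sets."""
    rng = np.random.default_rng(seed)
    pts = rng.standard_normal((max(sizes), d))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    return [pts[:n] for n in sizes]

def nested_cube_sets(d, sizes, seed=0):
    """Nested uniform random point sets on the hypercube [-1, 1]^d."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(max(sizes), d))
    return [pts[:n] for n in sizes]

spheres = nested_sphere_sets(d=3, sizes=[1000, 2000, 4000])
assert all(np.allclose(np.linalg.norm(S, axis=1), 1.0) for S in spheres)
assert np.array_equal(spheres[0], spheres[1][:1000])   # S_0 is a subset of S_1
cubes = nested_cube_sets(d=25, sizes=[1000, 2000])
assert cubes[1].shape == (2000, 25) and np.all(np.abs(cubes[1]) <= 1.0)
```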
Remark 3
All timings for the numerical tests are given in wall clock time, i.e., the actual elapsed time needed to solve a problem. This is to distinguish it from CPU time, which can be significantly smaller and may not accurately reflect the real-world time taken to solve a problem.
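In Python, for instance, the distinction can be seen by comparing a wall clock timer with a process CPU timer (an illustrative sketch; the paper's timings are measured in MATLAB):

```python
import time

t_wall = time.perf_counter()    # wall clock: real elapsed time
t_cpu = time.process_time()     # CPU time: time this process spent on the CPU
time.sleep(0.2)                 # sleeping consumes wall time but almost no CPU time
wall = time.perf_counter() - t_wall
cpu = time.process_time() - t_cpu
assert wall >= 0.19             # roughly the 0.2 s of sleep
assert cpu < wall               # CPU time can be much smaller than wall clock time
```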
Appendix C: Proofs
Proof of proposition 1
The proof is immediate.
Proof of lemma 2
Starting at the finest level t, for each cell \(B^{t}_k \in \mathcal {B}^{t}\), there are at most p multilevel vectors. Since there are at most \(2^t\) cells, there are at most \(2^{t} p\) multilevel vectors.
Now, for each pair of left and right (sibling) cells at level t, the parent cell at level \(t-1\) has at most 2p scaling functions. Thus, at most p multilevel vectors and p scaling vectors are obtained to be used for the next level. The remaining cells at level t are leaves and have at most p multilevel vectors and p scaling vectors to be used for the next level. Since there are at most \(2^{t-1}\) cells at level \(t-1\), there are at most \(2^{t-1} p\) multilevel vectors. The proof follows by an inductive argument down to level \(q = 0\).
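The level-by-level count in the proof can be tallied directly; the following is illustrative arithmetic with assumed example values of p and t:

```python
# Tally the proof's count: at level q there are at most 2^q cells,
# each contributing at most p multilevel vectors (p and t are assumed values).
p, t = 3, 6
per_level = [2**q * p for q in range(t + 1)]
total = sum(per_level)
assert per_level[t] == 2**t * p           # finest level: at most 2^t p vectors
assert total == p * (2**(t + 1) - 1)      # geometric series: O(2^t p) vectors overall
```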
Proof of lemma 3
For any leaf cell at the bottom of the tree (level t), there are at most 2p observations; its parent cell has at most 4p observations, and thus the associated multilevel vector has at most 4p non-zero entries. By induction, at any level q, the number of non-zero entries is at most \(2^{t-q+1} p\). Now, for any leaf cell at any other level \(l < t\), the number of non-zero entries is at most 2p. Following an inductive argument, the result is obtained.
Proof of proposition 4
Let us look at the cost of computing all the interactions between any two cells \(B^{i}_k \in \mathcal {B}^{i}\) and \(B^{j}_l \in \mathcal {B}^{j}\). Without loss of generality, assume that \(i \le j\). For the cell \(B^{i}_k\), there are at most p multilevel vectors and, from Lemma 3, at most \(2^{t-i+1} p\) non-zero entries; similarly for \(B^{j}_l\). All the interactions \((\varvec{\psi }^{i}_{\tilde{k}})^\textrm{T}\textbf{C}(\varvec{\theta }) \varvec{\psi }^{j}_{\tilde{l}}\) now have to be computed, where \(\varvec{\psi }^{i}_{\tilde{k}} \in B^{i}_k\) and \(\varvec{\psi }^{j}_{\tilde{l}} \in B^{j}_l\).
The term \(\textbf{C}(\varvec{\theta }) \varvec{\psi }^{j}_{\tilde{l}}\) is computed using an FMM with \(2^{t-j+1} p\) sources and \(2^{t-i+1} p\) targets at a cost of \(\mathcal {O}((2^{t-j+1} p + 2^{t-i+1} p)^{\alpha })\). Since there are at most p multilevel vectors in \(B^{i}_k\) and \(B^{j}_l\), the cost for computing all the interactions \((\varvec{\psi }^{i}_{\tilde{k}})^\textrm{T}\textbf{C}(\varvec{\theta }) \varvec{\psi }^{j}_{\tilde{l}}\) is \(\mathcal {O}(p(2^{t-j+1} p + 2^{t-i+1} p)^{\alpha } + 2^{t-i+1}p)\).
Now, at any level i, there are at most \(2^{i}\) cells; thus, the result follows.
Proof of proposition 5
A simple extension of the proof of Proposition 1.
Proof of theorem 6
Immediate.
Proof of theorem 7
We first have that
The last equality follows from \(\varvec{\psi }^{i}_m, \varvec{\psi }^{j}_{q} \in \mathcal {P}^{ p}(\mathbb {S})^{\perp }\). We now have that
Since \(\varvec{\psi }^i_m\) and \(\varvec{\psi }^j_q\) are orthonormal, \(\sum _{k = 1}^{N} \sum _{l = 1}^{N} |\varvec{\psi }^i_m[k] ||\varvec{\psi }^j_q[l] |\le \sqrt{n_m n_q}\, \Vert \varvec{\psi }^i_m\Vert _{l^2} \Vert \varvec{\psi }^j_q\Vert _{l^2} = \sqrt{n_m n_q}\). From Lemma 10, the result follows.
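The norm inequality \(\Vert x\Vert _{1} \le \sqrt{n}\,\Vert x\Vert _{2}\) underlying the bound above can be checked numerically on random unit vectors (an illustrative sketch; the vector lengths are assumed example values):

```python
import numpy as np

rng = np.random.default_rng(1)
n_m, n_q = 50, 80
psi = rng.standard_normal(n_m)
psi /= np.linalg.norm(psi)                  # unit l2-norm vector with n_m entries
phi = rng.standard_normal(n_q)
phi /= np.linalg.norm(phi)                  # unit l2-norm vector with n_q entries
double_sum = np.abs(psi).sum() * np.abs(phi).sum()   # sum_{k,l} |psi[k]| |phi[l]|
assert double_sum <= np.sqrt(n_m * n_q) + 1e-12      # ||x||_1 <= sqrt(n) ||x||_2
```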
Proof of theorem 8
The polynomial factor is an entire function. However, the functions \(K_{\nu }(\vartheta )\) and \(\vartheta ^{\frac{1}{2}}\) are analytic for all \(\vartheta \in \mathbb {C}\) except on the branch cut \((-\infty ,0]\). Thus, it is sufficient to check the analytic extension of \(r(\varvec{\theta }) = \Big ( \sum _{k=1}^{d} \theta _{k} (x_k - y_k)^{2} \Big )^{\frac{1}{2}}\). Let \(z \in \mathbb {C}\) be the complex extension of \(r \in \mathbb {R}\). More precisely, \(z = \Big ( \sum _{k=1}^{d} \theta _{k} z_k^{2} \Big )^{\frac{1}{2}}\), where \(z_k \in \mathbb {C}\) is the complex extension of \((x_k - y_k)\).
Let \(\vartheta = \sum _{k=1}^{d} \theta _{k} z_k^{2}\); then, by taking the appropriate branch, \({\text {Re}}\, z = r_{\vartheta }\) \(\cos {( \theta _{\vartheta }/2)}\), where \(r^2_{\vartheta } = ({\text {Re}}\, \vartheta )^2 + (\textrm{Im}\, \vartheta )^2\) and \(\theta _{\vartheta } = \tan ^{-1} \frac{\textrm{Im}\, \vartheta }{{\text {Re}}\, \vartheta }\). Due to the branch cut at \((-\infty ,0]\), we impose the restriction that \({\text {Re}}\, \vartheta > 0\) as \(x_k\) and \(y_k\) are extended in the complex plane. Consider any two cells \(B^{i}_{m} \in \mathcal {B}^i\) and \(B^{j}_{q} \in \mathcal {B}^j\) at levels i and j with the associated distance criterion constant \(\tau _{i,j}>0\). From Algorithms 4, 5, 6, and 7, for any observations \(\textbf{x}^{*} \in B^{i}_{m}\) and \(\textbf{y}^{*} \in B^{j}_{q}\), we have that \((x^*_k - y^*_k)^2 \ge \tau ^2_{i,j}\) for \(k = 1,\dots ,d\). For the rest of the discussion, it is assumed that the complex extension is with respect to each component \(k = 1,\dots ,d\) unless otherwise specified.
Extend \(\alpha _k \rightarrow \alpha _k + v_k\) and \(\gamma _k \rightarrow \gamma _k + w_k\) where \(v_k:= v^R_k + iv^I_k\), \(w_k:= w^R_k + iw^I_k\), and \(v^R_k,v^I_k,w^R_k,w^I_k \in \mathbb {R}\). Let \(\tilde{x}_k\) be the extension of \(x_k\) in the complex plane and similarly for \(\tilde{y}_k\). It follows that \(\tilde{x}^R_k:= {\text {Re}}\, \tilde{x}_k = \frac{1}{2} (\alpha _k + 1 + v^R_k)a_k + b_k\), \(\tilde{x}^I_k = \textrm{Im}\, \tilde{x}_k = \frac{v^I_k}{2} a_k\), \(y^R_k:= {\text {Re}}\, \tilde{y}_k = \frac{1}{2} (\gamma _k + 1 + w^R_k)c_k + d_k\), and \(y^I_k:= \textrm{Im}\, \tilde{y}_k = \frac{w^I_k}{2} c_k\). After some manipulation
Recall that \((x_k - y_k)^2 \ge \tau ^2_{i,j}\) and suppose that there is a positive constant \(\delta _{k} > 0 \) such that
Assume that \(|v^R_k |\le \tau _{i,j} \delta _{k} / a_{k}\), \(|v^I_k |\le \tau _{i,j} \delta _{k}/a_{k}\), \(|w^R_k |\le \tau _{i,j} \delta _{k} / c_{k}\), and \(|w^I_k |\le \tau _{i,j} \delta _{k} / c_{k}\). From equations (C5) and (C6), it follows that
Furthermore,
Similarly,
We now show how \(\alpha _k\) and \(\gamma _k\) can be extended into the Bernstein ellipses \(\mathcal {E}_{\sigma ^{\alpha }_k}\) and \(\mathcal {E}_{\sigma ^{\gamma }_k}\), for some \(\sigma ^{\alpha }_k > 0\) and \(\sigma ^{\gamma }_k > 0\), such that \({\text {Re}}\, z^2_k > 0\). Recall that \(|v^R_k |\le \tau _{i,j} \delta _{k} / a_{k}\), \(|v^I_k |\le \tau _{i,j} \delta _{k}/a_{k}\), \(|w^R_k |\le \tau _{i,j} \delta _{k} / c_{k}\), and \(|w^I_k |\le \tau _{i,j} \delta _{k} / c_{k}\). These restrictions form a region in \(\mathbb {C}\times \mathbb {C}\) in which a Bernstein ellipse can be embedded (see Fig. 6). This is done by solving the following equation: \(\frac{e^{\sigma ^{\alpha }_k} + e^{-\sigma ^{\alpha }_k}}{2} = 1 + \frac{\tau _{i,j} \delta _{k}}{a_k}\). The unique solution is \(\sigma ^{\alpha }_k = \cosh ^{-1} \left( 1 + \frac{\tau _{i,j} \delta _{k}}{a_k} \right) \) with \(\sigma ^{\alpha }_k > 0\). Following a similar argument, we have that \(\sigma ^{\gamma }_k = \cosh ^{-1} \left( 1 + \frac{\tau _{i,j} \delta _{k}}{c_k} \right) \) with \(\sigma ^{\gamma }_k > 0\). Let \(\mathcal {E}^{d}_{\alpha }:= \prod _{k=1}^{d} \mathcal {E}_{\sigma ^{\alpha }_k}\) and \(\mathcal {E}^{d}_{\gamma }:= \prod _{k=1}^{d} \mathcal {E}_{\sigma ^{\gamma }_k}\). It follows that
Thus, there exists an analytic extension of \(\phi (r;\varvec{\theta }):\Gamma ^d \times \Gamma ^d \rightarrow \mathbb {R}\) on \(\mathcal {E}^{d}_{\alpha } \times \mathcal {E}^{d}_{\gamma }\).
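The ellipse parameter \(\sigma ^{\alpha }_k = \cosh ^{-1}(1 + \tau _{i,j}\delta _k/a_k)\) derived above is straightforward to compute; a small sketch with assumed example values of \(\tau _{i,j}\), \(\delta _k\), and \(a_k\):

```python
import numpy as np

def bernstein_sigma(tau, delta, a):
    """Unique positive solution of cosh(sigma) = 1 + tau * delta / a,
    i.e., sigma = arccosh(1 + tau * delta / a)."""
    return np.arccosh(1.0 + tau * delta / a)

# Assumed example values of tau_{i,j}, delta_k, and a_k (illustration only).
sigma = bernstein_sigma(tau=0.5, delta=0.1, a=1.0)
assert sigma > 0.0                                        # a valid Bernstein ellipse
assert np.isclose(np.cosh(sigma), 1.0 + 0.5 * 0.1 / 1.0)  # solves the defining equation
```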
The next step is to bound the absolute value of the Matérn covariance function \( |\phi (z;\varvec{\theta }) |\) in \(\mathcal {E}^{d}_{\alpha } \times \mathcal {E}^{d}_{\gamma }\). If \(\nu > \frac{1}{2}\) and \({\text {Re}}\, z>0\), then the modified Bessel function of the second kind satisfies the following identity: \(K_{\nu }(\sqrt{2 \nu }z) = \frac{\sqrt{\pi } (\sqrt{2 \nu }z)^{\nu }}{2^\nu \Gamma (\nu + \frac{1}{2})} \int _{1}^{\infty } (t^2 - 1)^{\nu - \frac{1}{2}} \exp {(-\sqrt{2 \nu }zt)}\, \text {d}t.\) It is not hard to show that for \(\nu > \frac{1}{2}\) and \({\text {Re}}\, z >0\), we have that \( |K_{\nu }(\sqrt{2 \nu }z) |\le \frac{ |\sqrt{2 \nu }z |^{\nu }}{({\text {Re}}\, \sqrt{2 \nu }z)^{\nu }}\) \(K_{\nu }(\sqrt{2 \nu } {\text {Re}}\, z)\). Note that \(r_{\vartheta } \ge {\text {Re}}\, \vartheta > 0\). From equation (C9), we have that \(\textrm{Im}\, \vartheta = \sum _{k=1}^{d} \theta _k\, \text {Im}\, z^2_k \le \sum _{k=1}^{d} 2 \tau \delta _{k} + 4 \tau ^2 \delta _{k}^2\). From equation (C10) \( |\theta _{\vartheta } |\le \xi (\varvec{\theta },\varvec{\delta },\tau _{i,j}) \).
Since \(K_{\nu }(\cdot )\) is strictly completely monotonic [42], then
and therefore,
By combining equations (C9), (C10), (C11), and (C12), we have proven the theorem.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Castrillón-Candás, J.E. Spatial best linear unbiased prediction: a computational mathematics approach for high dimensional massive datasets. Adv Comput Math 50, 37 (2024). https://doi.org/10.1007/s10444-024-10132-9
Keywords
- Hierarchical basis
- Best linear unbiased prediction
- High performance computing
- Uncertainty quantification