
Spatial best linear unbiased prediction: a computational mathematics approach for high dimensional massive datasets


Abstract

With the advent of massive data sets, much of the computational science and engineering community has moved toward data-intensive approaches to regression and classification. However, these approaches present significant challenges as the size, complexity, and dimensionality of the problems increase. In particular, covariance matrices are in many cases numerically unstable, and it is well known from numerical analysis that ill-conditioned matrices cannot be inverted accurately on a finite precision computer. A common ad hoc approach to stabilizing a matrix is the application of a so-called nugget; however, this can change the model and introduce error into the original solution. In this paper, we develop a multilevel computational method that scales well with the number of observations and dimensions. A multilevel basis is constructed that is adapted to a kd-tree partitioning of the observations. Numerically unstable covariance matrices with large condition numbers can be transformed into well-conditioned multilevel ones without compromising accuracy. Moreover, it is shown that the multilevel prediction exactly solves the best linear unbiased predictor (BLUP) and generalized least squares (GLS) model, but is numerically stable. The multilevel method is tested on numerically unstable problems of up to 25 dimensions. Numerical results show speedups of up to 42,050 times for solving the BLUP problem with the same accuracy as the traditional iterative approach; for very ill-conditioned cases, the speedup is infinite. In addition, decay estimates of the multilevel covariance matrices are derived based on high dimensional interpolation techniques from the field of numerical analysis. This work lies at the intersection of statistics, uncertainty quantification, high performance computing, and computational applied mathematics.
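The conditioning issue and the nugget remedy mentioned above are easy to reproduce numerically. The short sketch below is an illustration only, not the paper's multilevel method; the squared-exponential kernel, length scale, number of points, and nugget value are arbitrary choices made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 400))                  # 400 scattered observation locations
# squared-exponential covariance matrix with an (arbitrary) length scale of 0.3
C = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.3 ** 2))
print("cond(C)            = %.3e" % np.linalg.cond(C))   # numerically singular in double precision
tau2 = 1e-6                                              # ad hoc nugget
print("cond(C + tau2 * I) = %.3e" % np.linalg.cond(C + tau2 * np.eye(x.size)))
```

The second condition number is finite, but the matrix being factorized is no longer the covariance of the original model.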


References

  1. Cressie, N.A., Johannesson, G.: Spatial prediction for massive datasets. Faculty of Engineering and Information Sciences (2006). Papers: Part A. 5976

  2. Cressie, N.: Statistics for spatial data. John Wiley & Sons, New York (1993)

  3. Henderson, C.R.: Estimation of genetic parameters. Ann. Math. Stat. 21(2), 309–310 (1950)

  4. Henderson, C.R.: Best linear unbiased estimation and prediction under a selection model. Biometrics 31(2), 423–447 (1975)

  5. Liu, X.: Methods and applications of longitudinal data analysis. Academic Press, Oxford (2016)

  6. Liu, X.-Q., Rong, J.-Y., Liu, X.-Y.: Best linear unbiased prediction for linear combinations in general mixed linear models. J. Multivar. Anal. 99(8), 1503–1517 (2008)

  7. Stein, M.L.: Interpolation of spatial data : some theory for kriging. Springer series in statistics. Springer, New York (1999)

  8. Golub, G.H., Van Loan, C.F.: Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)

  9. Minden, V., Damle, A., Ho, K.L., Ying, L.: Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations. arXiv:1603.08057v3 (2016)

  10. Nowak, W., Litvinenko, A.: Kriging and spatial design accelerated by orders of magnitude: combining low-rank covariance approximations with FFT-techniques. Math. Geosci. 45(4), 411–435 (2013)

  11. Khoromskij, B.N., Litvinenko, A., Matthies, H.G.: Application of hierarchical matrices for computing the Karhunen–Loève expansion. Computing 84(1–2), 49–67 (2009)

  12. Litvinenko, A., Sun, Y., Genton, M.G., Keyes, D.E.: Likelihood approximation with hierarchical matrices for large spatial datasets. Comput. Stat. Data Anal. 137, 115–132 (2019)

  13. Geoga, C.J., Anitescu, M., Stein, M.L.: Scalable Gaussian process computations using hierarchical matrices. J. Comput. Graph. Stat. 29(2), 227–237 (2020)

  14. Schäfer, F., Sullivan, T.J., Owhadi, H.: Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity. arXiv:1706.02205 (2020)

  15. Schäfer, F., Katzfuss, M., Owhadi, H.: Sparse Cholesky factorization by Kullback–Leibler minimization. SIAM J. Sci. Comput. 43(3), 2019–2046 (2021)

  16. Castrillón-Candás, J.E., Genton, M.G., Yokota, R.: Multi-level restricted maximum likelihood covariance estimation and kriging for large non-gridded spatial datasets. Spatial Stat. 18 Part A, 105–124 (2016). Spatial Statistics Avignon: Emerging Patterns

  17. Castrillón-Candás, J.E., Li, J., Eijkhout, V.: A discrete adapted hierarchical basis solver for radial basis function interpolation. BIT Numer. Math. 53(1), 57–86 (2013)

  18. Harbrecht, H., Multerer, M.: Samplets: construction and scattered data compression. J. Comput. Phys. 471, 111616 (2022)

  19. Beylkin, G., Coifman, R., Rokhlin, V.: Fast wavelet transforms and numerical algorithms I. Commun. Pure Appl. Math. 44(2), 141–183 (1991)

  20. Nobile, F., Tempone, R., Webster, C.: A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46(5), 2309–2345 (2008)

  21. Nobile, F., Tempone, R., Webster, C.: An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46(5), 2411–2442 (2008)

  22. Castrillón-Candás, J.E., Nobile, F., Tempone, R.: Analytic regularity and collocation approximation for PDEs with random domain deformations. Comput. Math. Appl. 71(6), 1173–1197 (2016)

  23. Castrillón-Candás, J.E., Xu, J.: A stochastic collocation approach for parabolic PDEs with random domain deformations. Comput. Math. Appl. 93, 32–49 (2021)

  24. Castrillón-Candás, J.E., Nobile, F., Tempone, R.F.: A hybrid collocation-perturbation approach for PDEs with random domains. Adv. Comput. Math. 47(3), 40 (2021)

  25. Li, W., Wang, X., Sun, Y., Milanovic, S., Kon, M., Castrillón-Candás, J.E.: Multilevel stochastic optimization for imputation in massive medical data records. IEEE Trans. Big Data (2023). To Appear

  26. Abramowitz, M., Stegun, I.A.: Handbook of mathematical functions: with formulas, graphs, and mathematical tables. Applied mathematics series. Dover Publications, Mineola, New York, USA (1964)

  27. Dasgupta, S., Freund, Y.: Random projection trees and low dimensional manifolds. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing. STOC ’08, pp. 537–546. ACM, New York, NY, USA (2008)

  28. de Berg, M., Cheong, O., van Kreveld, M., Overmars, M.: Computational geometry: algorithms and applications, 3rd edn. Springer, Santa Clara, CA, USA (2008)

  29. Ying, L., Biros, G., Zorin, D.: A kernel-independent adaptive fast multipole method in two and three dimensions. J. Comput. Phys. 196(2), 591–626 (2004)

  30. Nielsen, H.B., Lophaven, S.N., Søndergaard, J.: DACE - a Matlab kriging toolbox. Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby (2002)

  31. Chen, Y., Davis, T.A., Hager, W.W., Rajamanickam, S.: Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans. Math. Softw. 35(3), 22:1–22:14 (2008)

  32. Davis, T.A., Hager, W.W.: Dynamic supernodes in sparse Cholesky update/downdate and triangular solves. ACM Trans. Math. Softw. 35(4), 27:1–27:23 (2009)

  33. Davis, T., Hager, W.: Row modifications of a sparse Cholesky factorization. SIAM J. Matrix Anal. Appl. 26(3), 621–639 (2005)

  34. Davis, T., Hager, W.: Multiple-rank modifications of a sparse Cholesky factorization. SIAM J. Matrix Anal. Appl. 22(4), 997–1013 (2001)

  35. Davis, T., Hager, W.: Modifying a sparse Cholesky factorization. SIAM J. Matrix Anal. Appl. 20(3), 606–627 (1999)

  36. Griebel, M., Oettershagen, J.: On tensor product approximation of analytic functions. J. Approx. Theory 207, 348–379 (2016)

  37. MATLAB: R2019a. The MathWorks Inc., Natick, Massachusetts (2019)

  38. Bäck, J., Nobile, F., Tamellini, L., Tempone, R.: Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: a numerical comparison. In: Hesthaven, J.S., Rønquist, E.M. (eds.) Spectral and high order methods for partial differential equations. Lecture Notes in Computational Science and Engineering, vol. 76, pp. 43–62. Springer, Heidelberg, Germany (2011)

  39. Khoromskij, B.N.: Tensor numerical methods in scientific computing. De Gruyter, Berlin/Boston (2018)

  40. Babuska, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Rev. 52(2), 317–355 (2010)

  41. Intel Math Kernel Library. Intel (2018). http://software.intel.com/en-us/articles/intel-mkl/

  42. Baricz, A., Ponnusamy, S., Vuorinen, M.: Functional inequalities for modified Bessel functions. Expo. Math. 29(4), 399–414 (2011)

Acknowledgements

I appreciate the help and advice from George Biros and Lexing Ying in setting up the KIFMM packages. In addition, I am grateful for the intense and very illuminating discussions with Michael Stein and Mihai Anitescu. I also appreciate the support that King Abdullah University of Science and Technology has provided to this project.

Funding

This material is based upon work supported by the National Science Foundation under Grant No. 1736392 and No. 2319011.

Author information

Corresponding author

Correspondence to Julio Enrique Castrillón-Candás.

Ethics declarations

Conflict of interest

The author declares no competing interests.

Additional information

Communicated by: Anthony Nouy


Appendices

Appendix A: Polynomial interpolation

In this section, we provide some background on polynomial interpolation in high dimensions. This will be critical to estimate the decay rates of the entries of the multilevel covariance matrix for high dimensional problems.

The decay of the coefficients depends directly on the analytic properties of the covariance function. Traditional error estimates for polynomial interpolation are based on multivariate \(m^{th}\) order derivatives. However, in many cases, such as for the Matérn covariance function, the derivatives are too complex or expensive to manipulate for even a moderate number of dimensions. This motivates the study of polynomial approximations based on complex analytic extensions, which are much better suited to high dimensions. Much of the discussion that follows has its roots in the field of uncertainty quantification and high dimensional interpolation for partial differential equations [20, 22, 36].

Consider the problem of approximating a function \(v: \Gamma ^{d} \rightarrow \mathbb {R}\) on the domain \(\Gamma ^{d}\). Without loss of generality, let \(\Gamma \equiv [-1, 1]\) and \(\Gamma ^{d} \equiv \prod _{n = 1}^{d} \Gamma \). For \(\mathcal {G}\subset \Gamma ^{d}\), define the following spaces

$$\begin{aligned} L^q(\mathcal {G}) := \Big \{ v(\textbf{y})\, : \, \int _{\mathcal {G}} |v(\textbf{y})|^q \,\text {d} \textbf{y}< \infty \Big \} \quad \text{ and } \quad L^{\infty }(\mathcal {G}) := \Big \{ v(\textbf{y})\, : \, \sup _{\textbf{y}\in \mathcal {G}} |v(\textbf{y}) |< \infty \Big \}. \end{aligned}$$

Let \(\mathcal {P}_{q}(\Gamma ):= \text {span}\{y^k,\,k=0,\dots ,q\}\) denote the space of polynomials of degree at most q. Let \(\mathcal {I}_{m}: C^{0}(\Gamma ) \rightarrow \mathcal {P}_{m-1}(\Gamma )\) be the univariate Lagrange interpolant \(\mathcal {I}_{m}(v)(y):= \sum _{k=1}^{m}v(y^{(k)})\,l_{m,k}(y)\), where \(y^{(1)}, \dots , y^{(m)}\) is a set of distinct knots on \(\Gamma \) and \(\{ l_{m,k} \}_{k=1}^{m}\) is the Lagrange basis of \(\mathcal {P}_{m-1}(\Gamma )\) associated with these knots. The index \(m \in \mathbb {N}_0\) corresponds to the order of approximation of the Lagrange interpolant; for \(m = 0\), we set \(\mathcal {I}_{0} = 0\).
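As a concrete check of these definitions, the following sketch (a NumPy illustration with arbitrarily chosen knots and test polynomial, not code from the paper) evaluates \(\mathcal{I}_m\) on m distinct knots and verifies that it reproduces a polynomial of degree \(m-1\) to machine precision.

```python
import numpy as np

def lagrange_interpolant(knots, f, y):
    """Evaluate I_m[f](y) = sum_k f(y^(k)) * l_{m,k}(y) for a set of distinct knots."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    out = np.zeros_like(y)
    for k, yk in enumerate(knots):
        lk = np.ones_like(y)                      # Lagrange cardinal function l_{m,k}
        for j, yj in enumerate(knots):
            if j != k:
                lk *= (y - yj) / (yk - yj)
        out += f(yk) * lk
    return out

m = 5
knots = np.linspace(-1.0, 1.0, m)                 # any m distinct knots on Gamma = [-1, 1]
p = lambda y: 3 * y**4 - y**2 + 0.5               # polynomial of degree m - 1 = 4
ytest = np.linspace(-1.0, 1.0, 11)
print(np.max(np.abs(lagrange_interpolant(knots, p, ytest) - p(ytest))))   # ~1e-15
```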

Remark 1

For high dimensional interpolation, the particular set of points \(y^{(1)}, \dots , y^{(m)}\) that we use is the Clenshaw-Curtis abscissas, which are discussed further below. For now, we only assume that the points are distinct.

For \(m \ge 1\), let \(\Delta _{m}:= \mathcal {I}_{m}-\mathcal {I}_{m-1}\). From the difference operators \(\Delta _{m}\), we readily observe that \(\mathcal {I}_{m} = \sum _{k=1}^{m} \Delta _{k}\), which is reminiscent of multiresolution wavelet decompositions. The idea is to represent the multivariate approximation as a summation of these difference operators.

Consider the multi-index tuple \(\textbf{m}= (m_1,\dots ,m_d)\), where \(\textbf{m}\in \mathbb {N}^{d}\), and form the tensor product operator \(\mathcal {S}_{w,d}\), acting on \(v: \Gamma ^{d} \rightarrow \mathbb {R}\), as

$$\begin{aligned} \mathcal {S}_{w,d} [v(\textbf{y})] : = \sum _{\textbf{m}\in {\mathbb N}^{d}: \sum _{i=1}^{d} m_i - 1 \le w } \;\; \bigotimes _{n=1}^{d} {\Delta ^{n}_{m_n}}(v(\textbf{y})). \end{aligned}$$
(A1)

Note that by \({\Delta ^{n}_{m_n}}(v(\textbf{y}))\), we mean that the difference operator \({\Delta _{m_n}}\) is applied along the \(n^{th}\) dimension of \(\Gamma ^{d}\).

Let \(C^{0}(\Gamma ^d; \mathbb {R}): = \{ v: \Gamma ^d \rightarrow \mathbb {R}\,\,\) is continuous on \(\Gamma ^d\) and \(\max _{\textbf{y}\in \Gamma ^d} |v(\textbf{y}) |< \infty \}\). Proposition 1 in [38] shows that for any \(v \in C^0(\Gamma ^d;\mathbb {R})\), we have \(\mathcal {S}_{w,d}[v]\in \mathcal {Q}^{d}_{w}\). Moreover, \(\mathcal {S}_{w,d}[v] = v\) for all \(v \in \mathcal {Q}^{d}_{w}\). The key observation is that the operator \(\mathcal {S}_{w,d}\) is exact on the polynomial space \(\mathcal {Q}^{d}_{w}\). This will be useful in connecting the Lagrange interpolant with Chebyshev polynomials.
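A minimal sketch of the construction (A1) for \(d = 2\) is given below; it is an illustration under stated assumptions, not the paper's implementation, and the Clenshaw-Curtis knots, test function, and range of w are arbitrary choices. Each tensorised difference operator \(\Delta^{1}_{m_1} \otimes \Delta^{2}_{m_2}\) is expanded by inclusion-exclusion over \(\mathcal{I}_{m_n}\) and \(\mathcal{I}_{m_n-1}\), with \(\mathcal{I}_{0} = 0\).

```python
import numpy as np
from itertools import product

def cc_nodes(m):
    """Clenshaw-Curtis abscissas y^(j) = -cos(pi*(j-1)/(m-1)); for m = 1, use the midpoint."""
    if m == 1:
        return np.array([0.0])
    return -np.cos(np.pi * np.arange(m) / (m - 1))

def lagrange_basis(nodes, y):
    """Vector (l_{m,1}(y), ..., l_{m,m}(y)) at a scalar point y."""
    m = len(nodes)
    L = np.ones(m)
    for k in range(m):
        for j in range(m):
            if j != k:
                L[k] *= (y - nodes[j]) / (nodes[k] - nodes[j])
    return L

def tensor_interp(v, orders, y):
    """Full tensor-product Lagrange interpolant of v at the point y; order 0 in any
    dimension corresponds to the zero operator I_0 = 0."""
    if any(m == 0 for m in orders):
        return 0.0
    nodes = [cc_nodes(m) for m in orders]
    basis = [lagrange_basis(nodes[n], y[n]) for n in range(len(orders))]
    total = 0.0
    for idx in product(*[range(m) for m in orders]):
        knot = np.array([nodes[n][i] for n, i in enumerate(idx)])
        total += v(knot) * np.prod([basis[n][i] for n, i in enumerate(idx)])
    return total

def sparse_operator(v, w, d, y):
    """Evaluate S_{w,d}[v](y) from (A1) by expanding each Delta_{m_1} x ... x Delta_{m_d}
    via inclusion-exclusion over I_{m_n} and I_{m_n - 1}."""
    total = 0.0
    for m in product(range(1, w + 2), repeat=d):      # admissible set: sum_n (m_n - 1) <= w
        if sum(mi - 1 for mi in m) > w:
            continue
        for e in product((0, 1), repeat=d):
            total += (-1) ** sum(e) * tensor_interp(v, [mi - ei for mi, ei in zip(m, e)], y)
    return total

v = lambda x: np.exp(-np.sum(x ** 2))                 # smooth test function on [-1, 1]^2
y = np.array([0.3, -0.7])
for w in range(6):
    print(w, abs(sparse_operator(v, w, 2, y) - v(y)))
```

The printed errors decrease as w grows, in line with the bound derived below.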

Let \(T_k:\Gamma \rightarrow \mathbb {R}\), \(k = 0, 1, \dots \), denote the Chebyshev polynomials on \(\Gamma \), defined recursively by \(T_0(y) = 1\), \(T_1(y) = y\), and \(T_{k+1}(y) = 2yT_{k}(y) - T_{k-1}(y)\) for \(y \in \Gamma \). Chebyshev polynomials are well suited to approximating functions that admit analytic extensions to a complex region bounded by a Bernstein ellipse: they bypass the need for derivative information, and sharp bounds on the error are readily available. Suppose that \(\sigma > 0\) and denote by

$$\begin{aligned} \mathcal {E}_{\sigma }:= \Big \{ z \in \mathbb {C}:\, \text {Re}\,{z} = \frac{e^{\delta } + e^{-\delta } }{2}\cos (\theta ),\; \text {Im}\,{z} = \frac{e^{\delta } - e^{-\delta }}{2}\sin (\theta ),\; 0 \le \delta \le \sigma ,\; \theta \in [0,2\pi ) \Big \} \end{aligned}$$

as the region bounded by a Bernstein ellipse (see Fig. 5). The following theorem is based on complex analytic extensions on \(\mathcal {E}_{\sigma }\) and provides a control for the Chebyshev polynomial approximation.

Theorem 9

Suppose that \(u:\Gamma \rightarrow \mathbb {R}\) admits an analytic extension on \(\mathcal {E}_{\sigma }\). If \(|u |\le M < \infty \) on \(\mathcal {E}_{\sigma }\), then there exists a sequence of coefficients with \(|\alpha _k|\le M / e^{k\sigma }\) such that \(u \equiv \alpha _0 + 2\sum _{k = 1}^{\infty } \alpha _{k} T_{k}\) on \(\mathcal {E}_{\sigma }\). Moreover, if \(y \in \Gamma \), then \(\vert u(y) - \alpha _0 - 2\sum _{k = 1}^{n} \alpha _{k} T_{k}(y) |\le \frac{2\,M}{e^{\sigma } - 1} e^{-n \sigma }\).

Proof

See Theorem 2.25 in [39]. \(\square \)
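A quick numerical illustration of this decay (the test function, number of quadrature nodes, and the Gauss-Chebyshev discretisation of the coefficient integral are all choices made for this sketch and are not taken from the paper): for \(u(y) = 1/(2-y)\), which is analytic inside any Bernstein ellipse that excludes the pole at \(y = 2\), the coefficients decay essentially like \(e^{-k\sigma}\) with \(\sigma = \log(2+\sqrt{3})\).

```python
import numpy as np

u = lambda y: 1.0 / (2.0 - y)            # analytic on E_sigma for any sigma < log(2 + sqrt(3))
sigma = np.log(2.0 + np.sqrt(3.0))

# alpha_k = (1/pi) * int_0^pi u(cos t) cos(k t) dt, via Gauss-Chebyshev quadrature
N = 400
t = np.pi * (np.arange(N) + 0.5) / N
for k in (1, 2, 4, 8, 16):
    alpha_k = np.mean(u(np.cos(t)) * np.cos(k * t))
    print(k, abs(alpha_k), np.exp(-k * sigma))   # |alpha_k| tracks exp(-k * sigma)
```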

Fig. 5 Complex region bounded by the Bernstein ellipse

We can now connect the error of the Lagrange interpolation with Chebyshev expansions. It is known that if \(u \in C^0(\Gamma ;\mathbb {R})\), then \(\Vert (I - \mathcal {I}_{m})u\Vert _{L^{\infty }(\Gamma )} \le (1 + \Lambda _{m}) \min _{h \in \mathcal {P}_{m-1}} \Vert u - h \Vert _{L^{\infty }(\Gamma )}\), where \(\Lambda _{m}\) is the Lebesgue constant (see Lemma 7 in [40]). Note that \(I\) refers to the identity operator on \(C^{0}(\Xi ;\mathbb {R})\), where the domain \(\Xi \) is taken from context; for the previous case, \(\Xi = \Gamma \). Bounds on \(\Lambda _{m}\) are known in terms of the location of the knots \(y^{(1)}, \dots , y^{(m)} \in \Gamma \). In this article, we restrict our attention to the Clenshaw-Curtis abscissas \(y^{(j)} = -\cos \left( \frac{\pi (j-1)}{m - 1} \right) \), \(j = 1,\dots , m\), for which \(\Lambda _m\) is bounded by \(2\pi ^{-1}(\log {(m-1)} + 1) \le 2m - 1\) (see [40]). Since the interpolation operator \(\mathcal {I}_{m}\) is exact on \(\mathcal {P}_{m - 1}\), if \(u:\Gamma \rightarrow \mathbb {R}\) has an analytic extension in \(\mathcal {E}_{\sigma }\), we have from Theorem 9 (following a similar approach as in [40]) that \(\Vert (I - \mathcal {I}_{m})u\Vert _{L^{\infty }(\Gamma )} \le (1 + \Lambda _{m}) \frac{2\,M}{e^{\sigma } - 1} e^{-\sigma (m-1)} \le 2 C(M,\sigma ) m e^{-\sigma (m-1)}\), where \(C(M,\sigma ):= \frac{2M}{e^{ \sigma } - 1}\). We then conclude that for all \(k = 1,\dots , m\),

$$\begin{aligned} \Vert \Delta _{k}(u) \Vert _{L^{\infty }(\Gamma )} = \Vert \mathcal {I}_{k}(u) - \mathcal {I}_{k-1}(u) \Vert _{L^{\infty }(\Gamma )} \le \Vert (I - \mathcal {I}_{k})u\Vert _{L^{\infty }(\Gamma )} + \Vert (I - \mathcal {I}_{k-1})u\Vert _{L^{\infty }(\Gamma )} \le e^{2\sigma }C(M,\sigma )\, k\, e^{-\sigma k}. \end{aligned}$$
(A2)
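The Lebesgue constant of the Clenshaw-Curtis abscissas can be estimated numerically. The sketch below (grid resolution and orders are arbitrary choices; a brute-force evaluation, not the paper's code) computes \(\Lambda_m = \max_{y \in \Gamma} \sum_{k=1}^{m} |l_{m,k}(y)|\) on a fine grid and compares it with the crude bound \(2m-1\) used above, confirming the very slow growth of \(\Lambda_m\).

```python
import numpy as np

def cc_nodes(m):
    """Clenshaw-Curtis abscissas y^(j) = -cos(pi*(j-1)/(m-1)), j = 1, ..., m (m >= 2)."""
    return -np.cos(np.pi * np.arange(m) / (m - 1))

def lebesgue_constant(nodes, ngrid=4001):
    """Max over a fine grid of the sum of absolute values of the Lagrange cardinal functions."""
    y = np.linspace(-1.0, 1.0, ngrid)
    total = np.zeros(ngrid)
    for k, yk in enumerate(nodes):
        lk = np.ones(ngrid)
        for j, yj in enumerate(nodes):
            if j != k:
                lk *= (y - yj) / (yk - yj)
        total += np.abs(lk)
    return total.max()

for m in (3, 5, 9, 17, 33, 65):
    print(m, round(lebesgue_constant(cc_nodes(m)), 3), 2 * m - 1)
```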

Let \(\mathcal {E}_{\sigma ,n} \subset \mathbb {C}\) be the region bounded by a Bernstein ellipse in the complex extension of the \(n^{th}\) coordinate of \(\Gamma ^{d}\), and form the polyellipse \(\mathcal {E}^{d}_{\sigma }:= \prod _{n=1}^{d} \mathcal {E}_{\sigma ,n}\). Suppose that \(v:\mathcal {E}^{d}_{\sigma } \rightarrow \mathbb {C}\) is analytic on \(\mathcal {E}^{d}_{\sigma }\) and let \(\tilde{M}(v):= \max _{\textbf{z}\in \mathcal {E}^{d}_{\sigma }} \vert v(\textbf{z}) |\).

We refer to \(\mathcal {I}^{n}_{m}\) as the Lagrange operator of order m along the \(n^{th}\) dimension, and similarly \(\mathcal {P}^{n}_{m-1}\) is the space of univariate polynomials of degree at most \(m-1\) along the \(n^{th}\) dimension. Form the tensor product \(\textbf{I}^{d}_{m}:= \mathcal {I}^{1}_{m} \otimes \dots \otimes \mathcal {I}^{d}_{m}\), so that \(\textbf{I}^{d}_{m}:C^{0}(\Gamma ^{d};\mathbb {R}) \rightarrow \mathbb {P}\), where \(\mathbb {P}:= \mathcal {P}^{1}_{m-1} \otimes \dots \otimes \mathcal {P}^{d}_{m-1}\). From Theorem 2.27 in [39], we conclude that for finite dimension d, \(\textbf{I}^{d}_{m}[v] \rightarrow v\) as \(m \rightarrow \infty \).

Applying equation (A2) to equation (A1), we have that

$$\begin{aligned} \Vert (I - \mathcal {S}_{w,d}) v(\textbf{y}) \Vert _{L^{\infty }(\Gamma ^{d})} &\le \left\| \sum _{\textbf{m}\in {\mathbb N}^{d}: \sum _{i=1}^{d} (m_i - 1)> w } \;\; \bigotimes _{n=1}^{d} {\Delta ^{n}_{m_n}}(v(\textbf{y}))\right\| _{L^{\infty }(\Gamma ^d)} \\ &\le \sum _{\textbf{m}\in {\mathbb N}^{d}: \sum _{i=1}^{d} (m_i - 1)> w } \; \prod _{n=1}^{d} \Vert {\Delta ^{n}_{m_n}}(v(\textbf{y}))\Vert _{L^{\infty }(\Gamma ^d)} \\ &\le \sum _{\textbf{m}\in {\mathbb N}^{d}: \sum _{i=1}^{d} (m_i - 1) > w } e^{2d} C(M,\sigma )^{d} \left( \prod _{n=1}^{d} m_n\right) \exp {\left( -\sigma \sum _{n=1}^{d} m_{n} \right) }. \end{aligned}$$
(A3)

By applying Theorem 2.10 and Corollary 2.11 in [36], if \( w \ge d\) and \(p( d, w) \ge \left( \frac{2 d}{\kappa ( d)}\right) ^{ d}\), where \(\kappa ( d):= \root d \of { d!} > d/e\) (Stirling's approximation), then for any \(\hat{\sigma }\in \mathbb {R}_{+}\)

$$\begin{aligned} \sum _{\textbf{k}\in {\mathbb N}^{ d}_{0}: \sum _{i=1}^{ d} k_i > w } \exp {\left( -\hat{\sigma }\sum _{n=1}^{ d} k_{n} \right) } &\le \sum _{\textbf{k}\in {\mathbb N}^{d}_{0}: \hat{\sigma }\sum _{i=1}^{ d} k_i \ge w \hat{\sigma }} \exp {\left( -\hat{\sigma }\sum _{n=1}^{ d} k_{n} \right) } \\ &\le \hat{\sigma }\, d\, e \left( \frac{e^{\hat{\sigma }}}{1 - e^{-\hat{\sigma }}} \right) ^{ d} \exp \left( -\frac{ d}{e} \hat{\sigma }p^{\frac{1}{ d}} \right) p^{\frac{ d-1}{ d}}, \end{aligned}$$
(A4)

where \(\textbf{k}:=(k_1,\dots ,k_d) \in {\mathbb N}^{d}_{0}\).

Following the same approach as in [36], observe that for \(0< \delta < 1\), there is a bounded constant \(c_{n,\delta } \le (e\sigma \delta )^{-1}\) such that \(m_n \exp (-\sigma m_n) \le (e\sigma \delta )^{-1} \exp (-\sigma m_n (1 - \delta ))\). Set \(\hat{\sigma }:= \sigma (1 - \delta )\); combining equations (A3) and (A4), we have proven the following result.

Lemma 10

Suppose that \(0< \delta < 1\), \(\hat{\sigma }:= \sigma (1 - \delta )\), and \(p(d,w) \ge \left( \frac{2 d}{\kappa (d)}\right) ^{d}\) then \(\Vert (I - \mathcal {S}_{w,d}) v(\textbf{y}) \Vert _{L^{\infty }(\Gamma ^{d})} \le \frac{C(\tilde{M},\sigma )^d e^{d - \sigma (1 - \delta ) + 1} \hat{\sigma }d }{ (\sigma \delta )^{d}} \left( \frac{e^{\hat{\sigma }}}{1 - e^{-\hat{\sigma }}} \right) ^{d} \exp \left( -\frac{d}{e} \hat{\sigma }p^{\frac{1}{d}} \right) p^{\frac{d-1}{d}}.\)

Remark 2

The restriction \(p(d,w) \ge \left( \frac{2 d}{\kappa (d)}\right) ^{d}\) is not strict and can be relaxed such that sub-exponential convergence is still obtained. We refer the reader to the bound of the Gamma function in Lemma 2.5 ([36]) and its application in the proofs of Theorem 2.10 and Corollary 2.11.
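The quantities appearing in this restriction are easy to tabulate. The short sketch below (dimension values chosen arbitrarily) evaluates \(\kappa(d) = \sqrt[d]{d!}\), the Stirling-type lower bound \(d/e\), and the threshold \((2d/\kappa(d))^{d}\).

```python
import math

for d in (2, 5, 10, 25):
    kappa = math.factorial(d) ** (1.0 / d)          # kappa(d) = (d!)^(1/d)
    print(d, round(kappa, 3), round(d / math.e, 3), (2 * d / kappa) ** d)
```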

Appendix B: Experimental setup

1. MATLAB, C/C++, and MKL: The binary tree, multilevel basis construction, formation of the sparse matrix \(\tilde{\textbf{C}}_{\textbf{W}}\), and the estimation and prediction components are written and executed in MATLAB [37]. However, the computational bottlenecks are handled by C/C++ software packages, Intel MKL [41], and the highly optimized BLAS and LAPACK libraries contained in MATLAB. The C/C++ interfaces to MATLAB are built as dynamic shared libraries.

2. Direct and fast summation: The MATLAB code estimates the computational cost of the direct and the fast summation methods and chooses the more efficient approach. For the direct method, a combination of the Graphics Processing Unit (GPU) and the Intel MKL libraries is used. For the fast summation method, the KIFMM (\(d = 3\)) C++ code is used. The KIFMM is modified to include a Hermite interpolant approximation of the Matérn covariance function, implemented with the Intel MKL package [41] (see [16] for details).

3. Dynamic shared libraries: These are produced with the GNU gcc/g++ toolchain. They implement the Hermite interpolant with the Intel MKL package (about 10 times faster than the MATLAB Matérn interpolant) and link the MATLAB code to the KIFMM.

4. Cholesky and determinant computation: The SuiteSparse 4.2.1 package [31,32,33,34,35] is used to compute the determinant of the sparse matrix \(\tilde{\textbf{C}}_{\textbf{W}}(\varvec{\theta })\).
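The paper relies on CHOLMOD/SuiteSparse for this step. As a rough stand-in illustration only (the test matrix below is a generic sparse symmetric positive definite matrix, not \(\tilde{\textbf{C}}_{\textbf{W}}(\varvec{\theta})\), and SciPy's sparse LU factorization is used instead of a supernodal Cholesky), the log-determinant of a sparse SPD matrix can be read off from the diagonal of a sparse triangular factor.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 5000
# generic sparse SPD test matrix (a shifted 1D Laplacian), a stand-in for C_W(theta)
A = sp.diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

lu = splu(A)                                         # sparse LU with a fill-reducing ordering
# For an SPD matrix, det(A) > 0 and |det(L)| = 1, so log det(A) = sum(log |u_ii|)
logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
print("log det(A) =", logdet)
```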

The code is tested on a single CPU (4-core Intel i7-3770 @ 3.40 GHz), one Nvidia GTX 970 GPU, 32 GB of memory, and Linux Ubuntu 18.04. In addition, the Boston University Shared Computing Cluster was used to generate test data. To test the effectiveness of the multilevel solver, the following data sets are generated:

1. Random n-sphere data set: The nested sets of random observation locations \(\textbf{S}_{0}^{d} \subset \dots \subset \textbf{S}_{9}^{d}\) vary from 1000, 2000, 4000, up to 256,000 knots generated on the sphere \(\textbf{S}_{d-1}:= \{\textbf{x}\in \mathbb {R}^{d}:\,\,\Vert \textbf{x}\Vert _{2} = 1 \}\) (a generation sketch follows this list).

2. Random hypercube data set: The sets of random observation locations \(\textbf{C}_{0}^{d}, \dots , \textbf{C}_{10}^{d}\) vary from 1000, 2000, 4000, up to 512,000 knots generated on the hypercube \([-1,1]^{d}\) in d dimensions. The observation locations are also nested, i.e., \(\textbf{C}_{0}^{d} \subset \dots \subset \textbf{C}_{10}^{d}\).

3. Normal test data set: The sets of observation values \(\textbf{Z}^d_{0}, \textbf{Z}^d_{1}, \dots , \textbf{Z}^d_{5}\) are formed from the Gaussian random field model () for 1000, 2000, \(\dots \), 256,000 observation locations. The data set \(\textbf{Z}^d_{n}\) is generated from the set of knots \(\textbf{S}^{d}_{n}\), with the covariance parameters \((\nu ,\rho )\) and the corresponding set of monomials \(\mathcal {Q}^d_w\). The Boston University Shared Computing Cluster was used to generate the normal test data.
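A sketch of how such nested location sets can be generated is given below (dimension, seed, and set sizes are illustrative choices; this is not the paper's data-generation code). Drawing the largest sample once and taking prefixes makes the sets nested by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 25
sizes = [1000 * 2 ** k for k in range(9)]             # 1000, 2000, ..., 256000

# nested hypercube sets on [-1, 1]^d: prefixes of one large uniform sample
cube = rng.uniform(-1.0, 1.0, size=(sizes[-1], d))
cube_sets = [cube[:n] for n in sizes]

# nested sphere sets: normalize Gaussian draws onto the unit sphere, then take prefixes
g = rng.standard_normal((sizes[-1], d))
sphere = g / np.linalg.norm(g, axis=1, keepdims=True)
sphere_sets = [sphere[:n] for n in sizes]

print(cube_sets[0].shape, sphere_sets[-1].shape)      # (1000, 25) (256000, 25)
```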

Remark 3

All timings for the numerical tests are given as wall-clock times, i.e., the actual elapsed time needed to solve a problem. This distinguishes them from CPU time, which can be significantly smaller and may not accurately reflect the real-world time taken to solve a problem.
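In Python terms (a toy illustration, not the paper's timing harness), the distinction is the one between time.perf_counter, which measures wall-clock time, and time.process_time, which measures CPU time of the current process:

```python
import time

t_wall, t_cpu = time.perf_counter(), time.process_time()
time.sleep(1.0)                                   # elapses wall-clock time but uses almost no CPU
s = sum(i * i for i in range(10 ** 6))            # CPU-bound work
print("wall clock: %.2f s" % (time.perf_counter() - t_wall))
print("CPU time  : %.2f s" % (time.process_time() - t_cpu))
```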

Appendix C: Proofs

Proof of proposition 1

The proof is immediate.

Proof of lemma 2

Starting at the finest level t, for each cell \(B^{t}_k \in \mathcal {B}^{t}\), there are at most p multilevel vectors. Since there are at most \(2^t\) cells, there are at most \(2^{t} p\) multilevel vectors.

Now, for each pair of left and right (sibling) cells at level t, the parent cell at level \(t-1\) has at most 2p scaling functions. Thus, at most p multilevel vectors and p scaling vectors are obtained to be used at the next level. The remaining cells at level t are leaves and contribute at most p multilevel vectors and p scaling vectors to the next level. Since there are at most \(2^{t-1}\) cells at level \(t-1\), there are at most \(2^{t-1} p\) multilevel vectors. The proof is completed by following this inductive argument until \(q = 0\).

Proof of lemma 3

For any leaf cell at the bottom of the tree (level t), there are at most 2p observations; its parent cell has at most 4p observations, and thus the associated multilevel vector has at most 4p non-zero entries. By induction, at any level q the number of non-zero entries is at most \(2^{t-q+1} p\). Now, for any leaf cell at any other level \(l < t\), the number of non-zero entries is at most 2p. Following an inductive argument, the result is obtained.

Proof of proposition 4

Let us look at the cost of computing all the interactions between any two cells \(B^{i}_k \in \mathcal {B}^{i}\) and \(B^{j}_l \in \mathcal {B}^{j}\). Without loss of generality, assume that \(i \le j\). For the cell \(B^{i}_k\), there are at most p multilevel vectors and, from Lemma 3, at most \(2^{t-i+1} p\) non-zero entries; similarly for \(B^{j}_l\). All the interactions \((\varvec{\psi }^{i}_{\tilde{k}})^\textrm{T}\textbf{C}(\varvec{\theta }) \varvec{\psi }^{j}_{\tilde{l}}\) now have to be computed, where \(\varvec{\psi }^{i}_{\tilde{k}} \in B^{i}_k\) and \(\varvec{\psi }^{j}_{\tilde{l}} \in B^{j}_l\).

The term \(\textbf{C}(\varvec{\theta }) \varvec{\psi }^{j}_{\tilde{l}}\) is computed using an FMM with \(2^{t-j+1} p\) sources and \(2^{t-i+1} p\) targets at a cost of \(\mathcal {O}((2^{t-j+1} p + 2^{t-i+1} \tilde{p})^{\alpha })\). Since there are at most p multilevel vectors in \(B^{i}_k\) and \(B^{j}_l\), the cost of computing all the interactions \((\varvec{\psi }^{i}_{\tilde{k}})^\textrm{T}\textbf{C}(\varvec{\theta }) \varvec{\psi }^{j}_{\tilde{l}}\) is \(\mathcal {O}(p(2^{t-j+1} p + 2^{t-i+1} p)^{\alpha } + 2^{t-i+1}p)\).

Now, at any level i, there are at most \(2^{i}\) cells; thus, the result follows.

Proof of proposition 5

This is a simple extension of the proof of Proposition 1.

Proof of theorem 6

Immediate.

Proof of theorem 7

We first have that

$$\begin{aligned} \sum _{k = 1}^{N} \sum _{l = 1}^{N} \phi (\textbf{x}_k,\textbf{y}_l; \varvec{\theta }) \varvec{\psi }^i_m[k] \varvec{\psi }^j_q[l] &= \sum _{k = 1}^{N} \sum _{l = 1}^{N} \lim _{g \rightarrow \infty } (\textbf{I}^d_{g} \otimes \textbf{I}^d_{g})[ \phi (\textbf{x}_k,\textbf{y}_l; \varvec{\theta })]\, \varvec{\psi }^i_m[k] \varvec{\psi }^j_q[l] \\ &= \sum _{k = 1}^{N} \sum _{l = 1}^{N} (I_d - \mathcal {S}_{w,d}) \otimes (I_d - \mathcal {S}_{w,d}) [\phi (\textbf{x}_k,\textbf{y}_l; \varvec{\theta })]\, \varvec{\psi }^i_m[k] \varvec{\psi }^j_q[l]. \end{aligned}$$

The last equality follows from \(\varvec{\psi }^{i}_m, \varvec{\psi }^{j}_{q} \in \mathcal {P}^{ p}(\mathbb {S})^{\perp }\). We now have that

$$\begin{aligned} \sum _{k = 1}^{N} \sum _{l = 1}^{N} \Vert (I_d - \mathcal {S}_{w,d}) \otimes (I_d - \mathcal {S}_{w,d}) [\phi (\textbf{x}_k,\textbf{y}_l; \varvec{\theta })] \Vert _{L^{\infty }_{\rho }(\Gamma ^d)}\, |\varvec{\psi }^i_m[k] |\,|\varvec{\psi }^j_q[l] | \le \Vert (I_d - \mathcal {S}_{w,d})[\phi (\textbf{x}_k,\textbf{y}_l; \varvec{\theta })] \Vert ^{2}_{L^{\infty }_{\rho }(\Gamma ^d)} \sum _{k = 1}^{N} \sum _{l = 1}^{N} \vert \varvec{\psi }^i_m[k]\vert \,\vert \varvec{\psi }^j_q[l]\vert . \end{aligned}$$

Since \(\varvec{\psi }^i_m\) and \(\varvec{\psi }^j_q\) are orthonormal, \(\sum _{k = 1}^{N} \sum _{l = 1}^{N} |\varvec{\psi }^i_m[k] |\,|\varvec{\psi }^j_q[l] |\le \sqrt{n_m n_q}\, \Vert \varvec{\psi }^i_m\Vert _{l^2} \Vert \varvec{\psi }^j_q\Vert _{l^2} = \sqrt{n_m n_q}\). From Lemma 10, the result follows.

Proof of theorem 8

The polynomial function is an entire function. However, the functions \(K_{\nu }(\vartheta )\) and \(\vartheta ^{\frac{1}{2}}\) are analytic for all \(\vartheta \in \mathbb {C}\) except on the branch cut \((-\infty ,0]\). Thus, it is sufficient to check the analytic extension of \(r(\varvec{\theta }) = \Big ( \sum _{k=1}^{d} \theta _{k} (x_k - y_k)^{2} \Big )^{\frac{1}{2}}\). Let \(z \in \mathbb {C}\) be the complex extension of \(r \in \mathbb {R}\); more precisely, \(z = \Big ( \sum _{k=1}^{d} \theta _{k} z_k^{2} \Big )^{\frac{1}{2}}\), where \(z_k \in \mathbb {C}\) is the complex extension of \((x_k - y_k)\).

Let \(\vartheta = \sum _{k=1}^{d} \theta _{k} z_k^{2}\); then, taking the appropriate branch, \({\text {Re}}\, z = r_{\vartheta } \cos {( \theta _{\vartheta }/2)}\), where \(r^2_{\vartheta } = ({\text {Re}}\, \vartheta )^2 + (\textrm{Im}\, \vartheta )^2\) and \(\theta _{\vartheta } = \tan ^{-1} \frac{\textrm{Im}\, \vartheta }{{\text {Re}}\, \vartheta }\). Due to the branch cut at \((-\infty ,0]\), we impose the restriction that \({\text {Re}}\, \vartheta > 0\) as \(x_k\) and \(y_k\) are extended into the complex plane. Consider any two cells \(B^{i}_{m} \in \mathcal {B}^i\) and \(B^{j}_{q} \in \mathcal {B}^j\) at levels i and j with the associated distance criterion constant \(\tau _{i,j}>0\). From Algorithms 4, 5, 6, and 7, for any observations \(\textbf{x}^{*} \in B^{i}_{m}\) and \(\textbf{y}^{*} \in B^{j}_{q}\), we have that \((x^*_k - y^*_k)^2 \ge \tau ^2_{i,j}\) for \(k = 1,\dots ,d\). For the rest of the discussion, it is assumed that the complex extension is with respect to each component \(k = 1,\dots ,d\) unless otherwise specified.

Extend \(\alpha _k \rightarrow \alpha _k + v_k\) and \(\gamma _k \rightarrow \gamma _k + w_k\), where \(v_k:= v^R_k + iv^I_k\), \(w_k:= w^R_k + iw^I_k\), and \(v^R_k,v^I_k,w^R_k,w^I_k \in \mathbb {R}\). Let \(\tilde{x}_k\) be the extension of \(x_k\) into the complex plane, and similarly for \(\tilde{y}_k\). It follows that \(\tilde{x}^R_k:= {\text {Re}}\, \tilde{x}_k = \frac{1}{2} (\alpha _k + 1 + v^R_k)a_k + b_k\), \(\tilde{x}^I_k := \textrm{Im}\, \tilde{x}_k = \frac{v^I_k}{2} a_k\), \(\tilde{y}^R_k:= {\text {Re}}\, \tilde{y}_k = \frac{1}{2} (\gamma _k + 1 + w^R_k)c_k + d_k\), and \(\tilde{y}^I_k:= \textrm{Im}\, \tilde{y}_k = \frac{w^I_k}{2} c_k\). After some manipulation,

$$\begin{aligned} \text {Re}\, z^2_k = (\tilde{x}^R_k - \tilde{y}^R_k)^2 - (\tilde{x}^I_k - \tilde{y}^I_k)^2 = (x_k - y_k)^2 + \frac{1}{4}(v^R_k a_k - w^R_k c_k)^2 + (x_k - y_k)(v^R_k a_k - w^R_k c_k) -\frac{1}{4}(a_kv^I_k - c_k w^I_k)^2. \end{aligned}$$
(C5)

Recall that \((x_k - y_k)^2 \ge \tau ^2_{i,j}\) and suppose that there is a positive constant \(\delta _{k} > 0 \) such that

$$\begin{aligned} \delta _{k} \le \frac{\sqrt{32\,\tau _{i,j} ^2+8\,\tau _{i,j} +1}-1 - 4\,\tau _{i,j} }{4\,\tau _{i,j}}. \end{aligned}$$
(C6)

Assume that \(|v^R_k |\le \tau _{i,j} \delta _{k} / a_{k}\), \(|v^I_k |\le \tau _{i,j} \delta _{k}/a_{k}\), \(|w^R_k |\le \tau _{i,j} \delta _{k} / c_{k}\), and \(|w^I_k |\le \tau _{i,j} \delta _{k} / c_{k}\). From equations (C5) and (C6), it follows that

$$\begin{aligned} \text {Re}\,\, z^2_k \ge \tau _{i,j}^2 (1 - 4 \delta _{k}^2) - \frac{\tau _{i,j} \delta _{k}}{2} > 0. \end{aligned}$$
(C7)

Furthermore,

$$\begin{aligned} |\text {Re}\,\, z^2_k |&\le (\max \{ |y^{max}_k - x^{min}_{k} |, |x^{max}_k - y^{min}_{k} |\})^2 + \frac{1}{2}\tau _{i,j} \delta _{k} + 2\tau _{i,j} \delta _{k} \max \{ |y^{max}_k - x^{min}_{k} |, |x^{max}_k - y^{min}_{k} |\} + \tau ^2_{i,j} \delta _{k}^2 \\ &\le 1 + \frac{5}{2}\tau _{i,j} \delta _{k} + \tau ^2_{i,j} \delta _{k}^2. \end{aligned}$$
(C8)

Similarly,

$$\begin{aligned} |\text {Im}\, z^2_k |= |2(\tilde{x}^R_k - \tilde{y}^R_k)(\tilde{x}^I_k - \tilde{y}^I_k) |\le 2 \tau _{i,j} \delta _{k} + 4 \tau ^2_{i,j} \delta _{k}^2. \end{aligned}$$
(C9)

We now show how \(\alpha _k\) and \(\gamma _k\) can be extended into the Bernstein ellipses \(\mathcal {E}_{\sigma ^{\alpha }_k}\) and \(\mathcal {E}_{\sigma ^{\gamma }_k}\), for some \(\sigma ^{\alpha }_k > 0\) and \(\sigma ^{\gamma }_k > 0\), such that \({\text {Re}}\, z^2_k > 0\). Recall that \(|v^R_k |\le \tau _{i,j} \delta _{k} / a_{k}\), \(|v^I_k |\le \tau _{i,j} \delta _{k}/a_{k}\), \(|w^R_k |\le \tau _{i,j} \delta _{k} / c_{k}\), and \(|w^I_k |\le \tau _{i,j} \delta _{k} / c_{k}\). These restrictions form a region in \(\mathbb {C}\times \mathbb {C}\) into which a Bernstein ellipse can be embedded (see Fig. 6). This is done by solving the equation \(\frac{e^{\sigma ^{\alpha }_k} + e^{-\sigma ^{\alpha }_k}}{2} = 1 + \frac{\tau _{i,j} \delta _{k}}{a_k}\), whose unique positive solution is \(\sigma ^{\alpha }_k = \cosh ^{-1} \left( 1 + \frac{\tau _{i,j} \delta _{k}}{a_k} \right) > 0\). Following a similar argument, we have that \(\sigma ^{\gamma }_k = \cosh ^{-1} \left( 1 + \frac{\tau _{i,j} \delta _{k}}{c_k} \right) > 0\). Let \(\mathcal {E}^{d}_{\alpha }:= \prod _{k=1}^{d} \mathcal {E}_{\sigma ^{\alpha }_k}\) and \(\mathcal {E}^{d}_{\gamma }:= \prod _{k=1}^{d} \mathcal {E}_{\sigma ^{\gamma }_k}\). It follows that

$$\begin{aligned} \text {Re}\, \vartheta \ge \sum _{k=1}^{d} \theta _k\, \text {Re}\, z^2_k \ge \sum _{k=1}^{d} \theta _k \left( \tau _{i,j}^2 (1 - 4 \delta _{k}^2) - \frac{\tau _{i,j} \delta _{k}}{2} \right) > 0. \end{aligned}$$
(C10)

Thus, there exists an analytic extension of \(\phi (r;\varvec{\theta }):\Gamma ^d \times \Gamma ^d \rightarrow \mathbb {R}\) on \(\mathcal {E}^{d}_{\alpha } \times \mathcal {E}^{d}_{\gamma }\).

The next step is to bound the absolute value of the Matérn covariance function \( |\phi (z;\varvec{\theta }) |\) on \(\mathcal {E}^{d}_{\alpha } \times \mathcal {E}^{d}_{\gamma }\). If \(\nu > \frac{1}{2}\) and \({\text {Re}}\, z>0\), then the modified Bessel function of the second kind satisfies the identity \(K_{\nu }(\sqrt{2 \nu }z) = \frac{\sqrt{\pi } (\sqrt{2 \nu }z)^{\nu }}{2^\nu \Gamma (\nu + \frac{1}{2})} \int _{1}^{\infty } (t^2 - 1)^{\nu - \frac{1}{2}} \exp {(-\sqrt{2 \nu }zt)}\, \text {d}t.\) It is not hard to show that for \(\nu > \frac{1}{2}\) and \({\text {Re}}\, z >0\), we have that \( |K_{\nu }(\sqrt{2 \nu }z) |\le \frac{ |\sqrt{2 \nu }z |^{\nu }}{({\text {Re}}\, \sqrt{2 \nu }z)^{\nu }} K_{\nu }(\sqrt{2 \nu } {\text {Re}}\, z)\). Note that \(r_{\vartheta } \ge {\text {Re}}\, \vartheta > 0\). From equation (C9), we have that \(\textrm{Im}\, \vartheta = \sum _{k=1}^{d} \theta _k\, \text {Im}\, z^2_k \le \sum _{k=1}^{d} \left( 2 \tau _{i,j} \delta _{k} + 4 \tau ^2_{i,j} \delta _{k}^2 \right) \), and from equation (C10), \( |\theta _{\vartheta } |\le \xi (\varvec{\theta },\varvec{\delta },\tau _{i,j}) \).

Since \(K_{\nu }(\cdot )\) is strictly completely monotonic [42], then

$$\begin{aligned} |K_{\nu }(\sqrt{2 \nu }\, \text {Re}\, z) |= |K_{\nu }(\sqrt{2 \nu }\, r_{\vartheta } \cos (\theta _{\vartheta }/2)) |\le \Big |K_{\nu }\Big (\sqrt{\frac{\nu }{2}} \cos (\xi (\varvec{\theta },\varvec{\delta },\tau _{i,j})/2) \sum _{k=1}^{d} \theta _k \Big ( \tau _{i,j}^2 (1 - 4 \delta _{k}^2) - \frac{\tau _{i,j} \delta _{k}}{2} \Big ) \Big ) \Big |. \end{aligned}$$
(C11)

From equations (C8) and (C9),

$$\begin{aligned} |z_k |^{2} \le |\, \text {Re}\, z^2_k |+ |\text {Im}\, z^2_k |\le \,\mathcal {R}\,(\delta _k,\tau _{i,j}):= 1 + \frac{9}{2}\tau _{i,j} \delta _{k} + 5 \tau ^2_{i,j} \delta _{k}^2 \end{aligned}$$

and therefore,

$$\begin{aligned} |z |\le \Big |\sum _{k=1}^{d} \theta _{k} z_k^{2} \Big |^{\frac{1}{2}} \le \left( \sum _{k=1}^{d} \theta _{k} |z_k |^{2} \right) ^{\frac{1}{2}} \le \left( \sum _{k=1}^{d} \theta _k \mathcal {R}(\delta _k,\tau _{i,j}) \right) ^{\frac{1}{2}}. \end{aligned}$$
(C12)

By combining equations (C9), (C10), (C11), and (C12), we have now proven the Theorem.

Fig. 6 a Region of complex extension of \(\alpha _k\). b Embedding of the Bernstein ellipse \(\mathcal {E}_{\sigma ^{\alpha }_k}\)


About this article


Cite this article

Castrillón-Candás, J.E. Spatial best linear unbiased prediction: a computational mathematics approach for high dimensional massive datasets. Adv Comput Math 50, 37 (2024). https://doi.org/10.1007/s10444-024-10132-9

