--- title: "Introduction to BMEmapping" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to BMEmapping} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction The Bayesian Maximum Entropy (BME) framework offers a robust and versatile approach for space-time data analysis and uncertainty quantification. By integrating principles from Bayesian statistics and the maximum entropy formalism, BME enables the construction of optimal estimates for spatial or spatiotemporal processes in the presence of both precise (hard) and imprecise (soft) data. While hard data correspond to exact point-value measurements, soft data may take the flexible forms of intervals, probability distributions, or qualitative descriptors, making BME particularly well-suited for complex real-world datasets. The **BMEmapping** R package provides a user-friendly implementation of core BME methodologies, facilitating geostatistical modeling, prediction, and data fusion. It allows for a systematic integration of heterogeneous data sources, incorporates prior knowledge, and supports variogram-based spatial modeling—essential tools for accurate and interpretable spatial interpolation. Specifically, **BMEmapping** is designed to perform spatial interpolation at unobserved locations using both hard and soft-interval data. This vignette introduces the fundamental functionality of the package and guides users through its basic usage. To begin, load the package with: ```{r setup} library(BMEmapping) ``` ## Main Functions The main functions available in **BMEmapping** include: `prob_zk` - computes and optionally plots the posterior density estimate at a single unobserved location. `bme_predict` - predicts the posterior mean or mode and the associated variance at an unobserved location. `bme_cv` - performs a cross-validation on the hard data to assess model performance. $~$ ## A Data Example To introduce the functionality of **BMEmapping**, we will look at a modeling problem for estimating reliability-targeted snow loads in the state of Utah. The `utsnowload` data that is part of the package and can be accessed by the command ```{r} data("utsnowload") head(utsnowload) ``` The variables `latitude` and `longitude` represent the geographic coordinates of each location. The variable `hard` contains the values of precise (hard) data measurements, while `lower` and `upper` define the bounds of the soft interval data. Complete documentation for the `utsnowload` dataset can be accessed using the command ```{r eval=FALSE} ?utsnowload ``` ### Input Requirements The **BMEmapping** functions require the following input arguments: `x`: A matrix specifying the geographic location(s) where predictions are to be made. `ch`: A matrix or data frame containing the geographic coordinates of the hard data locations. `cs`: A matrix or data frame containing the geographic coordinates of the soft-interval data locations. `zh`: A vector of observed hard data values corresponding to the locations in `ch`. `a`: A vector of lower bounds for the soft-interval data at locations `cs`. `b`: A vector of upper bounds for the soft-interval data at locations `cs`. ### Variography Before using `BMEmapping`, the user must fit a variogram model to the spatial data. This step involves specifying the type of variogram and its associated parameters: * `model`: The variogram model type. Supported options are "exp" (Exponential), "sph" (Spherical), and "gau" (Gaussian). The appropriate model should be selected based on the spatial structure of the data. * `nugget`: The nugget effect of the variogram, representing measurement error or microscale variation. * `sill`: The sill of the variogram, indicating the plateau value of the semivariance. * `range`: The range (or effective range) of the variogram, representing the distance beyond which spatial correlation becomes negligible. A recommended tool for variogram modeling is the **gstat** package, which provides a robust suite of functions for fitting and analyzing variograms. ### Optional Parameters * `nhmax`: Maximum number of nearby hard data points to include in the integration process. * `nsmax`: Maximum number of nearby soft-interval data points to include in the integration process. * `zk_range`: A numeric vector specifying the range over which to evaluate the unobserved value at the estimation location. * `n`: An integer indicating the number of points at which to evaluate the posterior density over `zk_range`. The optional parameters are set to their default values. For further details, refer to the function documentation (e.g., ?`prob_zk`). ### BME Prediction Using the `utsnowload` dataset, you can prepare the necessary input variables as shown below. In this example, we designate the last 5 soft data locations (**locations 228 to 232**) as the **prediction locations**. ```{r} # prediction location x <- data.matrix(utsnowload[228:232, c("latitude", "longitude")]) x ``` The hard and soft-interval data are assigned as ```{r} # hard data locations ch <- data.matrix(utsnowload[1:67, c("latitude", "longitude")]) # soft data locations cs <- data.matrix(utsnowload[68:227, c("latitude", "longitude")]) # hard data values zh <- c(utsnowload[1:67, c("hard")]) # lower bounds a <- c(utsnowload[68:227, c("lower")]) # upper bounds b <- c(utsnowload[68:227, c("upper")]) ``` The variogram model and parameters are given as: ```{r} # variogram model and parameters model <- "exp" nugget <- 0.0953 sill <- 0.3639 range <- 1.0787 ``` The `prob_zk` function accepts all the data and variogram input arguments explained above. The numerical estimation of the posterior density for prediction location is computed as ```{r fig.width = 4, fig.height = 4.5, fig.align='center'} prob_zk(x[1,], ch, cs, zh, a, b, model, nugget, sill, range, plot = TRUE) ``` The plot of the posterior density becomes smoother as the value of `n` increases. The `bme_predict` function accepts the same arguments as the `prob_zk` function, with the addition of a `type` argument, which specifies the preferred type of prediction (either the posterior mean or mode). Using the data provided, we can predict the mode and mean of the posterior density at the prediction location location by: ```{r} # posterior mode bme_predict(x, ch, cs, zh, a, b, model, nugget, sill, range, type = "mode") # posterior mean bme_predict(x, ch, cs, zh, a, b, model, nugget, sill, range, type = "mean") ``` ### Leave-One-Out Cross-Validation (LOOCV) for Model Evaluation LOOCV is used to assess prediction accuracy by successively removing one hard data point at a time—where true values are known—and predicting its value using the remaining hard data and all of the soft-interval data. A variogram model is fitted to the reduced dataset, and the predicted value at the excluded location is compared to its observed value. Soft data locations are excluded from the validation set, as their true values are unobservable. The `bme_cv` function performs LOOCV for at hard data locations and returns key performance metrics, including **mean error (ME)**, **mean absolute error (MAE)**, and **root mean squared error (RMSE)** of the prediction residuals. These statistics offer insight into the model’s bias, average prediction accuracy, and the variability of prediction errors, respectively. Functionally, `bme_cv` accepts similar arguments as the `bme_predict` function. Given the necessary data inputs and variogram parameters, LOOCV can be applied to evaluate the posterior mean predictions as follows: ```{r eval=FALSE} bme_cv(ch, cs, zh, a, b, model, nugget, sill, range, type = "mean") #> $results #> coord.1 coord.2 observed mean variance residual fold #> 1 40.44 -112.24 0.09696012 -0.1742 0.2811 0.2712 1 #> 2 39.94 -112.41 0.12258678 -0.3519 0.2940 0.4745 2 #> 3 37.51 -113.40 -0.02302358 0.0162 0.2168 -0.0392 3 #> 4 37.49 -113.85 0.50354362 -0.1098 0.2483 0.6133 4 #> 5 39.31 -109.53 -0.68611327 -0.3871 0.3520 -0.2990 5 #> 6 40.72 -109.54 -0.53000397 -0.6945 0.1586 0.1645 6 #> 7 40.61 -109.89 -0.71923519 -0.8164 0.2002 0.0972 7 #> 8 40.91 -109.96 -1.31503404 -1.2461 0.1879 -0.0689 8 #> 9 40.74 -109.67 -0.94879597 -0.6540 0.1480 -0.2948 9 #> 10 40.92 -110.19 -1.39798035 -1.0320 0.2295 -0.3660 10 #> 11 40.95 -110.48 -1.21900906 -1.0311 0.0588 -0.1879 11 #> 12 40.60 -110.43 -1.24787225 -0.9276 0.1412 -0.3203 12 #> 13 40.55 -110.69 -0.55027484 -0.6074 0.1044 0.0571 13 #> 14 40.91 -110.50 -1.06708711 -1.0355 0.0633 -0.0316 14 #> 15 40.72 -110.47 -1.14044998 -1.0730 0.1463 -0.0674 15 #> 16 40.58 -110.59 -0.94551554 -0.6273 0.1167 -0.3182 16 #> 17 40.86 -110.80 -0.83840015 -0.6303 0.1204 -0.2081 17 #> 18 40.77 -110.01 -1.24671792 -1.2210 0.1835 -0.0257 18 #> 19 40.80 -110.88 -0.65036211 -0.7069 0.1086 0.0565 19 #> 20 40.68 -110.95 -0.37127802 -0.5523 0.1481 0.1810 20 #> 21 39.89 -110.75 -0.80367306 -0.4423 0.2055 -0.3614 21 #> 22 39.96 -110.99 -0.54230365 -0.3535 0.1659 -0.1888 22 #> 23 41.38 -111.94 0.94099563 1.3172 0.0495 -0.3762 23 #> 24 41.31 -111.45 0.24796667 0.0396 0.2536 0.2084 24 #> 25 41.41 -111.83 0.47642403 0.8217 0.1410 -0.3453 25 #> 26 41.38 -111.92 1.25233814 0.7392 0.0298 0.5131 26 #> 27 41.90 -111.63 0.61655171 0.0708 0.2713 0.5458 27 #> 28 41.68 -111.42 0.18443361 -0.0449 0.2339 0.2293 28 #> 29 41.41 -111.54 0.11223798 0.1876 0.0916 -0.0754 29 #> 30 41.47 -111.50 0.10561343 0.1456 0.0924 -0.0400 30 #> 31 40.85 -111.05 -0.10690304 -0.3928 0.0506 0.2859 31 #> 32 40.89 -111.07 -0.29946212 -0.2690 0.0496 -0.0305 32 #> 33 40.16 -111.21 0.00344554 -0.1073 0.2126 0.1107 33 #> 34 40.99 -111.82 0.78786432 0.1035 0.1912 0.6844 34 #> 35 40.43 -111.62 0.39822325 0.1897 0.2016 0.2085 35 #> 36 40.36 -111.09 -0.24414027 -0.1348 0.1680 -0.1093 36 #> 37 40.61 -111.10 -0.52669066 -0.2962 0.1611 -0.2305 37 #> 38 40.76 -111.63 0.14568497 0.3546 0.1824 -0.2089 38 #> 39 40.79 -111.12 -0.10923301 -0.2849 0.1393 0.1757 39 #> 40 39.68 -111.32 -0.08382941 -0.3556 0.1434 0.2718 40 #> 41 39.31 -111.43 -0.78984433 -0.4174 0.1735 -0.3724 41 #> 42 39.14 -111.56 -0.38648680 -0.5594 0.1321 0.1729 42 #> 43 39.05 -111.47 -0.57739062 -0.6028 0.1091 0.0254 43 #> 44 39.87 -111.28 -0.22947205 -0.0083 0.0417 -0.2212 44 #> 45 39.89 -111.25 -0.03805984 -0.2372 0.0346 0.1991 45 #> 46 39.45 -111.27 -0.42606551 -0.5614 0.1873 0.1353 46 #> 47 39.13 -111.44 -0.52777166 -0.5837 0.1149 0.0559 47 #> 48 39.01 -111.58 -0.81486819 -0.4709 0.1300 -0.3440 48 #> 49 39.93 -111.63 0.06849776 -0.1495 0.1982 0.2180 49 #> 50 38.77 -111.68 -0.68746363 -0.4619 0.0430 -0.2256 50 #> 51 38.68 -111.60 -1.04793061 -0.7220 0.1395 -0.3259 51 #> 52 38.21 -111.48 -1.40848147 -0.7562 0.2956 -0.6523 52 #> 53 38.80 -111.68 -0.43759896 -0.6781 0.0433 0.2405 53 #> 54 37.84 -111.88 -0.73581358 -0.7046 0.3145 -0.0312 54 #> 55 38.51 -112.02 -0.90807705 -0.7549 0.2387 -0.1532 55 #> 56 38.48 -112.39 -0.67118202 -0.8935 0.2731 0.2223 56 #> 57 38.30 -112.36 -0.76527983 -0.4291 0.1105 -0.3362 57 #> 58 38.30 -112.44 -0.51835705 -0.4790 0.0702 -0.0394 58 #> 59 38.88 -112.25 -0.24704072 -0.5178 0.2858 0.2708 59 #> 60 37.58 -112.90 -0.42302609 -0.2958 0.0683 -0.1272 60 #> 61 37.49 -112.58 0.00732065 0.0269 0.0663 -0.0196 61 #> 62 37.49 -112.51 0.02427501 0.0493 0.0417 -0.0250 62 #> 63 37.66 -112.74 -0.76376457 -0.3283 0.1760 -0.4355 63 #> 64 37.57 -112.84 -0.28791382 -0.4297 0.0607 0.1418 64 #> 65 37.53 -113.05 -0.07280592 -0.2826 0.1556 0.2098 65 #> 66 38.48 -109.27 -0.90950964 -0.4185 0.2903 -0.4910 66 #> 67 37.81 -109.49 -0.39635792 -0.5006 0.3202 0.1042 67 #> #> $metrics #> ME MAE RMSE #> 1 -0.0127 0.2259 0.2769 ```