Introduction to BMEmapping

Introduction

The Bayesian Maximum Entropy (BME) framework offers a robust and versatile approach for space-time data analysis and uncertainty quantification. By integrating principles from Bayesian statistics and the maximum entropy formalism, BME enables the construction of optimal estimates for spatial or spatiotemporal processes in the presence of both precise (hard) and imprecise (soft) data. While hard data correspond to exact point-value measurements, soft data may take the flexible forms of intervals, probability distributions, or qualitative descriptors, making BME particularly well-suited for complex real-world datasets.

The BMEmapping R package provides a user-friendly implementation of core BME methodologies, facilitating geostatistical modeling, prediction, and data fusion. It allows for a systematic integration of heterogeneous data sources, incorporates prior knowledge, and supports variogram-based spatial modeling—essential tools for accurate and interpretable spatial interpolation.

Specifically, BMEmapping is designed to perform spatial interpolation at unobserved locations using both hard and soft-interval data. This vignette introduces the fundamental functionality of the package and guides users through its basic usage.

To begin, load the package with:

library(BMEmapping)

Main Functions

The main functions available in BMEmapping include:

prob_zk - computes and optionally plots the posterior density estimate at a single unobserved location.

bme_predict - predicts the posterior mean or mode and the associated variance at an unobserved location.

bme_cv - performs a cross-validation on the hard data to assess model performance.

\(~\)

A Data Example

To introduce the functionality of BMEmapping, we will look at a modeling problem for estimating reliability-targeted snow loads in the state of Utah. The utsnowload data that is part of the package and can be accessed by the command

data("utsnowload")
head(utsnowload)
#>   latitude longitude        hard lower upper
#> 1    40.44   -112.24  0.09696012    NA    NA
#> 2    39.94   -112.41  0.12258678    NA    NA
#> 3    37.51   -113.40 -0.02302358    NA    NA
#> 4    37.49   -113.85  0.50354362    NA    NA
#> 5    39.31   -109.53 -0.68611327    NA    NA
#> 6    40.72   -109.54 -0.53000397    NA    NA

The variables latitude and longitude represent the geographic coordinates of each location. The variable hard contains the values of precise (hard) data measurements, while lower and upper define the bounds of the soft interval data.

Complete documentation for the utsnowload dataset can be accessed using the command

?utsnowload

Input Requirements

The BMEmapping functions require the following input arguments:

x: A matrix specifying the geographic location(s) where predictions are to be made.

ch: A matrix or data frame containing the geographic coordinates of the hard data locations.

cs: A matrix or data frame containing the geographic coordinates of the soft-interval data locations.

zh: A vector of observed hard data values corresponding to the locations in ch.

a: A vector of lower bounds for the soft-interval data at locations cs.

b: A vector of upper bounds for the soft-interval data at locations cs.

Variography

Before using BMEmapping, the user must fit a variogram model to the spatial data. This step involves specifying the type of variogram and its associated parameters:

A recommended tool for variogram modeling is the gstat package, which provides a robust suite of functions for fitting and analyzing variograms.

Optional Parameters

The optional parameters are set to their default values. For further details, refer to the function documentation (e.g., ?prob_zk).

BME Prediction

Using the utsnowload dataset, you can prepare the necessary input variables as shown below. In this example, we designate the last 5 soft data locations (locations 228 to 232) as the prediction locations.

# prediction location
x <- data.matrix(utsnowload[228:232, c("latitude", "longitude")])
x
#>     latitude longitude
#> 228  40.3000 -111.2550
#> 229  40.4447 -111.7075
#> 230  40.7908 -111.4078
#> 231  37.7064 -112.1456
#> 232  40.9139 -111.3983

The hard and soft-interval data are assigned as

# hard data locations
ch <- data.matrix(utsnowload[1:67, c("latitude", "longitude")]) 

# soft data locations
cs <- data.matrix(utsnowload[68:227, c("latitude", "longitude")])  

# hard data values
zh <- c(utsnowload[1:67, c("hard")])

# lower bounds
a <- c(utsnowload[68:227, c("lower")]) 

# upper bounds
b <- c(utsnowload[68:227, c("upper")]) 

The variogram model and parameters are given as:

# variogram model and parameters
model <- "exp"
nugget <- 0.0953
sill <- 0.3639
range <- 1.0787

The prob_zk function accepts all the data and variogram input arguments explained above. The numerical estimation of the posterior density for prediction location is computed as

prob_zk(x[1,], ch, cs, zh, a, b, model, nugget, sill, range, plot = TRUE)

#>           zk_i prob_zk_i
#> 1  -3.30630142   0.00000
#> 2  -3.19800956   0.00000
#> 3  -3.08971769   0.00000
#> 4  -2.98142583   0.00000
#> 5  -2.87313396   0.00000
#> 6  -2.76484209   0.00000
#> 7  -2.65655023   0.00000
#> 8  -2.54825836   0.00000
#> 9  -2.43996650   0.00000
#> 10 -2.33167463   0.00000
#> 11 -2.22338277   0.00000
#> 12 -2.11509090   0.00000
#> 13 -2.00679903   0.00001
#> 14 -1.89850717   0.00005
#> 15 -1.79021530   0.00015
#> 16 -1.68192344   0.00046
#> 17 -1.57363157   0.00129
#> 18 -1.46533971   0.00339
#> 19 -1.35704784   0.00827
#> 20 -1.24875597   0.01873
#> 21 -1.14046411   0.03945
#> 22 -1.03217224   0.07725
#> 23 -0.92388038   0.14064
#> 24 -0.81558851   0.23816
#> 25 -0.70729664   0.37517
#> 26 -0.59900478   0.54983
#> 27 -0.49071291   0.74979
#> 28 -0.38242105   0.95140
#> 29 -0.27412918   1.12336
#> 30 -0.16583732   1.23423
#> 31 -0.05754545   1.26163
#> 32  0.05074642   1.19985
#> 33  0.15903828   1.06142
#> 34  0.26733015   0.87325
#> 35  0.37562201   0.66804
#> 36  0.48391388   0.47508
#> 37  0.59220574   0.31400
#> 38  0.70049761   0.19283
#> 39  0.80878948   0.11000
#> 40  0.91708134   0.05827
#> 41  1.02537321   0.02866
#> 42  1.13366507   0.01308
#> 43  1.24195694   0.00554
#> 44  1.35024881   0.00218
#> 45  1.45854067   0.00079
#> 46  1.56683254   0.00027
#> 47  1.67512440   0.00008
#> 48  1.78341627   0.00002
#> 49  1.89170813   0.00001
#> 50  2.00000000   0.00000

The plot of the posterior density becomes smoother as the value of n increases.

The bme_predict function accepts the same arguments as the prob_zk function, with the addition of a type argument, which specifies the preferred type of prediction (either the posterior mean or mode). Using the data provided, we can predict the mode and mean of the posterior density at the prediction location location by:

# posterior mode
bme_predict(x, ch, cs, zh, a, b, model, nugget, sill, range, type = "mode")
#>     latitude longitude    mode
#> 228  40.3000 -111.2550 -0.0575
#> 229  40.4447 -111.7075  0.2673
#> 230  40.7908 -111.4078 -0.0575
#> 231  37.7064 -112.1456 -0.4907
#> 232  40.9139 -111.3983 -0.1658

# posterior mean
bme_predict(x, ch, cs, zh, a, b, model, nugget, sill, range, type = "mean")
#>     latitude longitude    mean variance
#> 228  40.3000 -111.2550 -0.1018   0.2071
#> 229  40.4447 -111.7075  0.1901   0.0893
#> 230  40.7908 -111.4078 -0.0627   0.1878
#> 231  37.7064 -112.1456 -0.4569   0.1515
#> 232  40.9139 -111.3983 -0.1349   0.2049

Leave-One-Out Cross-Validation (LOOCV) for Model Evaluation

LOOCV is used to assess prediction accuracy by successively removing one hard data point at a time—where true values are known—and predicting its value using the remaining hard data and all of the soft-interval data. A variogram model is fitted to the reduced dataset, and the predicted value at the excluded location is compared to its observed value. Soft data locations are excluded from the validation set, as their true values are unobservable.

The bme_cv function performs LOOCV for at hard data locations and returns key performance metrics, including mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) of the prediction residuals. These statistics offer insight into the model’s bias, average prediction accuracy, and the variability of prediction errors, respectively.

Functionally, bme_cv accepts similar arguments as the bme_predict function. Given the necessary data inputs and variogram parameters, LOOCV can be applied to evaluate the posterior mean predictions as follows:

bme_cv(ch, cs, zh, a, b, model, nugget, sill, range, type = "mean")

#> $results
#>     coord.1 coord.2   observed   mean   variance residual fold
#> 1    40.44 -112.24  0.09696012 -0.1742   0.2811   0.2712    1
#> 2    39.94 -112.41  0.12258678 -0.3519   0.2940   0.4745    2
#> 3    37.51 -113.40 -0.02302358  0.0162   0.2168  -0.0392    3
#> 4    37.49 -113.85  0.50354362 -0.1098   0.2483   0.6133    4
#> 5    39.31 -109.53 -0.68611327 -0.3871   0.3520  -0.2990    5
#> 6    40.72 -109.54 -0.53000397 -0.6945   0.1586   0.1645    6
#> 7    40.61 -109.89 -0.71923519 -0.8164   0.2002   0.0972    7
#> 8    40.91 -109.96 -1.31503404 -1.2461   0.1879  -0.0689    8
#> 9    40.74 -109.67 -0.94879597 -0.6540   0.1480  -0.2948    9
#> 10   40.92 -110.19 -1.39798035 -1.0320   0.2295  -0.3660   10
#> 11   40.95 -110.48 -1.21900906 -1.0311   0.0588  -0.1879   11
#> 12   40.60 -110.43 -1.24787225 -0.9276   0.1412  -0.3203   12
#> 13   40.55 -110.69 -0.55027484 -0.6074   0.1044   0.0571   13
#> 14   40.91 -110.50 -1.06708711 -1.0355   0.0633  -0.0316   14
#> 15   40.72 -110.47 -1.14044998 -1.0730   0.1463  -0.0674   15
#> 16   40.58 -110.59 -0.94551554 -0.6273   0.1167  -0.3182   16
#> 17   40.86 -110.80 -0.83840015 -0.6303   0.1204  -0.2081   17
#> 18   40.77 -110.01 -1.24671792 -1.2210   0.1835  -0.0257   18
#> 19   40.80 -110.88 -0.65036211 -0.7069   0.1086   0.0565   19
#> 20   40.68 -110.95 -0.37127802 -0.5523   0.1481   0.1810   20
#> 21   39.89 -110.75 -0.80367306 -0.4423   0.2055  -0.3614   21
#> 22   39.96 -110.99 -0.54230365 -0.3535   0.1659  -0.1888   22
#> 23   41.38 -111.94  0.94099563  1.3172   0.0495  -0.3762   23
#> 24   41.31 -111.45  0.24796667  0.0396   0.2536   0.2084   24
#> 25   41.41 -111.83  0.47642403  0.8217   0.1410  -0.3453   25
#> 26   41.38 -111.92  1.25233814  0.7392   0.0298   0.5131   26
#> 27   41.90 -111.63  0.61655171  0.0708   0.2713   0.5458   27
#> 28   41.68 -111.42  0.18443361 -0.0449   0.2339   0.2293   28
#> 29   41.41 -111.54  0.11223798  0.1876   0.0916  -0.0754   29
#> 30   41.47 -111.50  0.10561343  0.1456   0.0924  -0.0400   30
#> 31   40.85 -111.05 -0.10690304 -0.3928   0.0506   0.2859   31
#> 32   40.89 -111.07 -0.29946212 -0.2690   0.0496  -0.0305   32
#> 33   40.16 -111.21  0.00344554 -0.1073   0.2126   0.1107   33
#> 34   40.99 -111.82  0.78786432  0.1035   0.1912   0.6844   34
#> 35   40.43 -111.62  0.39822325  0.1897   0.2016   0.2085   35
#> 36   40.36 -111.09 -0.24414027 -0.1348   0.1680  -0.1093   36
#> 37   40.61 -111.10 -0.52669066 -0.2962   0.1611  -0.2305   37
#> 38   40.76 -111.63  0.14568497  0.3546   0.1824  -0.2089   38
#> 39   40.79 -111.12 -0.10923301 -0.2849   0.1393   0.1757   39
#> 40   39.68 -111.32 -0.08382941 -0.3556   0.1434   0.2718   40
#> 41   39.31 -111.43 -0.78984433 -0.4174   0.1735  -0.3724   41
#> 42   39.14 -111.56 -0.38648680 -0.5594   0.1321   0.1729   42
#> 43   39.05 -111.47 -0.57739062 -0.6028   0.1091   0.0254   43
#> 44   39.87 -111.28 -0.22947205 -0.0083   0.0417  -0.2212   44
#> 45   39.89 -111.25 -0.03805984 -0.2372   0.0346   0.1991   45
#> 46   39.45 -111.27 -0.42606551 -0.5614   0.1873   0.1353   46
#> 47   39.13 -111.44 -0.52777166 -0.5837   0.1149   0.0559   47
#> 48   39.01 -111.58 -0.81486819 -0.4709   0.1300  -0.3440   48
#> 49   39.93 -111.63  0.06849776 -0.1495   0.1982   0.2180   49
#> 50   38.77 -111.68 -0.68746363 -0.4619   0.0430  -0.2256   50
#> 51   38.68 -111.60 -1.04793061 -0.7220   0.1395  -0.3259   51
#> 52   38.21 -111.48 -1.40848147 -0.7562   0.2956  -0.6523   52
#> 53   38.80 -111.68 -0.43759896 -0.6781   0.0433   0.2405   53
#> 54   37.84 -111.88 -0.73581358 -0.7046   0.3145  -0.0312   54
#> 55   38.51 -112.02 -0.90807705 -0.7549   0.2387  -0.1532   55
#> 56   38.48 -112.39 -0.67118202 -0.8935   0.2731   0.2223   56
#> 57   38.30 -112.36 -0.76527983 -0.4291   0.1105  -0.3362   57
#> 58   38.30 -112.44 -0.51835705 -0.4790   0.0702  -0.0394   58
#> 59   38.88 -112.25 -0.24704072 -0.5178   0.2858   0.2708   59
#> 60   37.58 -112.90 -0.42302609 -0.2958   0.0683  -0.1272   60
#> 61   37.49 -112.58  0.00732065  0.0269   0.0663  -0.0196   61
#> 62   37.49 -112.51  0.02427501  0.0493   0.0417  -0.0250   62
#> 63   37.66 -112.74 -0.76376457 -0.3283   0.1760  -0.4355   63
#> 64   37.57 -112.84 -0.28791382 -0.4297   0.0607   0.1418   64
#> 65   37.53 -113.05 -0.07280592 -0.2826   0.1556   0.2098   65
#> 66   38.48 -109.27 -0.90950964 -0.4185   0.2903  -0.4910   66
#> 67   37.81 -109.49 -0.39635792 -0.5006   0.3202   0.1042   67
#> 
#> $metrics
#>       ME    MAE   RMSE
#> 1 -0.0127 0.2259 0.2769