[R-sig-Geo] geoR likfit: confusing AIC results?

Sat Aug 14 22:26:07 CEST 2010

Dear Thimothy

I will have a ore detailed look at your study to inspect this better.

However, I'd like to points some issues I've detected in a first glence to 
you code:

1. your area is of the size 6000x6000
This is related with distances and order of magnitude of the parameters, 
in particular to phi.
Since you total variances are around 3 the maximization may need some 
tunung for dealing with different oreders or magnitude

    1.a a simple suggestion would be replace 6000 by 6 and see whether it
        change the estimates
    1.b an alternative would be to pass values of fnscale to the optimizer 
(optim) through the appropriated ... mechanism

2. Notice that summary() on a likfit object (or elements of the list) 
already give you the fit and likelihood values for the pure nugget 
(independence) model such that you do not need to run the maximizer over 
it

best
P.J.

Paulo Justiniano Ribeiro Jr
LEG (Laboratorio de Estatistica e Geoinformacao)
Universidade Federal do Parana
Caixa Postal 19.081
CEP 81.531-990
Curitiba, PR  -  Brasil
Tel: (+55) 41 3361 3573
VOIP: (+55) (41) (3361 3600) 1053 1066
Fax: (+55) 41 3361 3141
e-mail: paulojus AT  ufpr  br
http://www.leg.ufpr.br/~paulojus

On Sat, 14 Aug 2010, Timothy W. Hilton wrote:

> Dear R-sig-geo readers,
>
> I am trying to use geoR's likfit to determine whether some spatial
> fields are better described by an exponential semivariogram or a pure
> nugget semivariogram.  To investigate the frequency of a 'false
> positive', where my test picks the exponential semivariogram when
> there is in fact no spatial correlation in the field, I ran the
> following experiment:
>
> I used geoR's grf function to create ten different Gaussian random
> fields (GRFs) with phi = 0.0, sigmasq = 0.001, and nugget = 3.0.  For
> each GRF I use geoR's likfit to fit 50 exponential covariance models
> and 50 pure nugget covariance models.  Within each set of 50 models, I
> chose the lowest AIC (as reported by likfit) as the "best" fit.
>
> Because the GRFs were generated with no spatial correlation, I
> expected the best-fit pure nugget AIC to be lower than the best-fit
> exponential AIC in the majority of cases.  Across my ten experiements,
> though, the best-fit exponential AIC was lower eight times, and the
> best-fit pure nugget AIC was lower only once (the other was a tie).
>
> This concerns me, as it suggests than many of my fields that I had
> concluded *do* have spatial structure in fact may not.
>
> Am I misusing the AIC result from likfit to choose a covariance
> structure?  I will greatly appreciate any advice the list can offer.
>
> A more detailed description of my experiment, code, and results follow
> below.
>
> many thanks,
> Tim
>
> --
>
> Timothy W. Hilton
> PhD Candidate, Department of Meteorology
> The Pennsylvania State University
> 503 Walker Building, University Park, PA   16802
> hilton at meteo.psu.edu
>
> ==================================================
>
> For each of the ten GRFs, I used likfit to estimate parameters for
> exponential covariance model 50 times.  Each likfit began with initial
> covariance parameters drawn randomly from uniform distributions within
> (0,10) for sigmasq, (0,1000) for phi, and (0,10) for the nugget.  To
> estimate the pure nugget parameters, I again ran likfit 50 times with
> initial covariance parameters of phi = 0.0, sigmasq = 0.001, and
> nugget = 3.0.
>
> ==================================================
>
> Resulting AIC for best-fit model for each of the ten GRFs.
> Exponential (exp) is less than pure nugget (PN) in eight of ten cases.
>
>         PN      exp
> 1  251.9389 251.6512
> 2  276.7283 276.3321
> 3  252.2566 252.2566
> 4  227.2106 227.2106
> 5  251.4253 251.4252
> 6  268.3910 268.3910
> 7  259.6977 257.8298
> 8  254.6102 247.9796
> 9  262.4842 262.0296
> 10 258.1551 257.4813
>
> ==================================================
>
> The R code I used.  It took about ten minutes to complete on my
> notebook computer.
>
> #----------------------------------------------------------------------
> getGRFRealizations <- function(nsim=1) {
>
>  phi <- 0.0
>  sigmasq <- 0.001
>  nug <- 3.0
>
>  realizations <- grf(n=65,
>                      xlims=c(0, 6000),
>                      ylims=c(0, 6000),
>                      nsim=nsim,
>                      cov.model='pure.nugget',
>                      cov.pars=c(sigmasq, phi),
>                      nugget=nug)
>  return(realizations)
> }
>
> #----------------------------------------------------------------------
> # performs likfit for a Gaussian Random Field (fld) multiple times
> # with initial covariance parameters drawn from a uniform distribution
> # within a window.  Returns the fit with the lowest AIC.
> multi.likfit <- function(nfits=50, data, coords, cov.model) {
>
>  if (cov.model == 'exponential')
>    pars <- c(runif(n=1, min=0, max=10),
>              runif(n=1, min=0, max=1000))
>  else
>    pars <- c(0.001, 0.0)  ## for pure nugget
>
>  # estimate one fit
>  best.so.far <- likfit(data=data,
>                        coords=coords,
>                        cov.model=cov.model,
>                        ini.cov.pars=pars,
>                        nugget=runif(n=1, min=0, max=10),
>                        messages=FALSE)
>  # do the rest of the fits, keep the lowest AIC
>  for (i in 2:nfits) {
>    this.fit <- likfit(data=data,
>                       coords=coords,
>                       cov.model=cov.model,
>                       ini.cov.pars=pars,
>                       nugget=runif(n=1, min=0, max=10),
>                       messages=FALSE)
>    if (this.fit$AIC < best.so.far$AIC)
>      best.so.far <- this.fit
>  }
>  return(best.so.far)
> }
>
> #----------------------------------------------------------------------
> fitGRFRealizations <- function(nsim=5, nfits=50) {
>
>  #allocate lists to hold the covariance model fits
>  fits.list.exp <- vector(mode='list', length=nsim) #for exponential fits
>  fits.list.PN <- vector(mode='list', length=nsim)  #for pure nugget fits
>  #generate random field realizations
>  flds <- getGRFRealizations(nsim=nsim)
>
>  #fit covariance model to each of the random fields
>  for (i in 1:nsim) {
>    fits.list.exp[[i]] <- multi.likfit(nfits=50,
>                                       data=flds$data[, i],
>                                       coords=flds$coords,
>                                       cov.model='exponential')
>    fits.list.PN[[i]] <- multi.likfit(nfits=50,
>                                      data=flds$data[, i],
>                                      coords=flds$coords,
>                                      cov.model='pure.nugget')
>    cat('.')  # crude progress bar
>  }
>  cat('\n')
>  return(list(exponential=fits.list.exp, pure.nugget=fits.list.PN))
> }
>
> #----------------------------------------------------------------------
> gatherFits <- function(fits.list) {
>  phi <- unlist(lapply(fits.list, function(x) x[['phi']]))
>  sigmasq <- unlist(lapply(fits.list, function(x) x[['sigmasq']]))
>  nugget <- unlist(lapply(fits.list, function(x) x[['nugget']]))
>  pRange <- unlist(lapply(fits.list, function(x) x[['practicalRange']]))
>  AIC <- unlist(lapply(fits.list, function(x) x[['AIC']]))
>  df <- data.frame(phi=phi, sigmasq=sigmasq, nugget=nugget, pRange=pRange, AIC=AIC)
>  return(df)
> }
>
> result <- fitGRFRealizations(nsim=10)
> df <- data.frame(PN.AIC=unlist(lapply(result[['pure.nugget']], function(x) x$AIC)),
>                 exp.AIC=unlist(lapply(result[['exponential']], function(x) x$AIC)))
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>