[R] MRF smoothers in MGCV - specifying neighbours

Thu May 8 17:21:41 CEST 2014

Hi Mark,

The problem here is that the constructor expects there to be at least 
one observation per location. The nb.l list has neighbourhood 
information for 166 locations, while the 'obs' data contains 
observations for only 99 of them (unique(obs$xy.idx)).

The solution probably requires more complicated construction of nb.l. 
You can't just drop locations from the existing nb.l because that messes 
up the internal indexing of nb.l. You could add a dummy observation with 
zero weight for each of the extra locations, but I guess that isn't what 
you really want to do for this application, as presumably the 
neighbourhood structure is not supposed to lead to smoothing  across the 
gap between the two arms of the sausage...

best,
Simon

On 08/05/14 15:17, Mark Payne wrote:
> Hi Roger and Simon,
>
> Thanks for the replies. Simon's suggestion of an isolated or missing
> neighbourhood doesn't hold either.
>
> I've attached the code below - its my attempt to solve the FELSPLINE
> sausage using mrf rather than a soap smoother. Its a bit convoluted, but
> should run ok. I thought this would be a good starting example to get a
> GMRF running, but then hit the problem mentioned. My attempts to track
> the bug suggest that there is something wierd in the knots argument that
> is being supplied to smooth.construct.mrf.smooth.spec() - but haven't
> come so much further than that.
>
> Code follows.
>
> Mark
>
> #GMRF Example
> #Solves the classic FELSPINE problem using
> #GMRF in mgcv. This example is a modified version of the
> #example from smooth.construct.so.smooth.spec()
> #in the mgcv package
>
> rm(list=ls())
> library(mgcv)
>
> #Extract boundary
> fsb <- fs.boundary()
>
> #Create an underlying grid and evaluate the function on it
> #Based on mgcv::fs.boundary() example
> dx<-0.2;dy<-0.2    #Grid steps
> id.fmt <- "%i/%i"
> x.vals <- seq(-1,4,by=dx)
> y.vals<-seq(-1,1,by=dy)
> grd <- expand.grid(x=x.vals,y=y.vals)
> tru.mat <- matrix(fs.test(grd$x,grd$y),length(x.vals),length(y.vals))
> grd$truth <- as.vector(tru.mat)
> grd$x.idx <- as.numeric(factor(grd$x,x.vals))
> grd$y.idx <- as.numeric(factor(grd$y,y.vals))
> grd$xy.idx <- sprintf(id.fmt,grd$x.idx,grd$y.idx)
> grd <- subset(grd,!is.na <http://is.na>(truth))
>
> ## Simulate some fitting data, inside boundary...
> n.samps<-250
> x <- runif(n.samps)*5-1
> y  <- runif(n.samps)*2-1
> obs <- data.frame(x=x,y=y)
> obs$truth <- fs.test(obs$x,obs$y,b=1)
> obs$z <- obs$truth + rnorm(n.samps)*.3 ## add noise
> pt.inside <- inSide(fsb,x=x,y=y) ## remove outsiders
>
> ## Associate observation with grid cell
> obs$x.rnd <- round(obs$x/dx)*dx
> obs$y.rnd <- round(obs$y/dy)*dy
> obs$x.idx <- as.numeric(factor(obs$x.rnd,x.vals))
> obs$y.idx <- as.numeric(factor(obs$y.rnd,y.vals))
> obs$xy.idx <- sprintf(id.fmt,obs$x.idx,obs$y.idx)
> obs$xy.idx <- factor(obs$xy.idx,levels=grd$xy.idx)
>
> #Filter observations that are outside or don't have an associated grid cell
> obs <- subset(obs,pt.inside & xy.idx %in% grd$xy.idx )
>
> ## plot boundary with truth and data locations
> par(mfrow=c(1,2))
> image(x.vals,y.vals,tru.mat,col=heat.colors(100),xlab="x",ylab="y")
> contour(x.vals,y.vals,tru.mat,levels=seq(-5,5,by=.25),add=TRUE)
> lines(fsb$x,fsb$y);
> points(obs$x,obs$y,pch=3);
>
> #Plot grid
> plot(y~x,grd)
> lines(fsb$x,fsb$y);
>
> #Setup neighbourhood adjancey
> nb <- grd[,c("x.idx","y.idx","xy.idx")]
> nb$N <- factor(sprintf(id.fmt,nb$x.idx,nb$y.idx+1),levels=nb$xy.idx)
> nb$S <- factor(sprintf(id.fmt,nb$x.idx,nb$y.idx-1),levels=nb$xy.idx)
> nb$E <- factor(sprintf(id.fmt,nb$x.idx+1,nb$y.idx),levels=nb$xy.idx)
> nb$W <- factor(sprintf(id.fmt,nb$x.idx-1,nb$y.idx),levels=nb$xy.idx)
> nb.mat <- sapply(nb[,c("N","S","E","W")],as.numeric)
> nb.l <- lapply(split(nb.mat,nb$xy.idx),function(x) x[!is.na
> <http://is.na>(x)])
>

> #Fit MRF gam
> mdl <- gam(z ~ s(xy.idx,bs="mrf",xt=list(nb=nb.l)),data=obs,method="REML")
>
>
>
>
>
>
>
>
>
>
> On 8 May 2014 15:15, Simon Wood <s.wood at bath.ac.uk
> <mailto:s.wood at bath.ac.uk>> wrote:
>
>     Hi Mark,
>
>     I'm not sure what is happening here - there is no chance that nb.l
>     contains a neighbourhood not in the levels of obs$xy.idx, I suppose?
>     i.e. is
>
>     all(names(nb.l)%in%levels(obs$__xy.idx))
>
>     also TRUE? Here is some code illustrating what nb should look like
>     (and in response to Roger Bivand's suggestion I also tried this
>     replacing all the labels with things like "x/y", but it still works).
>
>
>     ## example mrf fit using polygons....
>     library(mgcv)
>     ## Load Columbus Ohio crime data (see ?columbus for details and credits)
>     data(columb)       ## data frame
>     data(columb.polys) ## district shapes list
>     xt <- list(polys=columb.polys) ## neighbourhood structure info for MRF
>     par(mfrow=c(2,2))
>     ## First a full rank MRF...
>     b0 <- gam(crime ~
>     s(district,bs="mrf",xt=xt),__data=columb,method="REML")
>
>     ## same fit based on direct neighbour spec...
>     nb <- mgcv:::pol2nb(columb.polys)$nb
>     xt <- list(nb=nb)
>     b <- gam(crime ~ s(district,bs="mrf",xt=xt),__data=columb,method="REML")
>
>     best,
>     Simon
>
>
>
>
>     On 08/05/14 01:58, Mark Payne wrote:
>
>         Hi,
>
>         Does anyone have an example of a Markov Random Field smoother
>         (MRF) in MGCV
>         where they have specified the neighbourhood directly, rather
>         than supplying
>         polygons? Does anyone understand how the rules should be? Based
>         on the
>         columb example, I have setup my data set and neighbourhood like so:
>
>             head(nb.l)
>
>         $`10/10`
>         [1] 135 155 153
>
>         $`10/2`
>         [1] 27  8  6
>
>         $`10/3`
>         [1] 48  7 28 26
>
>         $`10/4`
>         [1] 69 27 49 47
>
>         $`10/5`
>         [1] 48 70 68
>
>         $`10/7`
>         [1] 115  95  93
>
>             head(obs)
>
>                       x          y      truth x.idx y.idx xy.idx
>         24  1.4835147  0.8026673  2.3605204    13    10  13/10
>         26  1.0452111  0.4673685  1.8316741    11     8   11/8
>         43  2.1514977 -0.2640058 -2.8812026    17     5   17/5
>         46  2.8473951  0.5445714  3.6347799    20     9   20/9
>         53 1.7983253 <tel:53%20%201.7983253> -0.6905912 -2.5473984 15
>         <tel:2.5473984%20%20%20%2015>     3   15/3
>         86 -0.1839814 -0.7824026 -0.5776616     5     2    5/2
>
>
>
>         but get the following error:
>
>             mdl <- gam(truth ~
>
>         s(xy.idx,bs="mrf",xt=list(nb=__nb.l)),data=obs,method="REML")
>         Error in smooth.construct.mrf.smooth.__spec(object, dk$data,
>         dk$knots) :
>             mismatch between nb/polys supplied area names and data area
>         names
>
>         However, there is a perfect match between the nb list names and
>         the data
>         area names:
>
>             all(levels(obs$xy.idx) %in% names(nb.l))
>
>         [1] TRUE
>
>
>
>         Any suggestions where to start?
>
>         Mark
>
>                  [[alternative HTML version deleted]]
>
>         ________________________________________________
>         R-help at r-project.org <mailto:R-help at r-project.org> mailing list
>         https://stat.ethz.ch/mailman/__listinfo/r-help
>         <https://stat.ethz.ch/mailman/listinfo/r-help>
>         PLEASE do read the posting guide
>         http://www.R-project.org/__posting-guide.html
>         <http://www.R-project.org/posting-guide.html>
>         and provide commented, minimal, self-contained, reproducible code.
>
>
>
>     --
>     Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>     +44 (0)1225 386603 <tel:%2B44%20%280%291225%20386603>
>     http://people.bath.ac.uk/sw283
>
>

-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283