[R] package mgcv - predict with bam: Error in X[ind, ] : subscript out of bounds
Simon Wood
s.wood at bath.ac.uk
Mon Feb 3 12:42:23 CET 2014
Hi Katharina,
Thanks for sending this.
The problem is that 'site' in the prediction data contains levels that
are not present in the (usable, non-NA) fit data:
> levels(m$model$site)
[1] "KRB" "NP.FOR" "WKS.FRE" "WKS.KRE" "WKS.RIE" "WKS.WUE"
> levels(gapData$site)
[1] "KRB" "NP.FOR" "RIE.2" "WKS.BBR" "WKS.FRE" "WKS.HOE" "WKS.KRE"
[8] "WKS.RIE" "WKS.WUE"
predict.lm has a check for this, and so fails with a rather more
informative error message, e.g.
m0 <- lm(sensor1 ~ sensor2 + site + site:NthSampling,
data=xylemRohWeekXnn2011,na.action=na.omit)
predict(m0,gapData)
... factor site has new levels RIE.2, WKS.BBR
I'll add a better check to predict.gam.
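In the meantime, a check along these lines can be run by hand before
calling predict. This is only a sketch on made-up toy factors (fit_site
and pred_site are hypothetical stand-ins for m$model$site and
gapData$site); it is not the check that will go into predict.gam.

```r
# Toy stand-ins for the fit and prediction factors (hypothetical values).
fit_site  <- factor(c("KRB", "NP.FOR", "WKS.FRE", "KRB"))
pred_site <- factor(c("KRB", "RIE.2", "WKS.BBR", "WKS.FRE"))

# Levels that occur in the prediction data but were never seen in fitting.
new_levels <- setdiff(levels(pred_site), levels(fit_site))
print(new_levels)

# Logical index of prediction rows whose level is known to the fit;
# only these rows can be passed to predict() without further tricks.
ok <- pred_site %in% levels(fit_site)
print(ok)
```

With the real objects, the same two lines would be applied to
levels(gapData$site) and levels(m$model$site), and gapData[ok, ] would
be the subset that can be predicted directly.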
best,
Simon
PS: If you want predictions with the random effects for site set to zero,
then one trick is to use terms like s(site,bs="re",by=dum) in fitting,
with dum set to 1. Then in prediction you can set 'site' to any existing
level and dum to zero, which gives a prediction for the missing level
with the 'site' effect set to zero.
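
A minimal sketch of that trick on simulated data follows. The data
frame dat, the site names A to D, the covariate x and the effect sizes
are all invented for illustration; only the s(site,bs="re",by=dum)
construction comes from the note above.

```r
library(mgcv)

set.seed(1)
# Simulated data (hypothetical): linear effect of x plus a random site intercept.
n <- 200
dat <- data.frame(site = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
                  x = runif(n))
site_eff <- c(A = -1, B = 0.5, C = 1, D = -0.5)
dat$y <- 2 * dat$x + unname(site_eff[as.character(dat$site)]) + rnorm(n, sd = 0.3)
dat$dum <- 1  # dummy 'by' variable: 1 keeps the random effect switched on

m <- bam(y ~ x + s(site, bs = "re", by = dum), data = dat)

# To predict for a site that was not in the fit, borrow any existing level
# and set dum to 0, so the s(site) term is multiplied by zero.
newd <- data.frame(site = factor("A", levels = levels(dat$site)),
                   x = 0.5, dum = 0)
p0 <- predict(m, newd)

# Term-wise prediction: the s(site):dum column should be zero for this row.
pt <- predict(m, newd, type = "terms")
print(pt)
```

For a model with several random-effect terms involving site, each such
term would presumably need its own by=dum for the same effect.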
On 02/02/14 17:52, Katharina May wrote:
> Hi Simon,
>
> thank you for your reply, I really appreciate any help to understand
> the problem here...
> Unfortunately the package upgrade didn't help with this issue.
> An example reproducing the error and the current sessionInfo() output
> can be found below.
>
> Many thanks once again,
>
> Katharina
>
>
> R Code Example
> <snip>
> library(RCurl)
> library(mgcv)
> #retrieve xylemRohWeekXnn2011 test data frame
> eval( expr = parse( text =
> getURL("https://webdisk.ads.mwn.de/Handlers/AnonymousDownload.ashx?folder=1a7cbaa4&path=xylemRohWeekXnn2011.R")
> ))
>
> xylemRohWeekXnn.fit.bam <- bam(sensor1 ~ sensor2 + s(site, bs="re")
> + s(site, NthSampling, bs="re") , data=xylemRohWeekXnn2011,
> na.action=na.omit)
>
> #subset data containing gaps for predicting
> gapData <- xylemRohWeekXnn2011[is.na(xylemRohWeekXnn2011[,2]) &
> !is.na(xylemRohWeekXnn2011[,11]),c(2:3,6:7, 11)]
>
> xylemRohWeekXnnSite.fit <-
> predict.gam(xylemRohWeekXnn.fit.bam,gapData, type="response", se.fit=FALSE)
> </snap>
>
>
>
> My current Session Information (sessionInfo() Output - also confirming
> that the problem exists on both Windows and Mac OS X):
> <snip>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] mgcv_1.7-28 nlme_3.1-113 RCurl_1.95-4.1 bitops_1.0-6
>
> loaded via a namespace (and not attached):
> [1] grid_3.0.2 lattice_0.20-24 Matrix_1.1-2 tools_3.0.2
> </snap>
>
>
>
>
> On 31/01/14 12:57, Simon Wood wrote:
>>
>> Hi Katharina,
>>
>> Could you try upgrading to mgcv_1.7-28, please? There was an occasional
>> problem to do with matching factor levels, which is fixed, but I'm not
>> very confident that is what is going on.
>>
>> If upgrading doesn't work, is there any chance you could send me a small
>> example dataset and code that produces the error, and I'll look at it?
>>
>> best,
>> Simon
>>
>> --
>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>> +44 (0)1225 386603 http://people.bath.ac.uk/sw283