[R-sig-Geo] new function predict, package raster

Robert J. Hijmans r.hijmans at gmail.com
Thu Dec 23 18:53:56 CET 2010


Isabelle, I think you nailed it. The function now sets the levels and
things seem to work well (version 1-7.19). Thanks! Robert

On Thu, Dec 23, 2010 at 3:54 AM, isabelle boulangeat
<isabelle.boulangeat at gmail.com> wrote:
> Dear Robert,
>
> I think the problem when using random Forest is that the variable "factor"
> is converted into factor inside the predict function and some levels are
> lost  if the levels are not all in the blockvals selection.
> Do you think it is possible to overcome this problem?
>
> Cheers,
>
> Isabelle.
>
> 2010/12/13 isabelle boulangeat <isabelle.boulangeat at gmail.com>
>>
>> Dear Robert,
>>
>> Today, with the version 1.7-12, there is again a problem with the function
>> "predict" and randomForest.
>> It works perfecly when I remove the categorial variable, but I got this
>> error when I add the categorial variable:
>>
>> Erreur dans predict.randomForest(model, blockvals, ...) :
>>   Type of predictors in new data do not match that of the training data.
>>
>> I have a similar error using the package party:
>>
>> Erreur dans checkData(oldData, RET) :
>>   Classes of new data do not match original data
>>
>> details:
>>
>> Pred_ras <- predict(regionalStack, model=rf2PA,  type="prob",  index=2)
>>
>> layerNames(regionalStack)
>> [1] "Mtc"        "prcpwin"    "mind678"    "sradyy"     "ru"
>> "carbonSoil"
>> [7] "clc"
>>
>> rf2PA <- randomForest(as.factor(resp) ~ ., data=dat)
>>
>> > str(dat)
>> 'data.frame':   14029 obs. of  8 variables:
>>  $ resp      : num  1 0 0 0 1 0 0 0 0 0 ...
>>  $ Mtc       : num  -194.4 -282.9 -168 -176.5 -53.3 ...
>>  $ prcpwin   : num  478 603 720 724 572 ...
>>  $ mind678   : num  -655 -627 -780 -770 -643 ...
>>  $ sradyy    : num  17219 19636 20471 20318 14564 ...
>>  $ ru        : num  89 89 75 75 74.4 ...
>>  $ carbonSoil: num  50 50 50 50 50 50 50 50 50 50 ...
>>  $ clc       : Factor w/ 7 levels "1","2","3","4",..: 5
>>
>>
>> unique(dat$clc)
>> [1] 3 5 4 6 7 1 2
>> Levels: 1 2 3 4 5 6 7
>>
>>  unique(subset(regionalStack, 7))
>> [1] 1 2 3 4 5 6 7
>>
>> Any idea?
>>
>> 2010/12/8 Robert J. Hijmans <r.hijmans at gmail.com>
>>>
>>> Isabelle, I think I fixed this in raster 1.7-11. Robert
>>>
>>> On Wed, Dec 8, 2010 at 8:55 AM, isabelle boulangeat
>>> <isabelle.boulangeat at gmail.com> wrote:
>>> > Dear Robert,
>>> >
>>> > I am afraid this doesn't work with a "randomForest" model type (package
>>> > randomForest), although there is no problem without categorical
>>> > variables.
>>> >
>>> > To be continued...
>>> >
>>> > Isabelle.
>>> >
>>> > 2010/12/6 isabelle boulangeat <isabelle.boulangeat at gmail.com>
>>> >>
>>> >> Dear Robert,
>>> >>
>>> >> I tested today my script with the package 1.7-9 and it works!
>>> >>
>>> >> Pred_fac <- predict(rasStack, model=model_fac, predict = predict.glm,
>>> >> type="response", progress='text')
>>> >>
>>> >> Thanks a lot.
>>> >>
>>> >> Isabelle.
>>> >>
>>> >> 2010/12/3 Robert J. Hijmans <r.hijmans at gmail.com>
>>> >>>
>>> >>> Dear Isabelle,
>>> >>>
>>> >>> Thanks for the detailed report. So it is a bug after all. This is
>>> >>> probably fixed in raster 1.7-8 (on CRAN for win and lin now).
>>> >>> At least the below works for me (model with 2 variables, one is a
>>> >>> factor).
>>> >>>
>>> >>> Robert
>>> >>>
>>> >>> library(raster)
>>> >>> # create a RasterStack (a set of predictor rasters)
>>> >>> logo <- stack(system.file("external/rlogo.grd", package="raster"))
>>> >>> presence <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84,
>>> >>> 95, 85, 66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38,
>>> >>> 31, 22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
>>> >>> background <- cbind(runif(250)*(xmax(logo)-xmin(logo))+xmin(logo),
>>> >>> runif(250)*(ymax(logo)-ymin(logo))+ymin(logo))
>>> >>>
>>> >>> xy <- rbind(cbind(1, presence), cbind(0, background))
>>> >>> v <- cbind(xy[,1], extract(logo, xy[,2:3]))
>>> >>> colnames(v)[1] <- 'presback'
>>> >>> v=data.frame(v[,1:3])
>>> >>> v[,3] <- as.factor(v[,3])
>>> >>>
>>> >>> model <- glm(formula=presback~., data=v)
>>> >>> r <- predict(logo, model, progress='text')
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Fri, Dec 3, 2010 at 3:26 AM, isabelle boulangeat
>>> >>> <isabelle.boulangeat at gmail.com> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > Thanks for all these explanations. I misunderstood the use of the
>>> >>> > "const".
>>> >>> > In my case, I just need to deal with an explicative variable that
>>> >>> > is
>>> >>> > spatially explicit but categorial.
>>> >>> > I should use a raster with numbers that correspond to my factor's
>>> >>> > levels.
>>> >>> > However, I still have some problems to predict using the categorial
>>> >>> > variable
>>> >>> > (raster package, version 1.7-7)
>>> >>> >
>>> >>> > my code : predict(rasStack, model=model_fac)
>>> >>> > Error message : Erreur dans v[cells, ] <- predv :
>>> >>> >
>>> >>> > Here are details on the raster Stack and the data.frame containing
>>> >>> > explicative variables to fit the model (expl).
>>> >>> >
>>> >>> > rasStack
>>> >>> > class       : RasterStack
>>> >>> > nlayers     : 2
>>> >>> > nrow        : 3137
>>> >>> > ncol        : 1951
>>> >>> > ncell       : 6120287
>>> >>> > projection  : +proj=lcc +lat_1=45.89891888888889
>>> >>> > +lat_2=47.69601444444444
>>> >>> > +lat_0=46.8 +lon_0=2.337229166666667 +x_0=600000 +y_0=2200000
>>> >>> > +ellps=WGS84
>>> >>> > +datum=WGS84 +units=m +no_defs +towgs84=0,0,0
>>> >>> > min value   : -1280     1
>>> >>> > max value   : 462   7
>>> >>> > extent      : 784300, 979400, 1858800, 2172500  (xmin, xmax, ymin,
>>> >>> > ymax)
>>> >>> > resolution  : 100, 100  (x, y)
>>> >>> >
>>> >>> > layerNames(rasStack)
>>> >>> > [1] "temp"   "corine"
>>> >>> >
>>> >>> > unique(getValues(rasStack[[2]]))
>>> >>> > [1] NA  6  5  1  7  4  2  3
>>> >>> >
>>> >>> > str(expl)
>>> >>> > 'data.frame':   11396 obs. of  2 variables:
>>> >>> >  $ temp  : num  -78.4 -282.9 -172.1 -173.6 -83.2 ...
>>> >>> >  $ corine: Factor w/ 7 levels "1","2","3","4",..: 6 4 3 4 5 6 6 4 5
>>> >>> > 6
>>> >>> > ...
>>> >>> >
>>> >>> > unique(expl$corine)
>>> >>> > [1] 6 4 3 5 7 2 1
>>> >>> > Levels: 1 2 3 4 5 6 7
>>> >>> >
>>> >>> > model_fac <- glm(resp ~ ., data=expl, family="binomial")
>>> >>> >
>>> >>> >
>>> >>> > Any idea?
>>> >>> >
>>> >>> > 2010/12/2 Robert J. Hijmans <r.hijmans at gmail.com>
>>> >>> >>
>>> >>> >> Isabelle,
>>> >>> >>
>>> >>> >> If you use a factor that is represented by numbers on a grid you
>>> >>> >> can
>>> >>> >> just fit the model (with the factor variable as factor), and the
>>> >>> >> predict function will then make a factor from the predictor layers
>>> >>> >> (i.e. there is nothing else you need to do). Where this can go
>>> >>> >> wrong
>>> >>> >> is if you have factor levels on your raster that were not in the
>>> >>> >> data
>>> >>> >> used to fit the model. You can get something like:
>>> >>> >> Error in model.frame.default(Terms, newdata, na.action =
>>> >>> >> na.action,
>>> >>> >> xlev = object$xlevels) :
>>> >>> >>  factor 'x' has new level(s):  2, 3, 4, 5,
>>> >>> >> Which means that you first need to remove those levels (set to NA)
>>> >>> >> from your predictor raster. (I will try to automate that problem
>>> >>> >> away,
>>> >>> >> in raster::predict, eventually)
>>> >>> >>
>>> >>> >> The argument "const" is for factor type variables for which you do
>>> >>> >> not
>>> >>> >> have spatial data. Hence they could be considered 'constant' in
>>> >>> >> some
>>> >>> >> contexts. For example, with fish observation data you might have a
>>> >>> >> model that predicts abundance from a number of variables including
>>> >>> >> the
>>> >>> >> method used to catch the fish. Say a factor variable "method" with
>>> >>> >> levels "A" or "B". You can then do something like
>>> >>> >> method = factor("A")
>>> >>> >> predict(raster, model, const=method)
>>> >>> >>
>>> >>> >> Robert
>>> >>> >>
>>> >>> >> On Thu, Dec 2, 2010 at 9:02 AM, isabelle boulangeat
>>> >>> >> <isabelle.boulangeat at gmail.com> wrote:
>>> >>> >> > Hello,
>>> >>> >> >
>>> >>> >> > I am trying to use the recently updated function "predict" from
>>> >>> >> > the
>>> >>> >> > package
>>> >>> >> > raster. I would like to use the field "const", as one of my
>>> >>> >> > variable
>>> >>> >> > is
>>> >>> >> > categorial.
>>> >>> >> > I have an error message but I can't find where is the problem.
>>> >>> >> >
>>> >>> >> > Error message : Erreur dans `[.data.frame`(blockvals, , f[i]) :
>>> >>> >> > undefined
>>> >>> >> > columns selected
>>> >>> >> >
>>> >>> >> > code : Pred_fac <- predict(rasStack, model=model_fac,
>>> >>> >> > predict=predict.glm,
>>> >>> >> > progress='text', const=factorDF)
>>> >>> >> >
>>> >>> >> > rasStack is a RasterStack and contains only one layer with one
>>> >>> >> > explicative
>>> >>> >> > variable. the layer name is the same as in the model.
>>> >>> >> > factorDF is a data.frame and contains only one column with one
>>> >>> >> > explicative
>>> >>> >> > variable. the column name is the same as in the model.
>>> >>> >> >
>>> >>> >> > the dimensions are the same : the number of rows in the data
>>> >>> >> > frame
>>> >>> >> > correspond to the number of cells in the raster.
>>> >>> >> >
>>> >>> >> > Does someone succeed to use this function with the new field
>>> >>> >> > "const"
>>> >>> >> > ?
>>> >>> >> >
>>> >>> >> > Thanks for help,
>>> >>> >> >
>>> >>> >> > Cheers,
>>> >>> >> >
>>> >>> >> > Isabelle.
>>> >>> >> >
>>> >>> >> > --
>>> >>> >> > Isabelle Boulangeat, PhD student
>>> >>> >> > Ecosystem dynamics and plant traits
>>> >>> >> >
>>> >>> >> > Bureau 203, +33 (0) 476 635 733
>>> >>> >> >
>>> >>> >> > Laboratoire d'écologie Alpine (LECA)
>>> >>> >> > Grenoble, France
>>> >>> >> > lab' s website : http://www-leca.ujf-grenoble.fr/
>>> >>> >> > my website : http://j.boulangeat.free.fr/
>>> >>> >> >
>>> >>> >> >        [[alternative HTML version deleted]]
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > _______________________________________________
>>> >>> >> > R-sig-Geo mailing list
>>> >>> >> > R-sig-Geo at r-project.org
>>> >>> >> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>> >>> >> >
>>> >>> >> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Isabelle Boulangeat, PhD student
>>> >>> > Ecosystem dynamics and plant traits
>>> >>> >
>>> >>> > Bureau 203, +33 (0) 476 635 733
>>> >>> >
>>> >>> > Laboratoire d'écologie Alpine (LECA)
>>> >>> > Grenoble, France
>>> >>> > lab' s website : http://www-leca.ujf-grenoble.fr/
>>> >>> > my website : http://j.boulangeat.free.fr/
>>> >>> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Isabelle Boulangeat, PhD student
>>> >> Ecosystem dynamics and plant traits
>>> >>
>>> >> Bureau 203, +33 (0) 476 635 733
>>> >>
>>> >> Laboratoire d'écologie Alpine (LECA)
>>> >> Grenoble, France
>>> >> lab' s website : http://www-leca.ujf-grenoble.fr/
>>> >> my website : http://j.boulangeat.free.fr/
>>> >
>>> >
>>> >
>>> > --
>>> > Isabelle Boulangeat, PhD student
>>> > Ecosystem dynamics and plant traits
>>> >
>>> > Bureau 203, +33 (0) 476 635 733
>>> >
>>> > Laboratoire d'écologie Alpine (LECA)
>>> > Grenoble, France
>>> > lab' s website : http://www-leca.ujf-grenoble.fr/
>>> > my website : http://j.boulangeat.free.fr/
>>> >
>>
>>
>>
>> --
>> Isabelle Boulangeat, PhD student
>> Ecosystem dynamics and plant traits
>>
>> Bureau 203, +33 (0) 476 635 733
>>
>> Laboratoire d'écologie Alpine (LECA)
>> Grenoble, France
>> lab' s website : http://www-leca.ujf-grenoble.fr/
>> my website : http://j.boulangeat.free.fr/
>
>
>
> --
> Isabelle Boulangeat, PhD student
> Ecosystem dynamics and plant traits
>
> Bureau 203, +33 (0) 476 635 733
>
> Laboratoire d'écologie Alpine (LECA)
> Grenoble, France
> lab' s website : http://www-leca.ujf-grenoble.fr/
> my website : http://j.boulangeat.free.fr/
>



More information about the R-sig-Geo mailing list