[R-sig-Geo] problem with predict() in package raster and factor variables

Gonzalez-Mirelis Genoveva genoveva.gonzalez-mirelis at imr.no
Sat Apr 30 12:32:54 CEST 2016


Dear list,
I am trying to use the function predict() (from package raster), supplying the new data as a RasterBrick, the model (fitted in a previous step on a different dataset), and a few more arguments, including the levels of my only categorical variable. Here is the code I'm using:

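 library(raster)   # predict() method for Raster* objects
 library(party)    # cforest(); CIF.pa appears to be a cforest model

 r1 <- predict(subbrick, CIF.pa,
               type = "response", OOB = TRUE, factors = f)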
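A rough sketch of what the factors argument is assumed to do to each block of raster cell values before the model's own predict method is called: every column named in the list is converted to a factor with the listed levels. The helper apply_factors() and the toy values are hypothetical and for illustration only; this is not raster's internal code.

```r
# Hypothetical helper: roughly mimics applying a 'factors' list
# to a data.frame of raster cell values before prediction.
apply_factors <- function(newdata, factors) {
  for (nm in names(factors)) {
    if (nm %in% names(newdata)) {
      # Convert the named column to a factor with the supplied levels
      newdata[[nm]] <- factor(newdata[[nm]], levels = factors[[nm]])
    }
  }
  newdata
}

# Toy cell values (hypothetical), TerClass stored as numeric codes
cells <- data.frame(bathy20_1 = c(256, 260, 252), TerClass = c(3, 1, 2))
f <- list(TerClass = c(2, 1, 3, 4, 6, 5))

converted <- apply_factors(cells, f)
class(cells$TerClass)      # "numeric"
class(converted$TerClass)  # "factor"
```

Under this assumption, a column that was numeric in the raster ends up as a factor in the data passed on to the model.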

But I keep getting the following error:

Error in checkData(oldData, RET) :
  Classes of new data do not match original data
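The message suggests the model compares the class of each variable in the new data with the class it had when the model was fitted. A simplified, hypothetical version of such a check (class_mismatches() is an illustrative name, not party's actual checkData() source) could look like this:

```r
# Hypothetical, simplified class check: returns the names of shared
# columns whose class differs between the fitting data and the new data.
class_mismatches <- function(old, new) {
  shared  <- intersect(names(old), names(new))
  old_cls <- vapply(old[shared], function(x) class(x)[1], character(1))
  new_cls <- vapply(new[shared], function(x) class(x)[1], character(1))
  shared[old_cls != new_cls]
}

# Toy data (hypothetical): numeric TerClass when fitting, factor when predicting
old <- data.frame(bathy20_1 = c(256, 260), TerClass = c(2, 1))
new <- data.frame(bathy20_1 = c(252, 266),
                  TerClass = factor(c(3, 1), levels = c(2, 1, 3, 4, 6, 5)))

class_mismatches(old, new)  # "TerClass"
```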

Here are more details:

> CIF.pa

         Random Forest using Conditional Inference Trees

Number of trees:  1000

Response:  PA
Inputs:  bathy20_1, TerClass, Smax_ann, Smean_ann, Smin_ann, SPDmax_ann, SPDmean_ann, Tmax_ann, Tmean_ann, Tmin_ann
Number of observations:  986

Here, 'TerClass' is a categorical variable.
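In R, formula-based modelling functions generally treat a variable as categorical only when it is stored as a factor; a numeric column of class codes is treated as continuous. A minimal sketch with hypothetical values, showing how TerClass could be declared as a factor before fitting (assuming categorical treatment is what is intended):

```r
# Hypothetical training values: TerClass stored as numeric class codes
v <- data.frame(PA = c(0, 1, 0, 1, 0, 1),
                TerClass = c(2, 2, 1, 1, 3, 3))
is.factor(v$TerClass)  # FALSE: a model would see a continuous predictor

# Declare it as categorical before fitting (assumption: this is the intent)
v$TerClass <- factor(v$TerClass)
is.factor(v$TerClass)  # TRUE
levels(v$TerClass)     # "1" "2" "3"
```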

Here is the data used to train CIF.pa:

> str(v)
'data.frame':   1257 obs. of  15 variables:
 $ RefNo      : int  16 16 16 16 17 17 17 17 18 18 ...
 $ PointID    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Count      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PA         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ split      : chr  "T" "T" "T" "T" ...
 $ bathy20_1  : num  256 260 252 266 281 ...
 $ TerClass   : num  2 2 1 1 1 2 1 1 3 3 ...
 $ Smax_ann   : num  35.1 35.1 35.1 35.1 35.1 ...
 $ Smean_ann  : num  35.1 35.1 35.1 35.1 35.1 ...
 $ Smin_ann   : num  34.9 34.9 34.9 34.9 35 ...
 $ SPDmax_ann : num  0.379 0.376 0.378 0.372 0.352 ...
 $ SPDmean_ann: num  0.14 0.137 0.14 0.132 0.12 ...
 $ Tmax_ann   : num  6.97 6.92 7.04 6.87 6.68 ...
 $ Tmean_ann  : num  5.76 5.73 5.79 5.71 5.54 ...
 $ Tmin_ann   : num  4.41 4.32 4.52 4.25 4.07 ...

Actually, I trained the model on a subset of v: the rows where v$split == 'T'.

Below are the values of TerClass in that subset, and its class:

> unique(v[v$split=='T',7])
[1] 2 1 3 4 6 5
> class(v$TerClass)
[1] "numeric"

And below are the values and class of the corresponding layer of the RasterBrick:

> unique(values(subbrick$TerClass))
[1] 3 1 2 4 5 6
> class(values(subbrick$TerClass))
[1] "numeric"
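unique() returns values in order of first appearance, so the two outputs above list the same six codes in different orders. A quick check that the code sets themselves agree, using the values copied from the output above:

```r
train_codes  <- c(2, 1, 3, 4, 6, 5)  # unique(v[v$split=='T',7])
raster_codes <- c(3, 1, 2, 4, 5, 6)  # unique(values(subbrick$TerClass))

setequal(train_codes, raster_codes)               # TRUE: same codes, different order
identical(sort(train_codes), sort(raster_codes))  # TRUE
```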

And finally, here is what f looks like:


> f
$TerClass
[1] 2 1 3 4 6 5

> class(f)
[1] "list"
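Because f lists the levels in order of first appearance (2 1 3 4 6 5) rather than sorted, a factor built from it stores internal integer codes that follow that order, not the original class numbers. A small sketch with hypothetical cell values illustrating this:

```r
f <- list(TerClass = c(2, 1, 3, 4, 6, 5))

cells <- c(3, 1, 2, 6, 5)  # hypothetical raster cell values
fac <- factor(cells, levels = f$TerClass)

as.character(fac)  # "3" "1" "2" "6" "5": labels keep the original codes
as.integer(fac)    # 3 2 1 5 6: positions within f$TerClass, not the codes
```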


As far as I can see, the classes in the original data (oldData) and the new data should be the same, but the error persists. Any ideas on what I could be missing?

Unfortunately, I am unable to reproduce the problem with other data (I only encounter it with my own), but any help would be hugely appreciated.

Also, I am aware that I asked this question before (Apr 04, 2013; 1:22pm). Unfortunately, I haven't gotten very far since then!

Many thanks in advance for any pointers.

Genoveva
