[R-sig-Geo] Problem with predict() from raster stack containing factor

Katherine Ransom lockhart.katherine at gmail.com
Sat Oct 15 00:02:55 CEST 2016


Hi All,
I am puzzled by some predictions I am getting from the predict() function
on a raster stack containing a factor. I trained a gbm() model on 25
variables, one of which is a factor. Then, I am using the predict()
function on a raster stack created from a group of gridded ESRI .txt files,
one of which is my factor variable. The factor variable grid was created
the same way as the other variables (it is just the same value for every
cell). I am using:

hab <- list.files(getwd(), pattern="\\.txt$", full.names=FALSE)
rstack <- stack(hab)
pred <- predict(rstack, final, n.trees=final$n.trees, family="gaussian")

I have also tried setting the factors option with:
pred <- predict(rstack, final, n.trees=final$n.trees, family="gaussian",
                factors=factor)
where factor is ("WaterUse2" is my factor variable):
> factor
$WaterUse2
[1] 2
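
For comparison, the raster package documentation describes the factors argument of predict() as a named list of the levels of each factor variable, and it also documents a const argument (a data.frame) for supplying a constant value that has no raster layer. The sketch below only builds those two objects in base R. The level set "1" to "9" is an assumption based on the description in this post, and the predict() calls are left as comments because they need the rstack and final objects; whether this changes the gbm predictions has not been verified here.

```r
# Assumed level set: the post says WaterUse2 has 9 levels renamed 1-9.
# In practice this should be taken from the data used to fit `final`.
train_levels <- as.character(1:9)

# `factors` as a named list holding every level of the factor variable,
# rather than only the single value to predict at.
factor_levels <- list(WaterUse2 = train_levels)

# `const` as a one-row data.frame holding the constant factor value,
# declared with the full level set.
const_2 <- data.frame(WaterUse2 = factor("2", levels = train_levels))
const_5 <- data.frame(WaterUse2 = factor("5", levels = train_levels))

# The calls might then look like this (not run here):
# pred  <- predict(rstack, final, n.trees = final$n.trees,
#                  factors = factor_levels)
# pred2 <- predict(dropLayer(rstack, 25), final, n.trees = final$n.trees,
#                  const = const_2)
```

With const, the constant WaterUse2 grid would not need to be part of the stack at all, which is why the second commented call drops layer 25 first.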

I have also tried explicitly converting the specific raster layer within
the stack to a factor by doing:
rstack[[25]] <- as.factor(rstack[[25]]) # convert WaterUse2 raster to factor

The data used to train the model has 9 levels for WaterUse2. For
prediction, I want to make two separate prediction grids: one where
WaterUse2 is "2" everywhere, and one where it is "5" everywhere. I don't
care about the rest of the values, since 2 and 5 dominate the data. In my
original data the levels were capital letters, e.g. "P", "H", "I", but I
renamed them 1-9 so that the gridded layers for this variable would read
cleanly into R for prediction.
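
One thing that may be worth checking with number-like levels is how the integer code behind a factor depends on the full set of levels. The base R sketch below illustrates this with an assumed level set of "1" to "9". It is only an illustration of factor behaviour, not a diagnosis of what predict() does internally.

```r
# Assumed level set; the post does not list the actual nine levels.
all_levels <- as.character(1:9)

# A constant value of 5 declared with the full level set keeps integer code 5.
full <- factor("5", levels = all_levels)
code_full <- as.integer(full)    # 5

# The same value with levels inferred from the data alone gets code 1,
# because "5" is then the only (and therefore first) level.
alone <- factor("5")
code_alone <- as.integer(alone)  # 1

# Going through as.character() recovers the label rather than the code.
label_alone <- as.integer(as.character(alone))  # 5
```

If a layer that is constant at 5 is turned into a factor without the training levels, a model that works on integer codes could end up treating it as the first level instead of the fifth.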

Without going into tons of detail, the variable seems to be throwing off my
predictions big time (much higher values than expected). I can leave it out
and get predictions much more in line with expectations. Also, the behavior
is not in line with the evidence in the partial plots for this variable.

Are there currently any known issues with using factors in predict()? Is
there something I could be doing wrong with this factor variable that would
lead to obviously incorrect predictions?

Many thanks,
Katie

-- 
Katherine Ransom
PhD Candidate
Hydrologic Sciences Graduate Group
UC Davis
