[R] Truncate levels to use randomForest
Martin Lam
tmlammail at yahoo.com
Fri Aug 26 10:15:35 CEST 2005
Hi,
I will explain my problem with this example:
library("randomForest")
# load the iris plant data set
dataset <- iris
# include only instances with Species = setosa or virginica
indices <- which(dataset$Species == "setosa" | dataset$Species == "virginica")
finaldataset <- dataset[indices, ]
# just to show that all 3 classes are still listed as levels
levels(finaldataset$Species)
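To see exactly which class is empty, here is a small sketch using only base R and the built-in `iris` data. It selects the rows with `%in%`, which picks the same rows as the index above. `table()` counts observations per level, so a level that survives the subsetting but has no rows shows up with a count of zero.

```r
# Keep only the setosa and virginica rows (same selection as above)
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]

# Subsetting rows does not drop factor levels ...
levels(finaldataset$Species)

# ... so table() reports a zero count for the level with no rows left
counts <- table(finaldataset$Species)
print(counts)

# Names of the empty class(es)
names(counts)[counts == 0]
```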
# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
# The error message I get
Error in randomForest.default(m, y, ...) :
Can't have empty classes in y.
# The problem is that finaldataset doesn't contain any
# instances of "versicolor", so I think the only way to
# solve this is to change the levels of "Species" to only
# "setosa" and "virginica". Correct me if I'm wrong.
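The diagnosis is right that the unused level has to disappear, but it does not have to be renamed away. A minimal sketch, assuming the goal is simply to keep the two classes that are actually present: calling `factor()` on a subsetted factor rebuilds it with only the levels that still occur. The `randomForest()` call is left as a comment because it needs the package installed.

```r
# Keep only the setosa and virginica rows
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]

# Re-creating the factor keeps only the levels that occur in the data,
# in their original order
finaldataset$Species <- factor(finaldataset$Species)
levels(finaldataset$Species)

# With no empty classes left, this call should no longer stop with
# "Can't have empty classes in y" (not run here):
# randomForest(Species ~ ., data = finaldataset, ntree = 5)
```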
# So I tried to change the levels but I got stuck:
# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))
# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]),
                  uniqueItems[3] = uniqueItems[3])
# Error message
Error: syntax error
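The syntax error comes from the left-hand side of `=` inside `list()`: R only accepts a literal name or a quoted string there, not an expression such as `uniqueItems[1]`. One way around this, sketched below, is to build the list without names and then attach the names computed at run time with `names<-`. Like the constant-name version, this maps `"versicolor"` into `"setosa"`; since the subset has no versicolor rows, no observation actually changes class.

```r
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
uniqueItems <- levels(finaldataset$Species)

# Build the list unnamed, then set the names from variables
newlevels <- list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3])
names(newlevels) <- c(uniqueItems[1], uniqueItems[3])

# List form of levels<-: each name becomes a new level that
# absorbs the old levels listed under it
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
table(finaldataset$Species)
```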
# In the help they use constant names to rename the levels,
# so this works (but that's not what I want, because I don't
# want to change the code every time I use another data set):
newlevels <- list("setosa" = c(uniqueItems[1], uniqueItems[2]),
                  "virginica" = uniqueItems[3])
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
finaldataset$Species
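To avoid editing the code for each new data set, the clean-up can be made independent of column and level names. `dropUnusedLevels()` below is a hypothetical helper written for this sketch, not a function from any package; it applies `factor()` to every factor column so that empty levels are removed wherever they occur, and leaves the other columns untouched.

```r
# Hypothetical helper: re-create every factor column of a data frame
# so that levels with no remaining observations are dropped
dropUnusedLevels <- function(df) {
  isFactor <- vapply(df, is.factor, logical(1))
  df[isFactor] <- lapply(df[isFactor], factor)
  df
}

# Works the same way whatever the response column is called
finaldataset <- dropUnusedLevels(iris[iris$Species != "versicolor", ])
levels(finaldataset$Species)
```

Later versions of R also provide `droplevels()` in base R, which does the same job for a data frame.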
---------------------------
Thanks in advance,
Martin