[R] problem with certain data sets when using randomForest
Martin Lam
tmlammail at yahoo.com
Fri Aug 26 17:52:21 CEST 2005
Hi,
Since I've had no replies to my previous post about this
problem, I am posting it again in the hope that someone
notices it. The problem is that the randomForest
function doesn't accept a data set whose instances
cover only a subset of the class levels. For example, a
data set whose instances all belong to class "a" or "b",
with levels "a", "b" and "c", doesn't work because no
instance has class "c". Is there any way to solve this
problem?
library("randomForest")
# load the iris plant data set
dataset <- iris
# include only instances with Species = setosa or virginica
indices <- which(dataset$Species == "setosa" |
                 dataset$Species == "virginica")
finaldataset <- dataset[indices, ]
# just to let you see the 3 classes
levels(finaldataset$Species)
# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
# The error message I get
Error in randomForest.default(m, y, ...) :
Can't have empty classes in y.
# The problem is that finaldataset doesn't contain
# any instances of "versicolor", so I think the only
# way to solve this problem is to change the levels
# of "Species" to only "setosa" and "virginica".
# Correct me if I'm wrong.
# So I tried to change the levels but I got stuck:
# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))
# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
# Error message
Error: syntax error
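One possible reading of the syntax error: inside list(), the left-hand side of "=" has to be a literal name, so an expression such as uniqueItems[1] is not accepted there. A minimal sketch of a workaround, assuming the same merging of levels as intended above, is to build the list without names and attach the computed names afterwards with setNames(). The subsetting line is only an assumed stand-in for the data preparation in this post.

```r
# Assumed setup mirroring the post: iris restricted to two species,
# with all three levels still attached to the factor
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
uniqueItems <- levels(finaldataset$Species)

# Build the list first, then attach names computed at run time
newlevels <- setNames(
  list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3]),
  c(uniqueItems[1], uniqueItems[3])
)

# Same assignment as with hard-coded names
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
```

Note that this still folds the second level into the first, exactly as the hard-coded version does; it only removes the need to type the level names by hand.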
# In the help they use constant names to rename the
# levels, so this works (but that's not what I want
# because I don't want to change the code every time I
# use another data set):
newlevels <- list("setosa" = c(uniqueItems[1],
uniqueItems[2]), "virginica" = uniqueItems[3])
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
finaldataset$Species
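If the goal is only to get rid of levels that have no instances, a simpler route may be to re-create the factor, since factor() applied to an existing factor keeps only the levels that actually occur. The following is a sketch under that assumption, not a confirmed fix; the subsetting line is an assumed stand-in for the data preparation in this post, and the randomForest call is guarded so the snippet also runs where the package is not installed.

```r
# Assumed setup mirroring the post: iris restricted to two species
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]

# Re-creating the factor drops levels with no instances,
# without naming any level explicitly
finaldataset$Species <- factor(finaldataset$Species)
levels(finaldataset$Species)

# Fit the forest only if the randomForest package is available
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = finaldataset,
                                    ntree = 5)
  print(fit)
}
```

Because no level names appear in the code, the same lines should carry over to another data set by changing only the subsetting step and the name of the class column.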
---------------------------
Thanks in advance,
Martin