[R] problem with certain data sets when using randomForest

Martin Lam tmlammail at yahoo.com
Fri Aug 26 17:52:21 CEST 2005


Hi,

Since I've had no replies to my previous post about
this problem, I am posting it again in the hope that
someone notices it. The problem is that the randomForest
function does not accept a dataset whose instances
cover only a subset of the classes. For example, a
dataset whose instances all belong to class "a" or "b",
while the factor has the levels "a", "b" and "c", doesn't
work because no instance has class "c". Is there any
way to solve this problem?

library("randomForest")

# load the iris plant data set
dataset <- iris

numberarray <- array(1:nrow(dataset), nrow(dataset),
1)

# include only instances with Species = setosa or virginica
indices <- t(numberarray[(dataset$Species == "setosa"
| 
dataset$Species == "virginica") == TRUE])

finaldataset <- dataset[indices,]

# just to let you see the 3 classes
levels(finaldataset$Species)

# create the random forest
randomForest(formula = Species ~ ., data =
finaldataset, ntree = 5)

# The error message I get
Error in randomForest.default(m, y, ...) : 
        Can't have empty classes in y.

# The problem is that finaldataset doesn't contain
# any instances of "versicolor", so I think the only
# way to solve this problem is to change the levels
# of "Species" to only "setosa" and "virginica".
# Correct me if I'm wrong.

# So I tried to change the levels but I got stuck:

# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))

# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
uniqueItems[2]), uniqueItems[3] = uniqueItems[3])

# Error message
Error: syntax error
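
The syntax error arises because R's parser accepts only a literal name or string to the left of `=` inside a call such as list(...), so `uniqueItems[1] = ...` is rejected before anything is evaluated. A minimal sketch of one possible workaround, assuming base R's setNames() and using a hypothetical copy `species` of the factor: build the list without names, then attach names that are computed at run time.

```r
# Level names taken from the data rather than typed by hand
uniqueItems <- levels(iris$Species)

# Build the unnamed list first, then attach computed names.
# Each name is a new level; each value lists the old levels it absorbs.
newlevels <- setNames(
  list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3]),
  c(uniqueItems[1], uniqueItems[3])
)

# 'species' is a hypothetical working copy, so iris itself is untouched
species <- iris$Species
levels(species) <- newlevels
levels(species)
```

Note that this mapping, like the one it imitates, merges the second level into the first rather than removing it.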

# In the help they use constant names to rename the
# levels, so this works (but that's not what I want
# because I don't want to change the code every time I
# use another data set):
newlevels <- list("setosa" = c(uniqueItems[1],
uniqueItems[2]), "virginica" = uniqueItems[3])

levels(finaldataset$Species) <- newlevels

levels(finaldataset$Species)

finaldataset$Species
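
If the goal is only to get rid of levels that no longer occur, renaming may not be needed at all. A minimal sketch, assuming that calling base R's factor() on an existing factor keeps only the levels actually present; `subsetData` is a hypothetical name, and the randomForest call is guarded so the sketch also runs where the package is not installed.

```r
# Keep only the rows whose Species is setosa or virginica
keep <- iris$Species %in% c("setosa", "virginica")
subsetData <- iris[keep, ]

# Re-creating the factor drops the level with no instances ("versicolor")
subsetData$Species <- factor(subsetData$Species)
levels(subsetData$Species)   # "setosa" "virginica"

# With no empty class left, the forest can be fitted
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = subsetData, ntree = 5)
  print(fit)
}
```

Because nothing here names a level explicitly apart from the row filter, the same two lines that rebuild the factor should carry over to another data set unchanged.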

---------------------------

Thanks in advance,

Martin



