[R] problem with certain data sets when using randomForest

Martin Lam tmlammail at yahoo.com
Fri Aug 26 20:22:03 CEST 2005


Thank you for this and earlier help Mr. Ripley.

Martin

--- Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:

> Look at ?"[.factor":
> 
> 	finaldataset$Species <- finaldataset$Species[,drop=TRUE]
> 
> solves this.
> 
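
A minimal sketch of why the suggestion above works, assuming only base R and the built-in iris data. The name `two` is made up for illustration, and the randomForest call is left commented out because it needs that package installed.

```r
# Subsetting rows does not change a factor's levels.
two <- iris[iris$Species %in% c("setosa", "virginica"), ]
levels(two$Species)   # "versicolor" is still listed
table(two$Species)    # ...but it has a count of zero

# Drop the unused level, as described in ?"[.factor"
two$Species <- two$Species[, drop = TRUE]
levels(two$Species)   # "setosa" "virginica"

# Re-creating the factor has the same effect:
# two$Species <- factor(two$Species)

# With the empty class gone, the original call should no longer
# complain about empty classes in y:
# library(randomForest)
# randomForest(Species ~ ., data = two, ntree = 5)
```

More recent versions of R also provide droplevels(), which does the same for a single factor or for every factor in a data frame.
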
> On Fri, 26 Aug 2005, Martin Lam wrote:
> 
> > Hi,
> >
> > Since I've had no replies to my previous post about my
> > problem, I am posting it again in the hope that someone
> > will notice it. The problem is that the randomForest
> > function doesn't accept data sets whose instances
> > cover only a subset of all the classes. So a
> > data set with instances that belong to either class "a"
> > or "b", out of the levels "a", "b" and "c", doesn't work
> > because there is no instance that has class "c". Is
> > there any way to solve this problem?
> >
> > library("randomForest")
> >
> > # load the iris plant data set
> > dataset <- iris
> >
> > numberarray <- array(1:nrow(dataset), nrow(dataset), 1)
> >
> > # include only instances with Species = setosa or virginica
> > indices <- t(numberarray[(dataset$Species == "setosa" |
> >   dataset$Species == "virginica") == TRUE])
> >
> > finaldataset <- dataset[indices,]
> >
> > # just to let you see the 3 classes
> > levels(finaldataset$Species)
> >
> > # create the random forest
> > randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
> >
> > # The error message I get
> > Error in randomForest.default(m, y, ...) :
> >        Can't have empty classes in y.
> >
> > # The problem is that the finaldataset doesn't contain
> > # any instances of "versicolor", so I think the only
> > # way to solve this problem is by changing the levels
> > # that "Species" has to only "setosa" and "virginica";
> > # correct me if I'm wrong.
> >
> > # So I tried to change the levels but I got stuck:
> >
> > # get the possible unique classes
> > uniqueItems <- unique(levels(finaldataset$Species))
> >
> > # the problem!
> > newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
> >   uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
> >
> > # Error message
> > Error: syntax error
> >
> > # In the help they use constant names to rename the
> > # levels, so this works (but that's not what I want
> > # because I don't want to change the code every time I
> > # use another data set):
> > newlevels <- list("setosa" = c(uniqueItems[1],
> > uniqueItems[2]), "virginica" = uniqueItems[3])
> >
> > levels(finaldataset$Species) <- newlevels
> >
> > levels(finaldataset$Species)
> >
> > finaldataset$Species
> >
> > ---------------------------
> >
> > Thanks in advance,
> >
> > Martin
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
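
On the second question in the original post (naming the entries of the levels list from the data instead of typing constants): a rough sketch, assuming base R only. The factor `f` and the names `old` and `newlevels` are made up for illustration. Like the constant-name version in the post, it folds the second level into the first.

```r
# A small factor with an unused middle level
f <- factor(c("setosa", "virginica", "setosa"),
            levels = c("setosa", "versicolor", "virginica"))
old <- levels(f)

# list(old[1] = ...) is a syntax error because the name on the left
# of `=` inside a call must be written literally. Build the list
# without names first, then assign the names from the data.
newlevels <- list(c(old[1], old[2]), old[3])
names(newlevels) <- c(old[1], old[3])

# Each list entry maps a set of old levels onto one new level
levels(f) <- newlevels
levels(f)   # "setosa" "virginica"
```

If the aim is only to get rid of levels that never occur, dropping them with [, drop=TRUE] is simpler and does not need a levels list at all.
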



