[R] problem with certain data sets when using randomForest
Martin Lam
tmlammail at yahoo.com
Fri Aug 26 20:22:03 CEST 2005
Thank you for this and earlier help, Mr. Ripley.
Martin
--- Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> Look at ?"[.factor":
>
> finaldataset$Species <- finaldataset$Species[,drop=TRUE]
>
> solves this.
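
For illustration, a minimal sketch of the fix above, using only base R on the built-in iris data. The `%in%` subsetting is a shorthand chosen here, not the code from the original question, and the randomForest call is guarded so the sketch still runs where the package is not installed.

```r
# Subset iris to two species; the factor still carries all three levels.
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
levels(finaldataset$Species)   # still lists "versicolor"

# Drop the unused level, as in the fix quoted above.
finaldataset$Species <- finaldataset$Species[, drop = TRUE]
levels(finaldataset$Species)
table(finaldataset$Species)

# Optional: fit the forest only if the package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = finaldataset,
                                    ntree = 5)
  print(fit)
}
```

Calling `factor(finaldataset$Species)` should drop the unused level as well; R 2.12.0 and later also offer `droplevels()`.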
>
> On Fri, 26 Aug 2005, Martin Lam wrote:
>
> > Hi,
> >
> > Since I've had no replies to my previous post about my
> > problem, I am posting it again in the hope that someone
> > notices it. The problem is that the randomForest
> > function doesn't accept datasets whose instances
> > cover only a subset of all the classes. So a
> > dataset with instances that belong to either class "a"
> > or "b", from the levels "a", "b" and "c", doesn't work
> > because there is no instance that has class "c". Is
> > there any way to solve this problem?
> >
> > library("randomForest")
> >
> > # load the iris plant data set
> > dataset <- iris
> >
> > numberarray <- array(1:nrow(dataset), nrow(dataset), 1)
> >
> > # include only instances with Species = setosa or virginica
> > indices <- t(numberarray[(dataset$Species == "setosa" |
> >     dataset$Species == "virginica") == TRUE])
> >
> > finaldataset <- dataset[indices,]
> >
> > # just to let you see the 3 classes
> > levels(finaldataset$Species)
> >
> > # create the random forest
> > randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
> >
> > # The error message I get
> > Error in randomForest.default(m, y, ...) :
> > Can't have empty classes in y.
> >
> > # The problem is that the finaldataset doesn't contain
> > # any instances of "versicolor", so I think the only
> > # way to solve this problem is by changing the levels
> > # that "Species" has to only "setosa" and "virginica";
> > # correct me if I'm wrong.
> >
> > # So I tried to change the levels but I got stuck:
> >
> > # get the possible unique classes
> > uniqueItems <- unique(levels(finaldataset$Species))
> >
> > # the problem!
> > newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
> >     uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
> >
> > # Error message
> > Error: syntax error
> >
> > # In the help they use constant names to rename the
> > # levels, so this works (but that's not what I want
> > # because I don't want to change the code every time I
> > # use another data set):
> > newlevels <- list("setosa" = c(uniqueItems[1],
> > uniqueItems[2]), "virginica" = uniqueItems[3])
> >
> > levels(finaldataset$Species) <- newlevels
> >
> > levels(finaldataset$Species)
> >
> > finaldataset$Species
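
On the syntax error in the attempt above: argument names inside `list()` have to be written literally, so an expression such as `uniqueItems[1]` cannot appear on the left of `=`. A possible workaround, sketched here on a hypothetical toy factor `f` with the levels "a", "b" and "c" (not the iris data), is to build the list unnamed and assign the names afterwards:

```r
# Hypothetical toy factor: level "c" is declared but never used.
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
uniqueItems <- levels(f)

# Build the list without names, then set the names from the data,
# so no level name is hard-coded.
newlevels <- list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3])
names(newlevels) <- c(uniqueItems[1], uniqueItems[3])

# Same merge as in the quoted code: the first two levels collapse
# into the first name, the third keeps its own name.
levels(f) <- newlevels
levels(f)
f
```

`setNames()` from the stats package would do the naming in one step, if that reads better. Note that this merges levels rather than dropping the empty one; for dropping, the `[, drop = TRUE]` idiom is the more direct route.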
> >
> > ---------------------------
> >
> > Thanks in advance,
> >
> > Martin
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>