[R] help with the usage of "randomForest"
Hui Han
hhan at cse.psu.edu
Wed Mar 31 19:11:12 CEST 2004
Thinking that the following suggestions by Matt may be helpful to others,
I am forwarding his notes to the R list.
Regards,
Hui
On Wed, Mar 31, 2004 at 08:57:13AM -0800, Austin, Matt wrote:
> Use na.action=na.omit in your function call to delete those rows, but this
> can give you problems if you want to use follow-up methods such as the
> partial.plot(). This is what I usually do:
>
> naRows <- apply(data2, 1, function(x) any(is.na(x)))
>
> sum(!(naRows))
>
> data2.noNAs <- data2[!naRows,]
>
> chg.rf <- randomForest(ch13 ~ .,data=data2.noNAs, importance=TRUE,
> keep.forest=TRUE)
>
>
> That way when I call partial.plot() like in the following example I don't
> run into trouble with NAs in the original dataset not matching with what was
> used in the random forest fit.
>
>
> postscript("temp.ps", horizontal=TRUE)
> par(mfrow=c(4,4))
> for(i in seq_along(varNames)){
>   partial.plot(chg.rf, data2.noNAs, varNames[i], ylim=c(.95, 1.7))
> }
> dev.off()
>
>
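The row filter in Matt's note can also be written with base R's complete.cases(), which returns TRUE for rows that have no missing values. The sketch below is only an illustration: the small data frame standing in for data2 is made up, and only the names naRows and data2.noNAs come from the thread.

```r
# Hypothetical toy data standing in for data2 (not from the thread)
data2 <- data.frame(
  ch13 = c(1.1, 1.3, NA, 1.0, 1.5),
  x1   = c(0.2, NA, 0.4, 0.9, 0.1),
  x2   = c(3, 4, 5, 6, 7)
)

# Row-wise NA flag, as in the note above (apply() coerces to a matrix first)
naRows <- apply(data2, 1, function(x) any(is.na(x)))

# Base R alternative: TRUE for rows without any NA
keep <- complete.cases(data2)

# Keep only the complete rows; use this same data frame for the fit
# and for any follow-up calls so the rows stay consistent
data2.noNAs <- data2[keep, ]
```

Using the one filtered data frame for both the fit and the later plotting calls is what keeps the rows matched, as described above.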
> -----Original Message-----
> From: Hui Han [mailto:hhan at cse.psu.edu]
> Sent: Wednesday, March 31, 2004 8:12 AM
> To: Austin, Matt
> Subject: Re: [R] help with the usage of "randomForest"
>
>
> Matt,
>
> I appreciate your help so much!! Yes, I changed all NAs to real values, and
> the error message disappeared.
> However, my real dataset contains many NAs. Can you give me more suggestions
> on how to set na.action to something other than na.fail?
>
> Thank you so much again,
>
> Hui
>
> On Wed, Mar 31, 2004 at 08:02:47AM -0800, Austin, Matt wrote:
> > What is yy? Is this your subset index? If so make sure that you are not
> > removing all of one class. Note that the default na.action in randomForest
> > is na.fail, so even if your subsetting isn't removing all of the rows with
> > an NA the method should still fail.
> >
> > --Matt
> >
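A quick way to check for the situation Matt describes is to count the classes that remain once incomplete rows are dropped. The sketch below uses only base R and the five sample rows from the original question, typed in by hand; the names hp.complete, class.counts and n.classes are illustrative, and it does not call randomForest itself.

```r
# The five sample rows from the original post, entered by hand
hp <- data.frame(
  udomain.edu = c(1, NA, NA, NA, NA),
  udomain.hcs = c(1, 2, 0.8, 0.2, 0.9),
  hpclass     = factor(c("not", "not", "not", "hp", "hp"))
)

# Drop every row that contains an NA (what na.action = na.omit would do)
hp.complete <- na.omit(hp)

# Count how many response classes still have at least one row
class.counts <- table(hp.complete$hpclass)
n.classes <- sum(class.counts > 0)

if (n.classes < 2) {
  message("Only ", n.classes, " class left after removing NA rows")
}
```

With this sample, only row 1 is complete, so a single class ("not") is left; a classifier has nothing to separate in that case, whatever na.action is chosen.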
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Hui Han
> > Sent: Wednesday, March 31, 2004 6:11 AM
> > To: r-help at stat.math.ethz.ch
> > Subject: [R] help with the usage of "randomForest"
> >
> >
> > Dear all,
> >
> > Can anybody give me a hint about the following error message I got when
> > using randomForest?
> >
> > I have a two-class classification problem. The data file "sample" is:
> > ----------------------------------------------------------
> >   udomain.edu udomain.hcs hpclass
> > 1      1.0000           1     not
> > 2          NA           2     not
> > 3          NA         0.8     not
> > 4          NA         0.2      hp
> > 5          NA         0.9      hp
> > ----------------------------------------------------------
> > The steps I used to call the function are:
> > (1) Read data
> > hp <- read.table("sample")
> > (2) Call randomForest
> > hp.rf <- randomForest(hpclass ~., yy, data=hp, importance=TRUE,
> > proximity=TRUE)
> >
> > But the error msg I got is:
> > Error in randomForest.default(m, y, ...) :
> > Need at least two classes to do classification.
> >
> >
> > I learned the usage of randomForest from:
> >
> > http://www.maths.lth.se/help/R/.R/library/randomForest/html/randomForest.html
> >
> > Thanks a lot for any of your comments in advance!
> >
> >
> > Hui Han
> > Department of Computer Science and Engineering,
> > The Pennsylvania State University
> > University Park, PA,16802
> > email: hhan at cse.psu.edu
> > homepage: http://www.cse.psu.edu/~hhan
> >
>
>
Hui Han
Department of Computer Science and Engineering,
The Pennsylvania State University
University Park, PA,16802
email: hhan at cse.psu.edu
homepage: http://www.cse.psu.edu/~hhan