[R] randomForest

Liaw, Andy andy_liaw at merck.com
Thu Jul 7 22:10:32 CEST 2005


> From: Weiwei Shi
> 
> It works.
> Thanks,
> 
> But (just curious): why did I get this when I tried it previously?
> 
> > is.vector(sample.size)
> [1] TRUE

Because a list is also a vector:

> a <- c(list(1), list(2))
> a
[[1]]
[1] 1

[[2]]
[1] 2

> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE

Actually, the way I initialize a list of known length is by something like:

myList <- vector(mode="list", length=veryLong)
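A minimal sketch of the list-versus-vector distinction discussed above. It assumes a stand-in `Diag` rebuilt from the class counts quoted further down (36, 12, 120, 36, 30, 30); in the real data `Diag` is the friend's response variable, and the names `size.*` are only illustrative.

```r
# Hypothetical stand-in for the response 'Diag', rebuilt from the class counts in this thread
Diag <- rep(1:6, times = c(36, 12, 120, 36, 30, 30))

# lapply() always returns a list, even when every element is a single number
size.list <- lapply(1:6, function(i) sum(Diag == i))
is.vector(size.list)            # TRUE: a list is a (generic) vector
is.numeric(size.list)           # FALSE: it is not a numeric vector
is.list(as.vector(size.list))   # TRUE: as.vector() leaves a list as a list

# Ways to get a plain integer vector of class counts instead
size.unlist <- unlist(size.list)                              # flatten the list
size.sapply <- sapply(1:6, function(i) sum(Diag == i))        # simplify while looping
size.table  <- as.vector(table(factor(Diag, levels = 1:6)))   # count with table(), drop names
```

Each of the last three yields the same values as `c(36, 12, 120, 36, 30, 30)`, the form the thread reports as working for `sampsz`.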

Andy
 
 
> I also tried as.vector(sample.size) and assigned it to sampsz; it still
> does not work.
> 
> On 7/7/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> > > Hi there:
> > > I have a question on random forest:
> > >
> > > Recently I helped a friend with her random forest and ran into this problem:
> > > her dataset has 6 classes, the sample size is pretty small (264), and
> > > the class distribution is as follows (Diag is the response variable):
> > > sample.size <- lapply(1:6, function(i) sum(Diag==i))
> > >> sample.size
> > > [[1]]
> > > [1] 36
> > >
> > > [[2]]
> > > [1] 12
> > >
> > > [[3]]
> > > [1] 120
> > >
> > > [[4]]
> > > [1] 36
> > >
> > > [[5]]
> > > [1] 30
> > >
> > > [[6]]
> > > [1] 30
> > >
> > > I assigned this sample.size to sampsz for stratified sampling
> > > and got the following error:
> > > Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
> > >
> > > If I use sampsz=c(36, 12, 120, 36, 30, 30), then it is fine. Could you
> > > tell me why?
> > 
> > The sum() function knows what to do with a vector, but not with a list.
> > You can turn your sample.size variable into a vector using
> > 
> > unlist(sample.size)
> > 
> > Duncan Murdoch
> > 
> > > BTW, for a classification problem like this one, with uneven class
> > > sizes, do you have any suggestions for improving its accuracy? I
> > > tried the c() form to make sampsz work, but the result is
> > > similar.
> > >
> > > Thanks,
> > >
> > > weiwei
> > >
> > > On 6/30/05, Liaw, Andy <andy_liaw at merck.com> wrote:
> > >> The limitation comes from the way categorical splits are represented in the
> > >> code: For a categorical variable with k categories, the split is
> > >> represented by k binary digits: 0=right, 1=left.  So it takes k bits to
> > >> store each split on k categories.  To save storage, this is `packed' into a
> > >> 4-byte integer (32-bit), thus the limit of 32 categories.
> > >>
> > >> The current Fortran code (version 5.x) by Breiman and Cutler gets around
> > >> this limitation by storing the split in an integer array.  While this lifts
> > >> the 32-category limit, it takes much more memory to store the splits.  I'm
> > >> still trying to figure out a more memory-efficient way of storing the splits
> > >> without imposing the 32-category limit.  If anyone has suggestions, I'm all
> > >> ears.
> > >>
> > >> Best,
> > >> Andy
> > >>
> > >> > From: Arne.Muller at sanofi-aventis.com
> > >> >
> > >> > Hello,
> > >> >
> > >> > I'm using the random forest package. One of my factors in the
> > >> > data set contains 41 levels (I can't code this as a numeric
> > >> > value - in terms of linear models this would be a random
> > >> > factor). The randomForest call comes back with an error
> > >> > telling me that the limit is 32 categories.
> > >> >
> > >> > Is there any reason for this particular limit? Maybe it's
> > >> > possible to recompile the module with a different cutoff?
> > >> >
> > >> >       thanks a lot for your help,
> > >> >       kind regards,
> > >> >
> > >> >
> > >> >       Arne
> > >> >
> > >> > ______________________________________________
> > >> > R-help at stat.math.ethz.ch mailing list
> > >> > https://stat.ethz.ch/mailman/listinfo/r-help
> > >> > PLEASE do read the posting guide!
> > >> > http://www.R-project.org/posting-guide.html
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >
> > >
> > 
> > 
> 
> 
> 
> -- 
> Weiwei Shi, Ph.D
> 
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
> 
> 
> 
>
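The split encoding described in the quoted 6/30/05 message (one bit per category, 1=left and 0=right, packed into a single 32-bit word) can be illustrated with a small R sketch. It is not the package's actual C/Fortran code: `encode_split()` and `decode_split()` are hypothetical helpers, and the packing uses ordinary double arithmetic rather than `bitwAnd()`/`bitwShiftL()`, because R's integers are signed 32-bit values and cannot represent the top bit on its own.

```r
# Hypothetical helper: pack a left/right assignment of k categories into one code.
# Bit (j - 1) is 1 if category j goes left, 0 if it goes right.
encode_split <- function(goes_left) {
  k <- length(goes_left)
  if (k > 32) stop("a 32-bit word can hold at most 32 categories")
  # Doubles represent integers exactly up to 2^53, so 2^32 - 1 is safe here
  sum(2^(which(goes_left) - 1))
}

# Hypothetical helper: recover the left/right assignment of all k categories
decode_split <- function(code, k) {
  bits <- (code %/% 2^(seq_len(k) - 1)) %% 2  # extract bit j - 1 for j = 1..k
  bits == 1                                   # TRUE = left, FALSE = right
}

encode_split(c(TRUE, FALSE, TRUE))  # categories 1 and 3 go left: 2^0 + 2^2
decode_split(5, 3)
```

A 33rd category has no bit left to occupy in the word, which is where the 32-category limit comes from.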



