[R] randomForest

Duncan Murdoch murdoch at stats.uwo.ca
Thu Jul 7 22:13:36 CEST 2005


On 7/7/2005 3:47 PM, Weiwei Shi wrote:
> It works.
> Thanks,
> 
> But (just curious): why did I previously get
> 
>> is.vector(sample.size)
> [1] TRUE
> 
> I also tried as.vector(sample.size) and assigned it to sampsz, but it still
> does not work.

Sorry, I used "vector" incorrectly.  Lists are vectors.  What sum needs 
is a numeric or complex vector, and lists are vectors of objects, not 
vectors of numbers.

You should use is.numeric(sample.size) to test whether you can sum 
sample.size.
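
For example, with a small list:

x <- list(1, 2, 3)
is.vector(x)    # TRUE  -- a list is a (generic) vector
is.numeric(x)   # FALSE -- its elements are objects, not numbers
sum(x)          # error: sum() cannot operate on a list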

Duncan Murdoch

> 
> On 7/7/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
>> > Hi there:
>> > I have a question on randomForest:
>> >
>> > Recently I helped a friend with her random forest and ran into this problem:
>> > her dataset has 6 classes, the sample size is pretty small (264), and the
>> > class distribution is as follows (Diag is the response variable):
>> > sample.size <- lapply(1:6, function(i) sum(Diag==i))
>> >> sample.size
>> > [[1]]
>> > [1] 36
>> >
>> > [[2]]
>> > [1] 12
>> >
>> > [[3]]
>> > [1] 120
>> >
>> > [[4]]
>> > [1] 36
>> >
>> > [[5]]
>> > [1] 30
>> >
>> > [[6]]
>> > [1] 30
>> >
>> > I assigned this sample.size to sampsz for stratified sampling
>> > and got the following error:
>> > Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
>> >
>> > If I use sampsz=c(36, 12, 120, 36, 30, 30), then it is fine. Could you
>> > tell me why?
>> 
>> The sum() function knows what to do on a vector, but not on a list.  You
>> can turn your sample.size variable into a vector using
>> 
>> unlist(sample.size)
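>> 
>> e.g. something like
>> 
>> sampsz <- unlist(sample.size)   # now a plain numeric vector
>> is.numeric(sampsz)              # TRUE, so sum() will work
>> sum(sampsz)                     # 264
>> 
>> or avoid building a list in the first place, since sapply() (or
>> table(Diag)) returns the counts as a numeric vector directly:
>> 
>> sample.size <- sapply(1:6, function(i) sum(Diag == i))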
>> 
>> Duncan Murdoch
>> 
>> > BTW, as for the classification problem itself with such uneven class
>> > sizes, do you have any suggestions for improving its accuracy?  I
>> > tried passing sampsz as a c() vector, but the result was similar.
>> >
>> > Thanks,
>> >
>> > weiwei
>> >
>> > On 6/30/05, Liaw, Andy <andy_liaw at merck.com> wrote:
>> >> The limitation comes from the way categorical splits are represented in the
>> >> code:  For a categorical variable with k categories, the split is
>> >> represented by k binary digits: 0=right, 1=left.  So it takes k bits to
>> >> store each split on k categories.  To save storage, this is `packed' into a
>> >> 4-byte integer (32-bit), thus the limit of 32 categories.
>> >>
>> >> The current Fortran code (version 5.x) by Breiman and Cutler gets around
>> >> this limitation by storing the split in an integer array.  While this lifts
>> >> the 32-category limit, it takes much more memory to store the splits.  I'm
>> >> still trying to figure out a more memory efficient way of storing the splits
>> >> without imposing the 32-category limit.  If anyone has suggestions, I'm all
>> >> ears.
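>> >>
>> >> Schematically, the packing looks something like this in R (just an
>> >> illustration of the idea, not the actual C/Fortran code):
>> >>
>> >> goes.left <- c(TRUE, FALSE, TRUE, TRUE, FALSE)  # k = 5 categories; which go left
>> >> packed <- sum(2^(which(goes.left) - 1))         # one integer per split, here 13
>> >> j <- 3                                          # test the bit for category j
>> >> (packed %/% 2^(j - 1)) %% 2 == 1                # TRUE: send category j left
>> >> # a 4-byte integer holds only 32 such bits, hence the 32-category limit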
>> >>
>> >> Best,
>> >> Andy
>> >>
>> >> > From: Arne.Muller at sanofi-aventis.com
>> >> >
>> >> > Hello,
>> >> >
>> >> > I'm using the random forest package. One of my factors in the
>> >> > data set contains 41 levels (I can't code this as a numeric
>> >> > value - in terms of linear models this would be a random
>> >> > factor). The randomForest call comes back with an error
>> >> > telling me that the limit is 32 categories.
>> >> >
>> >> > Is there any reason for this particular limit? Maybe it's
>> >> > possible to recompile the module with a different cutoff?
>> >> >
>> >> >       thanks a  lot for your help,
>> >> >       kind regards,
>> >> >
>> >> >
>> >> >       Arne
>> >> >



