# [R] grouping

David Winsemius dwinsemius at comcast.net
Tue Apr 3 15:10:53 CEST 2012

On Apr 3, 2012, at 8:47 AM, Val wrote:

> Hi all,
>
> Assume that I have the following 10 data points.
> x = c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
>
> Sort x and get the following:
> y = c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

The methods below do not require a sorting step.

>
> I want to group the sorted data points (y) into groups with an equal
> number of observations. In this case there will be three groups. The
> first two groups will have three observations each and the third will
> have four observations.
>
> group 1  = 36, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242,297
>
> Finally I want to calculate the group mean
>
> group 1  =  42
> group 2  =  87
> group 3  =  234

I hope those weren't answers from SAS.

>
> Can anyone help me out?
>

I usually do this with Hmisc::cut2, since its `g = <n>` argument
automatically splits at quantiles, but the following uses only
base R.

split(x, cut(x, quantile(x, probs = c(0, .333, .66, 1)),
             include.lowest = TRUE))
$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297

lapply(split(x, cut(x, quantile(x, probs = c(0, .333, .66, 1)),
                    include.lowest = TRUE)), mean)
$`[36,65.9]`
[1] 42.33333

$`(65.9,189]`
[1] 89.66667

$`(189,297]`
[1] 235.25

Or, to get a named one-dimensional array instead of a list:
tapply(x, cut(x, quantile(x, probs = c(0, .333, .66, 1)),
              include.lowest = TRUE), mean)
 [36,65.9] (65.9,189]  (189,297]
  42.33333   89.66667  235.25000
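
A side note on the break points, checked only on this data set and under quantile()'s default type 7: the probabilities 0.333 and 0.66 sit a little short of exact thirds, and here that matters. With n = 10, exact thirds put the inner breaks on the observations 66 and 193 themselves, and because cut() closes intervals on the right by default, those two values fall into the lower groups. The sketch below compares the two choices; `sizes_by_probs` is a hypothetical helper name introduced for illustration, not part of base R.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Hypothetical helper: group sizes produced by a given set of probabilities
sizes_by_probs <- function(x, probs) {
  breaks <- quantile(x, probs = probs)
  as.vector(table(cut(x, breaks, include.lowest = TRUE)))
}

sizes_off   <- sizes_by_probs(x, c(0, .333, .66, 1))  # probabilities used in the calls above
sizes_exact <- sizes_by_probs(x, c(0, 1/3, 2/3, 1))   # exact thirds

print(sizes_off)    # sizes of the three groups, lowest to highest
print(sizes_exact)
```

So quantile-based breaks may need a small nudge, or a look at the resulting table, before the group sizes can be trusted.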

> In SAS I used to do it using proc rank.

quantile() (see ?quantile) isn't equivalent to PROC RANK, but it
provides a useful basis for splitting or tabulating functions.
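
If the group sizes themselves are what matters (3, 3 and 4 here, wherever the values happen to fall), one possible alternative is to bin on ranks rather than on values. This is only a sketch: the rule `ceiling(rank * k / n)` and the names `k`, `r` and `grp` are assumptions chosen for illustration, not a reimplementation of PROC RANK, and the sizes have only been checked on these ten points.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

k <- 3                                  # assumed number of groups
r <- rank(x, ties.method = "first")     # ranks 1..n; ties broken by position
grp <- ceiling(r * k / length(x))       # illustrative rule mapping ranks to groups 1..k

group_sizes <- as.vector(table(grp))    # observations per group
group_means <- tapply(x, grp, mean)     # mean of x within each rank-based group

print(split(x, grp))
print(group_means)
```

Like the quantile approach, this needs no explicit sorting step, since rank() works on the unsorted vector.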

>
>
> Val