[R] mean for every quartile

David L Carlson dcarlson at tamu.edu
Mon May 16 16:07:02 CEST 2016


Do you understand that quartiles divide the data into 4 groups?

Min (group 1) 1st quartile (group 2) median (group 3) 3rd quartile (group 4) max

But in your case df$BR has only 4 unique values:

> table(df$BR)

256 320 384 512 
  2  74  24   2

So the first quartile is equal to the median:

> quantile(df$BR)
  0%  25%  50%  75% 100% 
 256  320  320  368  512

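You need to use the argument rightmost.closed=TRUE with findInterval(). If you do not, a 5th group is created that consists of only those values equal to the maximum. Group 2 is empty because the first quartile equals the median, so only groups 1, 3, and 4 appear: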

> df$quant <- findInterval(df$BR, quantile(df$BR), rightmost.closed=TRUE)
> tapply(df$BR, df$quant, mean)
       1        3        4 
256.0000 320.0000 393.8462
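
To see the effect of rightmost.closed side by side, here is a minimal sketch. It assumes BR can be rebuilt from the counts shown by table(df$BR) above (the order of the values does not matter for the grouping); the names BR, breaks, open_groups and closed_groups are only illustrative.

```r
# Rebuild the values from the table() counts: 2, 74, 24 and 2 observations
BR <- rep(c(256, 320, 384, 512), times = c(2, 74, 24, 2))
breaks <- quantile(BR)   # 256 320 320 368 512

# Default: the maximum (512) lands in its own group, labelled 5
open_groups <- findInterval(BR, breaks)

# rightmost.closed = TRUE: the maximum is folded into group 4
closed_groups <- findInterval(BR, breaks, rightmost.closed = TRUE)

table(open_groups)     # groups 1, 3, 4, 5
table(closed_groups)   # groups 1, 3, 4

# Group means with the maximum included in the top group
tapply(BR, closed_groups, mean)
```

Group 2 never appears in either version because no value lies between the 25% and 50% cutpoints, which are both 320.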

Using values that are more variable:

> set.seed(42)
> df <- data.frame(BR=sample.int(100, 100, replace=TRUE))
> df$quant <- findInterval(df$BR, quantile(df$BR), rightmost.closed=TRUE)
> tapply(df$BR, df$quant, mean)
    1     2     3     4 
12.48 41.24 67.24 90.64
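
When the quartile boundaries are all distinct, as they are for data like these, cut() should give the same grouping. This is a minimal sketch of that alternative, not part of the approach above; the names by_find and by_cut are illustrative. No output is shown because the values drawn by sample.int() after set.seed(42) can differ between R versions.

```r
set.seed(42)
BR <- sample.int(100, 100, replace = TRUE)
breaks <- quantile(BR)

# Grouping as above: intervals closed on the left, maximum kept in group 4
by_find <- findInterval(BR, breaks, rightmost.closed = TRUE)

# Same convention with cut(): right = FALSE gives [a, b) intervals,
# include.lowest = TRUE then closes the last interval at the maximum,
# labels = FALSE returns integer group codes instead of a factor
by_cut <- cut(BR, breaks, right = FALSE, include.lowest = TRUE, labels = FALSE)

all(by_find == by_cut)
tapply(BR, by_cut, mean)
```

Note that cut() requires unique breaks, so it would stop with an error on the original BR values, where the 25% and 50% quantiles are both 320.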

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of ch.elahe via R-help
Sent: Monday, May 16, 2016 8:46 AM
To: Michael Dewey; ulrik.stervbo at gmail.com
Cc: R-help Mailing List
Subject: Re: [R] mean for every quartile

Thanks for your reply,

By using tapply I get this result:

    
    tapply(df$BR, findInterval(df$BR, quantile(df$BR)), mean) 
    1   3   4   5 
    256 320 384 512

But I think this is not right, because I should get 5 means and here I get only four numbers!




On Monday, May 16, 2016 6:29 AM, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
Dear Elahe

In line

On 16/05/2016 13:31, ch.elahe via R-help wrote:
> Hi all,
> I have a column in my df. I want to get the quartiles for this column and then calculate the mean for each quartile. Here is my column:
>

Strictly speaking, the quartiles are the boundaries, but if you really 
meant that, the problem is trivial, so I assume you want to cut the 
variable at the quartiles.

>
>     df$BR
>     [1] 384 384 384 384 512 384 384 320 320 320 320 320 320 320 320 320 320 384
>     [19] 384 384 320 320 320 320 384 384 256 320 320 320 384 320 320 320 384 384
>     [37] 320 320 320 320 320 320 320 320 320 384 320 320 320 320 320 320 384 320
>     [55] 320 320 320 320 320 320 384 512 320 320 320 320 320 320 320 384 384 320
>     [73] 320 320 384 320 320 320 320 256 320 320 384 320 384 320 384 320 320 320
>     [91] 384 320 320 320 320 320 320 320 320 320 320 320
>
> I do the following to get the quartiles:
>
>
>     quantile(df$BR)
>     0%  25%  50%  75% 100%
>     256  320  320  368  512
>
> Now how can I get the mean for each quartile?

How about setting up a vector which takes the values 1, 2, 3, 4 
depending on the values of BR, with cutpoints defined by 
quantile(BR) (using ifelse), and then using tapply?
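
One way this suggestion might look, as a minimal sketch. BR is rebuilt here from the counts of each distinct value in the vector printed above (2, 74, 24 and 2) rather than taken from the original data frame. The names q and grp, and the choice of strict < comparisons (so each interval is closed on the left), are assumptions for illustration.

```r
# Stand-in for df$BR: same values and counts as the printed vector
BR <- rep(c(256, 320, 384, 512), times = c(2, 74, 24, 2))

# Cutpoints: 0%, 25%, 50%, 75%, 100% (names dropped for plain comparisons)
q <- unname(quantile(BR))

# Assign 1..4 by comparing against the 25%, 50% and 75% cutpoints
grp <- ifelse(BR < q[2], 1,
       ifelse(BR < q[3], 2,
       ifelse(BR < q[4], 3, 4)))

# Mean of BR within each group
means <- tapply(BR, grp, mean)
means
```

With these data the 25% and 50% cutpoints are both 320, so no value can fall in group 2 and only groups 1, 3 and 4 are returned.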


> Thanks for any help,
> Elahe
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Michael
http://www.dewey.myzen.co.uk/home.html
