[R] mean for every quartile
David L Carlson
dcarlson at tamu.edu
Mon May 16 16:07:02 CEST 2016
Do you understand that quartiles divide the data into 4 groups?
Min (group 1) 1st quartile (group 2) median (group 3) 3rd quartile (group 4) max
But in your case df$BR has only 4 unique values:
> table(df$BR)
256 320 384 512
  2  74  24   2
So the first quartile is equal to the median, which leaves no values for the second group:
> quantile(df$BR)
  0%  25%  50%  75% 100%
 256  320  320  368  512
You need to use the argument rightmost.closed=TRUE with findInterval(). Without it, a 5th group is created that contains only the values equal to the maximum. With it, those values are included in the 4th group:
> df$quant <- findInterval(df$BR, quantile(df$BR), rightmost.closed=TRUE)
> tapply(df$BR, df$quant, mean)
       1        3        4
256.0000 320.0000 393.8462
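To see what the argument changes, the vector can be rebuilt from the counts in the table above (the order of the values does not matter for this purpose). This is only a sketch; the names br, breaks, without and with_closed are made up for the illustration:

```r
# Rebuild BR from the counts shown by table(df$BR); order is irrelevant here
br <- rep(c(256, 320, 384, 512), times = c(2, 74, 24, 2))
breaks <- quantile(br)   # 256 320 320 368 512

# Default: every interval is [a, b), so the maximum lands in a 5th group
without <- findInterval(br, breaks)

# rightmost.closed = TRUE: the last interval becomes [368, 512],
# so the maximum joins group 4
with_closed <- findInterval(br, breaks, rightmost.closed = TRUE)

table(without)       # groups 1, 3, 4 and 5
table(with_closed)   # groups 1, 3 and 4
tapply(br, with_closed, mean)
```

Group 2 never appears in either result because the 25% and 50% breaks are both 320, so the interval [320, 320) cannot contain any value.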
Using values that are more variable:
> set.seed(42)
> df <- data.frame(BR=sample.int(100, 100, replace=TRUE))
> df$quant <- findInterval(df$BR, quantile(df$BR), rightmost.closed=TRUE)
> tapply(df$BR, df$quant, mean)
    1     2     3     4
12.48 41.24 67.24 90.64
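When the quartiles are all distinct, cut() is a possible alternative to findInterval(). It is a different function from the one used above and is shown only as a sketch. Its intervals are closed on the right by default, and include.lowest=TRUE keeps the minimum in the first group, so values that sit exactly on an interior quartile may be assigned differently than with findInterval(). It also stops with an error when the breaks are not unique, as they are for the original BR values. The vector x below is made up so that the result does not depend on the random number generator:

```r
# Hypothetical data: the integers 1 to 100
x <- 1:100

# Cut at the quartiles; include.lowest = TRUE keeps the minimum in group 1
grp <- cut(x, breaks = quantile(x), include.lowest = TRUE, labels = 1:4)

tapply(x, grp, mean)
#  1  2  3  4
# 13 38 63 88
```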
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of ch.elahe via R-help
Sent: Monday, May 16, 2016 8:46 AM
To: Michael Dewey; ulrik.stervbo at gmail.com
Cc: R-help Mailing List
Subject: Re: [R] mean for every quartile
Thanks for your reply,
By using tapply I get this result:
tapply(df$BR, findInterval(df$BR, quantile(df$BR)), mean)
  1   3   4   5
256 320 384 512
But I think this is not right, because I should get 5 means but here I get only four numbers!
On Monday, May 16, 2016 6:29 AM, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
Dear Elahe
In line
On 16/05/2016 13:31, ch.elahe via R-help wrote:
> Hi all,
> I have a column in my df and I want to get quartiles for this column and then calculate mean for each and every quartile, here is my column:
>
The quartiles are, strictly speaking, the boundaries, but if you really
meant that, the problem is trivial, so I assume you want to cut the
variable at the quartiles.
>
> df$BR
> [1] 384 384 384 384 512 384 384 320 320 320 320 320 320 320 320 320 320 384
> [19] 384 384 320 320 320 320 384 384 256 320 320 320 384 320 320 320 384 384
> [37] 320 320 320 320 320 320 320 320 320 384 320 320 320 320 320 320 384 320
> [55] 320 320 320 320 320 320 384 512 320 320 320 320 320 320 320 384 384 320
> [73] 320 320 384 320 320 320 320 256 320 320 384 320 384 320 384 320 320 320
> [91] 384 320 320 320 320 320 320 320 320 320 320 320
>
> I do the following to get the quartiles:
>
>
> quantile(m$BR)
>   0%  25%  50%  75% 100%
>  256  320  320  368  512
>
> now how can I get mean for each quartile?
How about setting up a vector which takes the values 1, 2, 3, 4
depending on the values of BR, with cutpoints defined by
quantile(BR) (using ifelse), and then using tapply?
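A minimal sketch of that idea, assuming a made-up vector BR and assuming that a value equal to a cutpoint goes into the upper group (the maximum stays in group 4). Neither choice is fixed by the suggestion itself:

```r
# Made-up values for illustration; replace with the real column
BR <- c(10, 20, 30, 40, 50, 60, 70, 80)
q <- quantile(BR)   # cutpoints at 0%, 25%, 50%, 75%, 100%

# Nested ifelse: 1 below the 25% cutpoint, 2 below the median,
# 3 below the 75% cutpoint, 4 otherwise
grp <- ifelse(BR < q[2], 1,
       ifelse(BR < q[3], 2,
       ifelse(BR < q[4], 3, 4)))

# Mean of BR within each group
tapply(BR, grp, mean)
```

With tied cutpoints, as in the BR column of this thread, one of the groups can come out empty and tapply will simply omit it.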
> Thanks for any help,
> Elahe
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Michael
http://www.dewey.myzen.co.uk/home.html