[R] by group
Avi Gross
@v|gro@@ @end|ng |rom ver|zon@net
Mon Nov 1 23:44:25 CET 2021
This is a fairly simple request and well covered by introductory reading
material.
A decent example was given and I see Andrew provided a base R reply that
should be sufficient. But I do not think he realized you wanted something
different so his answer is not in the format you wanted:
> tapply(dat$wt, dat$Year, mean) # mean by Year
    2001     2002     2003
13.50000 14.83333 13.50000
> tapply(dat$wt, dat$Sex, mean) # mean by Sex
       F        M
12.44444 15.44444
> tapply(dat$wt, list(dat$Year, dat$Sex), mean) # mean by Year and Sex
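For what it is worth, the two-way tapply() call quoted above returns one row per Year and one column per Sex, which is close to the requested layout. A minimal sketch, assuming the sample data from the question (rebuilt here with data.frame() so the snippet runs on its own; the name `m` is only illustrative):

```r
# Sample data from the question, rebuilt so the snippet is self-contained
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

# Two-way table of means: rows are Years, columns are Sexes (F, M)
m <- tapply(dat$wt, list(dat$Year, dat$Sex), mean)

# Put the columns in the M, F order of the requested output and round
m <- round(m[, c("M", "F")], 2)
print(m)
```

The result is a matrix rather than a data frame; as.data.frame(m) converts it if that matters.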
I personally often prefer the tidyverse approach, which optionally
includes pipes and lets you group a data frame any way you want and then
apply commands to each group. It is easier to get your result this way by
grouping by BOTH Year and Sex at once, which gives multiple lines of output.
Note the code below requires running a line like install.packages("tidyverse")
once beforehand.
library(tidyverse)
dat <- read.table(
text = "Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11 ",
header = TRUE
)
dat %>%
group_by(Year, Sex) %>%
summarize( M = mean(wt, na.rm=TRUE))
The output of the above is the rows below:
> dat %>%
+ group_by(Year, Sex) %>%
+ summarize( M = mean(wt, na.rm=TRUE))
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
# A tibble: 6 x 3
# Groups:   Year [3]
   Year Sex       M
  <int> <chr> <dbl>
1  2001 F      12
2  2001 M      15
3  2002 F      13.3
4  2002 M      16.3
5  2003 F      12
6  2003 M      15
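The `summarise()` message in that output is informational only. As a sketch, assuming dplyr 1.0.0 or later, passing `.groups = "drop"` silences it and returns an ungrouped tibble (the data are rebuilt here so the snippet runs on its own; `res` is an illustrative name):

```r
library(dplyr)

# Sample data from the question, rebuilt so the snippet is self-contained
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

# One row per Year/Sex combination; .groups = "drop" removes the grouping
# from the result, so no message about grouped output is printed
res <- dat %>%
  group_by(Year, Sex) %>%
  summarise(M = mean(wt, na.rm = TRUE), .groups = "drop")

print(res)
```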
Note that Male and Female each have their own rows. It is not hard to switch
to your format: add pivot_wider() to the pipeline to rearrange the
intermediate data set, making one new column per value of Sex and
populating those columns from the created variable M. The complete
pipeline is now:
dat %>%
group_by(Year, Sex) %>%
summarize( M = mean(wt, na.rm=TRUE)) %>%
pivot_wider(names_from = Sex, values_from = M)
The output as a tibble is:
   Year     F     M
  <int> <dbl> <dbl>
1  2001  12    15
2  2002  13.3  16.3
3  2003  12    15
Or as a data.frame, which prints more digits (the values are the same; a
tibble just displays fewer significant digits by default):
> dat %>%
+ group_by(Year, Sex) %>%
+ summarize( M = mean(wt, na.rm=TRUE)) %>%
+ pivot_wider(names_from = Sex, values_from = M) %>%
+ as.data.frame
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
  Year        F        M
1 2001 12.00000 15.00000
2 2002 13.33333 16.33333
3 2003 12.00000 15.00000
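The extra digits are a printing difference, not a change in the values. As a rough base R illustration (tibbles show about three significant digits by default, while data.frame printing follows getOption("digits"), normally 7; the names `short` and `long` are illustrative):

```r
x <- 49 / 3                      # the 2002 mean for M: (14 + 18 + 17) / 3

short <- format(x, digits = 3)   # "16.3", similar to the tibble display
long  <- format(x, digits = 7)   # "16.33333", similar to the data.frame display

print(c(short, long))
```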
Your expected output is rounded (it shows 13.33 and 16.33). If you insist
on a fixed number of digits after the decimal point, ask for the result to
be rounded, here to one digit:
> dat %>%
+ group_by(Year, Sex) %>%
+ summarize( M = mean(wt, na.rm=TRUE)) %>%
+ pivot_wider(names_from = Sex, values_from = M) %>%
+ as.data.frame %>%
+ round(1)
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
  Year    F    M
1 2001 12.0 15.0
2 2002 13.3 16.3
3 2003 12.0 15.0
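To get closer to the layout in the original question (M before F, two digits after the decimal point), the same pipeline can be extended slightly. This is only a sketch, assuming dplyr and tidyr are installed; the data are rebuilt so it runs on its own, and `res` is an illustrative name:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt so the snippet is self-contained
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

res <- dat %>%
  group_by(Year, Sex) %>%
  summarize(M = mean(wt, na.rm = TRUE)) %>%
  pivot_wider(names_from = Sex, values_from = M) %>%
  select(all_of(c("Year", "M", "F"))) %>%  # column order of the requested output
  as.data.frame() %>%
  round(2)                                 # two digits, as in 16.33 and 13.33

print(res)
```

Note that round() changes the stored values, not just the display; keep the unrounded result around if it is needed for further calculation.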
And, yes, any of the above can be done in various ways using plain old R,
especially in recent versions (4.1.0 and later), which added a native pipe
operator, |>.
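One possible base R version, as a sketch rather than the only way to do it: aggregate() computes the group means and reshape() spreads Sex into columns, chained with the native pipe (which needs R 4.1.0 or later). The data are rebuilt so the snippet runs on its own, and `wide` is an illustrative name:

```r
# Sample data from the question, rebuilt so the snippet is self-contained
dat <- data.frame(
  Year = rep(2001:2003, each = 6),
  Sex  = rep(rep(c("M", "F"), each = 3), times = 3),
  wt   = c(15, 14, 16, 12, 11, 13,
           14, 18, 17, 11, 15, 14,
           18, 13, 14, 15, 10, 11)
)

# Long table of means (one row per Year/Sex), then one column per Sex
wide <- aggregate(wt ~ Year + Sex, data = dat, FUN = mean) |>
  reshape(idvar = "Year", timevar = "Sex", direction = "wide")

print(wide)
```

reshape() names the new columns wt.F and wt.M; they can be renamed afterwards with names(wide) <- c("Year", "F", "M") if plain F and M headers are preferred.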
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Val
Sent: Monday, November 1, 2021 5:08 PM
To: r-help using R-project.org (r-help using r-project.org) <r-help using r-project.org>
Subject: [R] by group
Hi All,
How can I generate the mean by group? The sample data look as follows:
dat<-read.table(text="Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11 ",header=TRUE)
The desired output is,
         M      F
2001 15     12
2002 16.33  13.33
2003 15     12
Thank you,
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.