[R] tidyverse: grouped summaries (with summerize)
Avi Gross
@v|gro@@ @end|ng |rom ver|zon@net
Tue Sep 14 00:36:06 CEST 2021
As Eric has pointed out, perhaps Rich is not thinking in pipeline terms. summarise() (or summarize(); dplyr accepts both spellings) takes the data as its first argument:
summarise(.data=whatever, ...)
But in a pipeline, you OMIT the first argument and let the pipeline supply an argument silently.
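To make that concrete, here is a minimal sketch, assuming dplyr is installed. The data frame `flows` and its values are made up for illustration and are not Rich's data. It shows an explicit call and a piped call that should give the same result:

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
flows <- tibble(month = c(1, 1, 2), cfs = c(10, 20, 40))

# Explicit call: the data frame is written out as the first argument
explicit <- summarise(flows, vol = mean(cfs))

# Piped call: %>% supplies flows as the first argument, so it is not repeated
piped <- flows %>% summarise(vol = mean(cfs))

# Compare the two results
print(all.equal(explicit, piped))
```
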
What I think summarize saw was something like:
summarize(. , disc_by_month, vol = mean(cfs, na.rm = TRUE))
There is now a superfluous SECOND argument: a data.frame sitting in a position where summarize expected an expression built from the columns of the hidden data.frame-like object it was passed. You do not have a column called disc_by_month, and presumably some odd logic made it fall back to something else, such as the first column.
I hope this makes sense. When you assemble a pipeline from calls that were written to stand alone, check every step and remove the data argument that would otherwise come first, because the pipe already supplies it.
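A sketch of what the piped version might look like with the extra argument removed. The tibble `disc` below is a hypothetical stand-in that borrows Rich's column names with made-up values, and `monthly` is just an illustrative name for the result:

```r
library(dplyr)

# Hypothetical stand-in for Rich's data: same column names, made-up values
disc <- tibble(
  year  = c(2016, 2016, 2016, 2016),
  month = c(3, 3, 4, 4),
  cfs   = c(149000, 151000, 120000, 130000)
)

# The pipe passes the grouped data along, so summarise() gets
# only the summary expression, not the data frame again
monthly <- disc %>%
  group_by(year, month) %>%
  summarise(vol = mean(cfs, na.rm = TRUE), .groups = "drop")

# One row per year/month combination
print(monthly)
```

The result is stored under a new name and printed explicitly. That may matter when the script is run with source(), which by default does not echo the value of a bare expression, so printing the original object afterwards would still show the unsummarised data.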
And, just FYI, the subject line should not use a word that some see as the opposite companion of "winterize" ...
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 5:51 PM
To: r-help using r-project.org
Subject: Re: [R] tidyverse: grouped summaries (with summerize)
On Mon, 13 Sep 2021, Rich Shepard wrote:
> That's what I thought I did. I'll rewrite the script and work toward
> the output I need.
Still not the correct syntax. Command is now:
disc_by_month %>%
group_by(year, month) %>%
summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
and results are:
> source('disc.R')
`summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.
> disc_by_month
# A tibble: 590,940 × 6
# Groups: year, month [66]
    year month   day  hour   min    cfs
   <int> <int> <int> <int> <int>  <dbl>
 1  2016     3     3    12     0 149000
 2  2016     3     3    12    10 150000
 3  2016     3     3    12    20 151000
 4  2016     3     3    12    30 156000
 5  2016     3     3    12    40 154000
 6  2016     3     3    12    50 150000
 7  2016     3     3    13     0 153000
 8  2016     3     3    13    10 156000
 9  2016     3     3    13    20 154000
10  2016     3     3    13    30 155000
# … with 590,930 more rows
The grouping is still not right. I expected to see a mean value for each month of each year in the data set, not for each minute.
Rich