[R] Reply: Re: dplyr : row total for all groups in dplyr summarise
David Winsemius
dwinsemius at comcast.net
Tue Jul 5 18:47:29 CEST 2016
> On Jul 5, 2016, at 2:27 AM, G.Maubach at weinwolf.de wrote:
>
> Hi guys,
>
> I checked out your example but I can't follow the results:
>
>> mtcars %>%
> + group_by (am, gear) %>%
> + summarise (n=n()) %>%
> + mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
> + ungroup() %>%
> + mutate(row.tot = sum(n))
> Source: local data frame [4 x 5]
>
>      am  gear     n rel.freq row.tot
>   (dbl) (dbl) (int)    (chr)   (int)
> 1     0     3    15      79%      32
> 2     0     4     4      21%      32
> 3     1     4     8      62%      32
> 4     1     5     5      38%      32
>
> We have a total of 32 cases and 15 * 100 / 32 = 46.9 % instead of 79 %.
> The same goes for the other rows. How is 79 % calculated?
>
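It is apparently the number of items in the first “group determinant” (here `am`): `summarise()` drops only the last grouping variable (`gear`), so the result is still grouped by `am` and `sum(n)` in the following `mutate()` is a per-`am` total: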
> mtcars %>%
+ group_by (am, gear) %>%
+ summarise (n=n()) %>%
+ mutate(sum = sum(n)) %>%
+ ungroup()
Source: local data frame [4 x 4]
     am  gear     n   sum
  (dbl) (dbl) (int) (int)
1     0     3    15    19
2     0     4     4    19
3     1     4     8    13
4     1     5     5    13
> ?n
> with(mtcars,table(am,gear))
   gear
am   3  4  5
  0 15  4  0
  1  0  8  5
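As a cross-check that needs only base R, `prop.table()` on the same table reproduces both readings of the percentages: dividing by the grand total of 32 (15/32, about 46.9% for am = 0, gear = 3) and dividing within each row of the table, i.e. within each `am` value (15/19, about 78.9%). This is a sketch using base R rather than dplyr; `tab` is just a name chosen here.

```r
# Cross-tabulate transmission (am) by number of forward gears
tab <- with(mtcars, table(am, gear))

# Shares of the grand total (denominator 32): the figure Georg expected
round(100 * prop.table(tab), 1)

# Shares within each row, i.e. within each am value (denominators 19 and 13):
# the same denominators the grouped mutate() used
round(100 * prop.table(tab, margin = 1), 1)
```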
The documentation for the `n` function is particularly unhelpful in letting one know what to expect from it:
"Description
This function is implemented special for each data source and can only be used from within summarise, mutate and filter"
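A minimal sketch of what `n()` returns at each step, assuming dplyr is installed in a version that provides `group_vars()` (0.7.0 or later). Inside `summarise()`, `n()` counts rows per (am, gear) combination; because `summarise()` drops only the last grouping variable, a later `mutate()` still works within `am`, where `n()` counts summary rows and `sum(n)` adds up the car counts. The names `by_am_gear`, `summary_rows` and `group_total` are illustrative.

```r
library(dplyr)

# n() returns the number of rows in the current group
by_am_gear <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())          # column n = cars per (am, gear) combination

# summarise() peeled off the last grouping variable (gear); am remains
group_vars(by_am_gear)
#> [1] "am"

# In a later mutate(), n() counts summary rows per am group (2 each),
# while sum(n) adds the per-combination car counts per am (19 and 13)
res <- by_am_gear %>% mutate(summary_rows = n(), group_total = sum(n))
res
```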
—
David.
> When searching the web I saw this example:
>
> -- cut --
>
> #-- not run --
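> library(httr)
> url <- "http://www.lock5stat.com/datasets/HollywoodMovies2011.csv"
> response <- GET(url)
> # content(as = "text") returns the CSV body as a string; read.csv() parses it
> Hollywoodmovies2011 <- read.csv(text = content(response, as = "text"))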
> #-- end not run
>
> Hollywoodmovies2011 %>%
> group_by(genre) %>%
> summarize(count = n()) %>%
> mutate(rf = count / sum(count))
>
> -- cut --
>
> which gives
>
> Source: local data frame [9 x 3]
>
>        Genre count           %
>       (fctr) (int)       (dbl)
> 1     Action    32 0.235294118
> 2  Adventure     1 0.007352941
> 3  Animation    12 0.088235294
> 4     Comedy    27 0.198529412
> 5      Drama    21 0.154411765
> 6    Fantasy     2 0.014705882
> 7     Horror    17 0.125000000
> 8    Romance    11 0.080882353
> 9   Thriller    13 0.095588235
>
> Here the % column is count divided by the sum of count, e.g. the sum is 136
> and 32 / 136 = 0.2352941.
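The Hollywood data set is not loaded here, so this sketch substitutes `mtcars` with a single grouping variable (`gear`) to illustrate why that example divides by the grand total: with only one grouping variable, `summarise()` drops it and returns an ungrouped result, so `sum(count)` covers all rows. It assumes dplyr is installed; `gear_counts` is a hypothetical name.

```r
library(dplyr)

# With a single grouping variable, summarise() drops it,
# so the result is no longer grouped
gear_counts <- mtcars %>%
  group_by(gear) %>%
  summarise(count = n())

group_vars(gear_counts)
#> character(0)

# sum(count) is therefore the grand total (32), as in the Hollywood example
gear_counts %>% mutate(rf = count / sum(count))
```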
>
> What is the difference between the two calculations? What do the relative
> counts in the first example mean?
>
> Kind regards
>
> Georg
>
>
>
>
>
> From: Ulrik Stervbo <ulrik.stervbo at gmail.com>
> To: David Winsemius <dwinsemius at comcast.net>
> Cc: r-help at r-project.org, maicel at infomed.sld.cu
> Date: 05.07.2016 06:06
> Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
> Sent by: "R-help" <r-help-bounces at r-project.org>
>
>
>
> That will give you the wrong result when used on summarised data: `nrow()` counts the rows of the summary (one per group), not the rows of the original data.
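To make the difference concrete, a small sketch (assuming dplyr is installed; `counts` is just an illustrative name) comparing `nrow()` on a summary with the sum of its per-group counts:

```r
library(dplyr)

# Summary table: one row per observed (am, gear) combination
counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  ungroup()

nrow(counts)   # rows of the summary: 4 groups
sum(counts$n)  # sum of the per-group counts: 32, the rows of mtcars
```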
>
> David Winsemius <dwinsemius at comcast.net> wrote on Tue, 5 July 2016 02:10:
>
>> I thought there was an nrow() function?
>>
>> On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>>
>> If you want the total number of rows in the original data.frame after
>> counting the rows in each group, you can ungroup and sum the row counts,
>> like:
>>
>> library("dplyr")
>>
>>
>> mtcars %>%
>> group_by (am, gear) %>%
>> summarise (n=n()) %>%
>> mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>> ungroup() %>%
>> mutate(row.tot = sum(n))
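In the code above `rel.freq` is computed before `ungroup()`, so it is still a share within each `am` value, while `row.tot` is the grand total. If the intent is a share of all 32 cars, one option is to drop every grouping level in `summarise()` itself. This sketch assumes dplyr 1.0.0 or later, where `summarise()` accepts a `.groups` argument; `overall` is an illustrative name.

```r
library(dplyr)

# Requires dplyr >= 1.0.0 for the .groups argument of summarise()
overall <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop") %>%               # fully ungrouped result
  mutate(
    row.tot  = sum(n),                                   # grand total over all groups
    rel.freq = paste0(round(100 * n / row.tot, 0), "%")  # share of all cars
  )
overall
```

With older dplyr versions, moving `ungroup()` in front of the first `mutate()` has the same effect.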
>>
>> HTH
>> Ulrik
>>
>> On Mon, 4 Jul 2016 at 18:23 David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>>
>>>> On Jul 4, 2016, at 6:56 AM, maicel at infomed.sld.cu wrote:
>>>>
>>>> Hello,
>>>> How can I aggregate a row total for all groups in a dplyr summarise?
>>>
>>> Row total … of what? Aggregate … how? What is the desired answer?
>>>
>>>
>>>
>>>> library(dplyr)
>>>> mtcars %>%
>>>> group_by (am, gear) %>%
>>>> summarise (n=n()) %>%
>>>> mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
>>>>
>>>> Best regards
>>>> Maicel Monzon
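If "row total" is read as an extra row holding the count over all groups (an assumption; the question does not say which total is wanted), one way to sketch it in dplyr is to append a one-row summary with `bind_rows()`. The grouping columns of the appended row are filled with `NA`. It assumes dplyr is installed; `counts` and `with_total` are illustrative names.

```r
library(dplyr)

# Assumption: "row total" means one extra row with the count over all groups
counts <- mtcars %>%
  count(am, gear)                  # one row per (am, gear), count in column n

with_total <- bind_rows(
  counts,
  summarise(counts, n = sum(n))    # single row with n = 32; am and gear become NA
)
with_total
```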
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>