[R] dplyr - counting a number of specific values in each column - for all columns at once

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Tue Jun 16 20:11:40 CEST 2015


Thank you, Clint.
That's the thing: it's relatively easy to do it in base, but the
resulting code is not THAT simple.
I thought dplyr would make it easy...
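
For what it's worth, here is a minimal sketch of how the by-device count could look in dplyr. It assumes dplyr 1.0.0 or later, where across() is available; that is newer than this thread, so treat it as one possible approach rather than what was on offer at the time. The names 'counts' and 'base_counts' are only illustrative, and the aggregate() call is there purely as a base R cross-check.

```r
library(dplyr)

# Example data from the original question
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Assumption: dplyr >= 1.0.0. across() applies the same summary to every
# column named in 'myvars'; .names builds the counts.a, counts.b, ... labels.
counts <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
counts
# device 1: counts.a = 1, counts.b = 2, counts.c = 0
# device 2: counts.a = 0, counts.b = 1, counts.c = 0
# device 3: counts.a = 1, counts.b = 0, counts.c = 2

# Base R cross-check: the same per-device counts via aggregate()
base_counts <- aggregate(md[myvars], by = list(device = md$device),
                         FUN = function(x) sum(x == 5, na.rm = TRUE))
base_counts
```

Because the columns are selected through 'myvars', the same call should carry over unchanged when the vector holds dozens of names.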

On Tue, Jun 16, 2015 at 2:06 PM, Clint Bowman <clint at ecy.wa.gov> wrote:
> May want to add headers, but the following prints the device number with
> each set of sums:
>
> for (dev in unique(md$device))
> {cat(colSums(subset(md,md$device==dev)==5,na.rm=TRUE),dev,"\n")}
>
> Clint Bowman                    INTERNET:       clint at ecy.wa.gov
> Air Quality Modeler             INTERNET:       clint at math.utah.edu
> Department of Ecology           VOICE:          (360) 407-6815
> PO Box 47600                    FAX:            (360) 407-7534
> Olympia, WA 98504-7600
>
>         USPS:           PO Box 47600, Olympia, WA 98504-7600
>         Parcels:        300 Desmond Drive, Lacey, WA 98503-1274
>
> On Tue, 16 Jun 2015, Dimitri Liakhovitski wrote:
>
>> Except, of course, Bert, that you forgot that it had to be done by
>> device. Your solution ignores the device.
>>
>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c =
>> c(1,3,4,3,5,5),
>>      device = c(1,1,2,2,3,3))
>> myvars = c("a", "b", "c")
>> md[2,3] <- NA
>> md[4,1] <- NA
>> md
>> vapply(md[myvars], function(x) sum(x==5,na.rm=TRUE),1L)
>>
>> But the result should be by device.
>>
>> On Tue, Jun 16, 2015 at 1:56 PM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>>
>>> Thank you, Bert.
>>> I'll be honest - I am just learning dplyr and was wondering if one
>>> could do it in dplyr.
>>> But of course your solution is perfect...
>>>
>>> On Tue, Jun 16, 2015 at 1:50 PM, Bert Gunter <bgunter.4567 at gmail.com>
>>> wrote:
>>>>
>>>> Well, dplyr seems a bit of overkill as it's so simple with plain old
>>>> vapply() in base R :
>>>>
>>>>
>>>> > dat <- data.frame (a=sample(1:5,10,rep=TRUE),
>>>> +                    b=sample(3:7,10,rep=TRUE),
>>>> +                    g = sample(7:9,10,rep=TRUE))
>>>> > vapply(dat,function(x)sum(x==5,na.rm=TRUE),1L)
>>>>
>>>>
>>>> a b g
>>>> 5 4 0
>>>>
>>>>
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>> Bert Gunter
>>>>
>>>> "Data is not information. Information is not knowledge. And knowledge is
>>>> certainly not wisdom."
>>>>    -- Clifford Stoll
>>>>
>>>> On Tue, Jun 16, 2015 at 10:24 AM, Dimitri Liakhovitski
>>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>>>
>>>>>
>>>>> Hello!
>>>>>
>>>>> I have a data frame:
>>>>>
>>>>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c =
>>>>> c(1,3,4,3,5,5),
>>>>>       device = c(1,1,2,2,3,3))
>>>>> myvars = c("a", "b", "c")
>>>>> md[2,3] <- NA
>>>>> md[4,1] <- NA
>>>>> md
>>>>>
>>>>> I want to count the number of 5s in each column - by device. I can do it
>>>>> like this:
>>>>>
>>>>> library(dplyr)
>>>>> group_by(md, device) %>%
>>>>> summarise(counts.a = sum(a==5, na.rm = TRUE),
>>>>>           counts.b = sum(b==5, na.rm = TRUE),
>>>>>           counts.c = sum(c==5, na.rm = TRUE))
>>>>>
>>>>> However, in real life I'll have tons of variables (the length of
>>>>> 'myvars' can be very large) - so that I can't specify those counts.a,
>>>>> counts.b, etc. manually - dozens of times.
>>>>>
>>>>> Does dplyr allow running the count of 5s on all 'myvars' columns at
>>>>> once?
>>>>>
>>>>>
>>>>> --
>>>>> Dimitri Liakhovitski
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>>
>>
>



-- 
Dimitri Liakhovitski
