[R] How to group by and get distinct rows of of grouped rows based on certain criteria

Fri Jul 15 22:43:16 CEST 2016

Thank you Bill and Sarah for your help. I was able to do the same with
dplyr with the following code. But I could not post this since at that time
my message was not posted yet.

>>
file1 <- select(file1, ATP.Group,Business.Event,Category)

file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
filter(Category == "EQ") %>% distinct(ATP.Group,Business.Event)
file1_1 <- as.data.frame(file1_1)
file1_1

file1_2 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
distinct(ATP.Group,Business.Event)
file1_2 <- as.data.frame(file1_2)
file1_2

setdiff(select(file1_2,ATP.Group,Business.Event),
select(file1_1,ATP.Group,Business.Event))
>>

On Thu, Jul 14, 2016 at 1:53 PM, William Dunlap <wdunlap at tibco.com> wrote:

> > txt <- "|ATP Group|Business Event|Category|
> |02       |A             |AC      |
> |02       |A             |AD      |
> |02       |A             |EQ      |
> |ZM       |A             |AU      |
> |ZM       |A             |AV      |
> |ZM       |A             |AW      |
> |02       |B             |AC      |
> |02       |B             |AY      |
> |02       |B             |EQ      |
> "
> > d <- read.table(sep="|", text=txt, header=TRUE, strip.white=TRUE,
> check.names=FALSE)[,2:4]
> > str(d)
> 'data.frame':   9 obs. of  3 variables:
>  $ ATP Group     : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2 1 1 1
>  $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
>  $ Category      : Factor w/ 7 levels "AC","AD","AU",..: 1 2 7 3 4 5 1 6 7
> > unique(d[d[,"Category"]!="EQ", c("ATP Group", "Business Event")])
>   ATP Group Business Event
> 1        02              A
> 4        ZM              A
> 7        02              B
> > unique(d[d[,"Category"]=="EQ", c("ATP Group", "Business Event")])
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
> Some folks prefer to use subset() instead of "[".  The previous expression
> is equivalent to:
>
> > unique( subset(d, Category=="EQ", c("ATP Group", "Business Event")))
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani <
> satish.vadlamani at gmail.com> wrote:
>
>> Hello All:
>> I would like to get your help on the following problem.
>>
>> I have the following data and the first row is the header. Spaces are not
>> important.
>> I want to find out distinct combinations of ATP Group and Business Event
>> (these are the field names that you can see in the data below) that have
>> the Category EQ (Category is the third field) and those that do not have
>> the category EQ. In the example below, the combinations 02/A and 02/B have
>> EQ and the combination ZM/A does not.
>>
>> If I have a larger file, how to get to this answer?
>>
>> What did I try (with dplyr)?
>>
>> # I know that the below is not correct and not giving desired results
>> file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
>> filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event)
>> # for some reason, I have to convert to data.frame to print the data
>> correctly
>> file1_1 <- as.data.frame(file1_1)
>> file1_1
>>
>>
>> *Data shown below*
>> |ATP Group|Business Event|Category|
>> |02       |A             |AC      |
>> |02       |A             |AD      |
>> |02       |A             |EQ      |
>> |ZM       |A             |AU      |
>> |ZM       |A             |AV      |
>> |ZM       |A             |AW      |
>> |02       |B             |AC      |
>> |02       |B             |AY      |
>> |02       |B             |EQ      |
>>
>> --
>>
>> Satish Vadlamani
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

-- 

Satish Vadlamani

	[[alternative HTML version deleted]]