[R] How to group by and get distinct rows of grouped rows based on certain criteria
Satish Vadlamani
satish.vadlamani at gmail.com
Fri Jul 15 22:43:16 CEST 2016
Thank you, Bill and Sarah, for your help. I was able to get the same result
with dplyr using the code below. I could not post it earlier because my
original message had not yet appeared on the list at that time.
>>
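library(dplyr)

# Keep only the three columns of interest
file1 <- select(file1, ATP.Group, Business.Event, Category)

# Distinct combinations that have at least one row with Category "EQ"
file1_1 <- file1 %>% group_by(ATP.Group, Business.Event) %>%
  filter(Category == "EQ") %>% distinct(ATP.Group, Business.Event)
file1_1 <- as.data.frame(file1_1)
file1_1

# All distinct combinations, regardless of Category
file1_2 <- file1 %>% group_by(ATP.Group, Business.Event) %>%
  distinct(ATP.Group, Business.Event)
file1_2 <- as.data.frame(file1_2)
file1_2

# Combinations with no "EQ" row: all combinations minus those with EQ
setdiff(select(file1_2, ATP.Group, Business.Event),
        select(file1_1, ATP.Group, Business.Event))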
>>
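
For anyone who finds this thread later, here is a minimal sketch of a single-pass alternative to the setdiff() approach above. It assumes the sample data from my original post, rebuilt by hand as a data frame with dotted column names. The names flags, has_EQ, with_EQ and without_EQ are only illustrative, and I have tried this only on the small example.

```r
library(dplyr)

# Sample data from the thread, typed in by hand (assumption: dotted column
# names, as in the dplyr code above).
file1 <- data.frame(
  ATP.Group      = c("02", "02", "02", "ZM", "ZM", "ZM", "02", "02", "02"),
  Business.Event = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  Category       = c("AC", "AD", "EQ", "AU", "AV", "AW", "AC", "AY", "EQ"),
  stringsAsFactors = FALSE
)

# One row per combination, flagged TRUE if any of its rows has Category "EQ".
flags <- file1 %>%
  group_by(ATP.Group, Business.Event) %>%
  summarise(has_EQ = any(Category == "EQ")) %>%
  as.data.frame()

# Split the flagged table into the two answers.
with_EQ    <- flags[flags$has_EQ,  c("ATP.Group", "Business.Event")]
without_EQ <- flags[!flags$has_EQ, c("ATP.Group", "Business.Event")]

with_EQ
without_EQ
```

On the sample data, with_EQ should hold the combinations 02/A and 02/B, and without_EQ should hold ZM/A. Recent dplyr versions may print a message about how the result is grouped after summarise(); it does not affect the result here.
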
On Thu, Jul 14, 2016 at 1:53 PM, William Dunlap <wdunlap at tibco.com> wrote:
> > txt <- "|ATP Group|Business Event|Category|
> |02 |A |AC |
> |02 |A |AD |
> |02 |A |EQ |
> |ZM |A |AU |
> |ZM |A |AV |
> |ZM |A |AW |
> |02 |B |AC |
> |02 |B |AY |
> |02 |B |EQ |
> "
> > d <- read.table(sep="|", text=txt, header=TRUE, strip.white=TRUE,
> check.names=FALSE)[,2:4]
> > str(d)
> 'data.frame': 9 obs. of 3 variables:
>  $ ATP Group     : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2 1 1 1
>  $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
>  $ Category      : Factor w/ 7 levels "AC","AD","AU",..: 1 2 7 3 4 5 1 6 7
> > unique(d[d[,"Category"]!="EQ", c("ATP Group", "Business Event")])
>   ATP Group Business Event
> 1        02              A
> 4        ZM              A
> 7        02              B
> > unique(d[d[,"Category"]=="EQ", c("ATP Group", "Business Event")])
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
> Some folks prefer to use subset() instead of "[". The previous expression
> is equivalent to:
>
> > unique( subset(d, Category=="EQ", c("ATP Group", "Business Event")))
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani <
> satish.vadlamani at gmail.com> wrote:
>
>> Hello All:
>> I would like to get your help on the following problem.
>>
>> I have the following data; the first row is the header. Spaces are not
>> important.
>> I want to find the distinct combinations of ATP Group and Business Event
>> (these are the field names that you can see in the data below) that have
>> the Category EQ (Category is the third field) and those that do not have
>> the Category EQ. In the example below, the combinations 02/A and 02/B have
>> EQ and the combination ZM/A does not.
>>
>> If I have a larger file, how do I get to this answer?
>>
>> What did I try (with dplyr)?
>>
>> # I know that the below is not correct and not giving desired results
>> file1_1 <- file1 %>% group_by(ATP.Group,Business.Event) %>%
>> filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event)
>> # for some reason, I have to convert to data.frame to print the data correctly
>> file1_1 <- as.data.frame(file1_1)
>> file1_1
>>
>>
>> *Data shown below*
>> |ATP Group|Business Event|Category|
>> |02 |A |AC |
>> |02 |A |AD |
>> |02 |A |EQ |
>> |ZM |A |AU |
>> |ZM |A |AV |
>> |ZM |A |AW |
>> |02 |B |AC |
>> |02 |B |AY |
>> |02 |B |EQ |
>>
>> --
>>
>> Satish Vadlamani
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
--
Satish Vadlamani