[R] How to group by and get distinct rows of of grouped rows based on certain criteria

William Dunlap wdunlap at tibco.com
Thu Jul 14 22:53:51 CEST 2016


> txt <- "|ATP Group|Business Event|Category|
|02       |A             |AC      |
|02       |A             |AD      |
|02       |A             |EQ      |
|ZM       |A             |AU      |
|ZM       |A             |AV      |
|ZM       |A             |AW      |
|02       |B             |AC      |
|02       |B             |AY      |
|02       |B             |EQ      |
"
> d <- read.table(sep="|", text=txt, header=TRUE, strip.white=TRUE,
check.names=FALSE)[,2:4]
> str(d)
'data.frame':   9 obs. of  3 variables:
 $ ATP Group     : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2 1 1 1
 $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
 $ Category      : Factor w/ 7 levels "AC","AD","AU",..: 1 2 7 3 4 5 1 6 7
> unique(d[d[,"Category"]!="EQ", c("ATP Group", "Business Event")])
  ATP Group Business Event
1        02              A
4        ZM              A
7        02              B
> unique(d[d[,"Category"]=="EQ", c("ATP Group", "Business Event")])
  ATP Group Business Event
3        02              A
9        02              B

Some folks prefer to use subset() instead of "[".  The previous expression
is equivalent to:

> unique( subset(d, Category=="EQ", c("ATP Group", "Business Event")))
  ATP Group Business Event
3        02              A
9        02              B


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani <
satish.vadlamani at gmail.com> wrote:

> Hello All:
> I would like to get your help on the following problem.
>
> I have the following data and the first row is the header. Spaces are not
> important.
> I want to find out distinct combinations of ATP Group and Business Event
> (these are the field names that you can see in the data below) that have
> the Category EQ (Category is the third field) and those that do not have
> the category EQ. In the example below, the combinations 02/A and 02/B have
> EQ and the combination ZM/A does not.
>
> If I have a larger file, how to get to this answer?
>
> What did I try (with dplyr)?
>
> # I know that the below is not correct and not giving desired results
> file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
> filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event)
> # for some reason, I have to convert to data.frame to print the data
> correctly
> file1_1 <- as.data.frame(file1_1)
> file1_1
>
>
> *Data shown below*
> |ATP Group|Business Event|Category|
> |02       |A             |AC      |
> |02       |A             |AD      |
> |02       |A             |EQ      |
> |ZM       |A             |AU      |
> |ZM       |A             |AV      |
> |ZM       |A             |AW      |
> |02       |B             |AC      |
> |02       |B             |AY      |
> |02       |B             |EQ      |
>
> --
>
> Satish Vadlamani
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list