[R] how to filter variables which appear in any row but do not include
Rui Barradas
ruipbarradas using sapo.pt
Wed Jun 3 21:25:09 CEST 2020
Hello,
I forgot about %in%, maybe because the OP's code used regexes.
And rowSums is much faster than apply.
In my tests, Bert's solution is 7 times faster than mine, even after
replacing grepl with %in% and keeping apply(no, 1, any).
Hope this helps,
Rui Barradas
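A minimal sketch for reproducing this kind of comparison, on hypothetical random data built like Rui's example but with many more rows (`n` is an arbitrary choice). Variant A is Rui's original, B swaps `grepl()` for `%in%` but keeps `apply()`, and C is Bert's `%in%` plus `rowSums()`. It checks that all three select the same rows and reports elapsed times from `system.time()`; the actual ratios will depend on the machine and data size.

```r
# Hypothetical test data, built like Rui's example but with many more rows
x <- c("E10", paste0("E10", 0:9), "E11", paste0("E11", 0:9))
set.seed(2020)
n <- 1e5  # arbitrary size, large enough for timings to be visible
dat <- as.data.frame(replicate(5, sample(x, n, TRUE)),
                     stringsAsFactors = FALSE)
unwanted <- c("E102", "E112")

# A: Rui's original, regex matching with grepl(), row test with apply()
t_a <- system.time({
  no <- sapply(dat, function(col) grepl(paste(unwanted, collapse = "|"), col))
  keep_a <- !apply(no, 1, any)
})

# B: exact matching with %in%, still using apply() over rows
t_b <- system.time({
  no <- sapply(dat, function(col) col %in% unwanted)
  keep_b <- !apply(no, 1, any)
})

# C: Bert's version, exact matching with %in% and rowSums()
t_c <- system.time({
  bad <- sapply(dat, function(col) col %in% unwanted)
  keep_c <- !rowSums(bad)
})

# All three select the same rows; only the elapsed times differ
stopifnot(identical(keep_a, keep_b), identical(keep_b, keep_c))
round(c(A = t_a[["elapsed"]], B = t_b[["elapsed"]], C = t_c[["elapsed"]]), 3)
```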
On 03/06/20 at 18:34, Bert Gunter wrote:
> Regexes are not needed. Using Rui's example:
>
> > bad <- mapply(function(x) x %in% unwanted,dat)
> > dat[!rowSums(bad),]
>
>      V1   V2   V3   V4   V5
> 2  E117 E113 E119 E100  E10
> 4  E114  E11 E119 E119 E114
> 5  E109 E111 E103 E103 E100
> 7  E108 E113 E119 E117  E11
> 8  E114 E105  E10 E109 E110
> 9  E119 E116 E108 E118 E119
> 10 E100 E110 E104 E111 E101
> 13 E111 E116 E101 E110 E116
> 15 E103  E11 E108  E10 E113
> 16 E111 E117 E103 E115 E119
> 17 E104 E110 E104 E117 E114
> 19 E100 E108  E10 E111 E105
> 20 E109 E115 E117 E108 E106
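To see what each step of Bert's two lines does, here is a small hand-made example (the data frame `toy` is hypothetical, not from the thread). It uses `sapply()` in place of `mapply()`; with a single data frame argument both should return the same logical matrix, one column per variable.

```r
# Hypothetical toy data: 4 rows, 3 columns of codes
toy <- data.frame(
  V1 = c("E109", "E102", "E119", "E100"),
  V2 = c("E110", "E112", "E111", "E101"),
  V3 = c("E103", "E104", "E105", "E102"),
  stringsAsFactors = FALSE
)
unwanted <- c("E102", "E112")

# Step 1: logical matrix, TRUE where a cell holds an unwanted code
bad <- sapply(toy, function(x) x %in% unwanted)
# Step 2: number of unwanted codes in each row
n_bad <- rowSums(bad)
n_bad  # 0 2 0 1
# Step 3: !n_bad is TRUE only where the count is 0, so only those rows are kept
kept <- toy[!n_bad, ]
kept   # rows 1 and 3
```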
>
> Bert Gunter
>
>
>
> On Wed, Jun 3, 2020 at 9:57 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> Hello,
>
> If you want to filter out rows containing any of the values in an
> 'unwanted' vector, try the following.
>
> First, create a test data set.
>
> x <- scan(what = character(), text = '
> "E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
> "E107" "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118"
> "E116" "E112"
> "E117"
> ')
>
> set.seed(2020)
> dat <- replicate(5, sample(x, 20, TRUE))
> dat <- as.data.frame(dat)
>
>
> Now, remove all rows that have at least one of "E102" or "E112":
>
>
> unwanted <- c("E102", "E112")
> no <- sapply(dat, function(x){
> grepl(paste(unwanted, collapse = "|"), x)
> })
> no <- apply(no, 1, any)
> dat[!no, ]
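One caveat with the `grepl()` version: `paste(unwanted, collapse = "|")` builds the unanchored pattern `"E102|E112"`, which matches a code anywhere inside a string. With these codes that happens to make no difference, but an unwanted code that is a prefix of others (for example `"E10"`, a prefix of `"E100"` to `"E109"`) would drop too many rows. A sketch of anchoring the alternation with `^(...)$`, compared with exact matching via `%in%`; it assumes the codes contain no regex metacharacters, and the `codes` vector is hypothetical.

```r
codes <- c("E10", "E100", "E102", "E109", "E11", "E112")
unwanted <- c("E10", "E112")

# Unanchored: "E10" also matches E100, E102 and E109
loose <- grepl(paste(unwanted, collapse = "|"), codes)
# Anchored: the whole string must equal one of the alternatives
pattern <- paste0("^(", paste(unwanted, collapse = "|"), ")$")
strict <- grepl(pattern, codes)
# Exact matching without regular expressions gives the same answer
exact <- codes %in% unwanted

stopifnot(identical(strict, exact))
codes[loose]   # "E10" "E100" "E102" "E109" "E112"
codes[strict]  # "E10" "E112"
```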
>
>
> That's it, if I understood the problem.
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 03/06/20 at 15:55, Ana Marija wrote:
> > Hello.
> >
> > I am trying to keep only the rows that contain ANY of these values:
> > E109, E119, E149
> >
> > so I did:
> > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
> >
> > then I checked what I got:
> >> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
> >> d0=unlist(s0)
> >> d10=unique(d0)
> >> d10
> >  [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
> > [11] "E107"
> >> s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
> >> d1=unlist(s1)
> >> d11=unique(d1)
> >> d11
> >  [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
> > [11] "E117"
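The same check can be written more compactly by flattening the data frame into one character vector first. A sketch, assuming `controls` has character columns as in the thread; the small `controls` below is a hypothetical stand-in. Note that `'^E10'` matches `"E10"` itself as well as every code from `"E100"` to `"E109"`, which is why both `"E10"` and `"E102"` show up in the output above.

```r
# Hypothetical stand-in for the filtered data frame 'controls'
controls <- data.frame(
  V1 = c("E109", "E102", "E119"),
  V2 = c("E10", "E112", "E117"),
  V3 = c("E104", "E149", "E110"),
  stringsAsFactors = FALSE
)

# Flatten all columns into one vector and drop the names added by unlist()
vals <- unname(unlist(controls))
# Distinct codes starting with "E10" (includes "E10" and "E100".."E109")
d10 <- unique(grep("^E10", vals, value = TRUE))
# Distinct codes starting with "E11"
d11 <- unique(grep("^E11", vals, value = TRUE))
d10
d11
# Are the unwanted codes still present anywhere?
c("E102", "E112") %in% vals
```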
> >
> > I need help changing this command
> > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
> >
> > so that the output does not contain any rows that include E102 or E112.
> >
> > Thanks
> > Ana
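Combining the two conditions from the question (keep a row if it contains at least one wanted code, but drop it if it contains any unwanted code) needs only one more logical vector. A base R sketch in the style of Bert's answer; the data frame `dat` is a hypothetical stand-in for the poster's `t`, and `wanted`/`unwanted` hold the codes from the question.

```r
# Hypothetical data standing in for the poster's data frame 't'
dat <- data.frame(
  V1 = c("E109", "E109", "E100", "E149"),
  V2 = c("E101", "E102", "E119", "E110"),
  V3 = c("E103", "E104", "E112", "E111"),
  stringsAsFactors = FALSE
)
wanted <- c("E109", "E119", "E149")
unwanted <- c("E102", "E112")

# TRUE for rows with at least one wanted code
has_wanted <- rowSums(sapply(dat, function(x) x %in% wanted)) > 0
# TRUE for rows with at least one unwanted code
has_unwanted <- rowSums(sapply(dat, function(x) x %in% unwanted)) > 0

controls <- dat[has_wanted & !has_unwanted, ]
controls  # rows 1 and 4
```

If the data can have a single row, `sapply()` simplifies to a plain vector and `rowSums()` stops with an error; `Reduce("|", lapply(dat, function(x) x %in% wanted))` computes the same row-wise test without that edge case. With dplyr, the poster's pipe could instead be extended with a second step such as `filter_all(all_vars(!(. %in% unwanted)))`; `filter_all()` and `all_vars()` are superseded in recent dplyr releases in favour of `if_any()` and `if_all()`.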
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >