[R] combine filter() and select()
Martin Morgan
mtmorg@n@b|oc @end|ng |rom gm@||@com
Thu Aug 20 12:46:33 CEST 2020
A kind of hybrid answer is to use base::subset(), which supports non-standard evaluation (it searches for unquoted symbols like 'files' in the code line below in the object that is its first argument; %>% puts 'mytbl' in that first position) and row (filter) and column (select) subsets
> mytbl %>% subset(files %in% "a", files)
# A tibble: 1 x 1
files
<chr>
1 a
Or subset(grepl("a", files), files) if that was what you meant.
One important idea that the tidyverse implements is, in my opinion, 'endomorphism' -- you get back the same type of object as you put in -- so I wouldn't use a base R idiom that returned a vector unless that were somehow essential for the next step in the analysis.
There is value in having separate functions for filter() and select(), and probably there are edge cases where filter(), select(), and subset() behave differently, but for what it's worth subset() can be used to perform these operations individually
> mytbl %>% subset(, files)
# A tibble: 6 x 1
files
<chr>
1 a
2 b
3 c
4 d
5 e
6 f
> mytbl %>% subset(grepl("a", files), )
# A tibble: 1 x 2
files prop
<chr> <int>
1 a 1
Martin Morgan
On 8/20/20, 2:48 AM, "R-help on behalf of Ivan Calandra" <r-help-bounces using r-project.org on behalf of calandra using rgzm.de> wrote:
Hi Jeff,
The code you show is exactly what I usually do, in base R; but I wanted
to play with tidyverse to learn it (and also understand when it makes
sense and when it doesn't).
And yes, of course, in the example I gave, I end up with a 1-cell
tibble, which could be better extracted as a length-1 vector. But my
real goal is not to end up with a single value or even a single column.
I just thought that simplifying my example was the best approach to ask
for advice.
But thank you for letting me know that what I'm doing is pointless!
Ivan
--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
On 19/08/2020 19:27, Jeff Newmiller wrote:
> The whole point of dplyr primitives is to support data frames... that is, lists of columns. When you pare your data frame down to one column you are almost certainly using the wrong tool for the job.
>
> So, sure, your code works... and it even does what you wanted in the dplyr style, but what a pointless exercise.
>
> grep( "a", mytbl$file, value=TRUE )
>
> On August 19, 2020 7:56:32 AM PDT, Ivan Calandra <calandra using rgzm.de> wrote:
>> Dear useRs,
>>
>> I'm new to the tidyverse world and I need some help on basic things.
>>
>> I have the following tibble:
>> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
>> 1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
>>
>> I want to subset the rows with "a" in the column "files", and keep only
>> that column.
>>
>> So I did:
>> myfile <- mytbl %>%
>> filter(grepl("a", files)) %>%
>> select(files)
>>
>> It works, but I believe there must be an easier way to combine filter()
>> and select(), right?
>>
>> Thank you!
>> Ivan
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list