[R] Removing variables from data frame with a wile card
avi.e.gross using gmail.com
Sun Jan 15 02:12:39 CET 2023
John,
I am very familiar with the evolving tidyverse, but some messages a while back came from people who wanted this forum to stick mainly to base R, so I left out examples.
Indeed, the tidyverse is designed to make it easy to select columns under all kinds of conditions, including regular expressions that allow more precision (as grep does), so you can match “yr” followed by exactly one or two digits. Some of the answers assumed that starting with “yr” was enough. The selection helpers also allow choosing columns on arbitrary criteria, such as whether a column contains numeric data. You can do most of this in base R, but I find the tidyverse method easier most of the time. With some care it can also do quite complicated things, such as creating multiple new columns from a set of columns, each applying a different function like mean, mode, or standard deviation, and naming each new column after the one it is derived from plus a suffix reflecting the transformation.
Another nice feature is the way data streams through a pipeline of steps, with one or a few transformations in each step, while the intermediate parts you do not want simply melt away. The syntax for selecting or deselecting columns can be used inside many of the verbs.
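For readers who do want to see it, here is a minimal sketch of the three ideas above using dplyr. The data frame and its column names (including yrs_educ, added only to show why a precise pattern matters) are made up for illustration and are not from the original poster's data. Mean and sd stand in for the summary functions, since base R has no built-in statistical mode.

```r
library(dplyr)

# Hypothetical toy data; the column names only mimic the ones in this thread.
dat <- data.frame(
  year     = 2001:2004,
  weight   = c(1.2, 0.8, 1.0, 1.5),
  yr3      = c(1, 0, 0, 1),
  yr28     = c(0, 1, 1, 0),
  yrs_educ = c(12, 16, 10, 14)
)

# Drop only "yr" followed by one or two digits; yrs_educ is kept.
kept <- dat %>% select(!matches("^yr[0-9]{1,2}$"))

# Select by content rather than by name.
nums <- dat %>% select(where(is.numeric))

# Several summaries per column, each named <column>_<function>.
stats <- dat %>%
  summarise(across(matches("^yr[0-9]{1,2}$"),
                   list(mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

names(kept)
names(stats)
```

A plain starts_with("yr") would also have dropped yrs_educ here, which is the precision point made above.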
From: John Kane <jrkrideau using gmail.com>
Sent: Saturday, January 14, 2023 4:07 PM
To: avi.e.gross using gmail.com
Cc: R-help Mailing List <r-help using r-project.org>
Subject: Re: [R] Removing variables from data frame with a wile card
You rang sir?
library(tidyverse)
xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)
dat1 <- data.frame(xx , yr1, yr2, yr3)
dat1 %>% select(!starts_with("yr"))
Or, for something a bit more exotic, as I have been trying to learn a bit about the "data.table" package:
library(data.table)
xx = 1:10
yr1 = yr2 = yr3 = rnorm(10)
dat2 <- data.table(xx , yr1, yr2, yr3)
dat2[, !names(dat2) %like% "yr", with=FALSE ]
On Sat, 14 Jan 2023 at 12:28, <avi.e.gross using gmail.com> wrote:
Steven,
Just want to add a few things to what people wrote.
In base R, the methods mentioned will let you make a copy of your original DF that is missing the items you are selecting that match your pattern.
That is fine.
For some purposes, you want to keep the original data.frame and remove a column within it. You can do that in several ways, but the simplest is to set the column to NULL, as in:
mydata$NAME <- NULL
Using the mydata["NAME"] notation, you can do the same for every name your grep returns, with either a loop or a functional programming method.
R's copy-on-modify optimizations make the distinction less important than it sounds, since a partial copy of a data.frame shares its unchanged columns with the original until something modifies them.
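A minimal base R sketch of that in-place removal, assuming a toy data frame whose column names are invented here (yrs_educ is included only to show that a precise pattern leaves it alone):

```r
# Hypothetical toy data; the column names only mimic the ones in this thread.
mydata <- data.frame(
  year     = 2001:2003,
  weight   = c(1.2, 0.8, 1.0),
  yr3      = c(1, 0, 0),
  yr28     = c(0, 1, 1),
  yrs_educ = c(12, 16, 10)
)

# Names that are "yr" followed by one or two digits; yrs_educ does not match.
to_drop <- grep("^yr[0-9]{1,2}$", names(mydata), value = TRUE)

# Loop version: set each matching column to NULL in turn.
for (nm in to_drop) mydata[[nm]] <- NULL

names(mydata)
```

The loop can also be replaced by the single assignment mydata[to_drop] <- NULL, which removes all the named columns at once.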
For those who like to use the tidyverse, it comes with lots of tools that let you select columns that start with or end with or contain some pattern and I find that way easier.
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Steven Yen
Sent: Saturday, January 14, 2023 7:49 AM
To: Andrew Simmons <akwsimmo using gmail.com>
Cc: R-help Mailing List <r-help using r-project.org>
Subject: Re: [R] Removing variables from data frame with a wile card
Thanks to all. Very helpful.
Steven from iPhone
> On Jan 14, 2023, at 3:08 PM, Andrew Simmons <akwsimmo using gmail.com> wrote:
>
> You'll want to use grep() or grepl(). By default, grep() uses
> extended regular expressions to find matches, but you can also use
> perl regular expressions and globbing (after converting to a regular expression).
> For example:
>
> grepl("^yr", colnames(mydata))
>
> will tell you which 'colnames' start with "yr". If you'd rather
> use globbing:
>
> grepl(glob2rx("yr*"), colnames(mydata))
>
> Then you might write something like this to remove the columns starting with yr:
>
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen <styen using ntu.edu.tw> wrote:
>>
>> I have a data frame containing variables "yr3",...,"yr28".
>>
>> How do I remove them with a wild card, something similar to "del yr*"
>> in Windows/DOS? Thank you.
>>
>>> colnames(mydata)
>> [1] "year" "weight" "confeduc" "confothr" "college"
>> [6] ...
>> [41] "yr3" "yr4" "yr5" "yr6" "yr7"
>> [46] "yr8" "yr9" "yr10" "yr11" "yr12"
>> [51] "yr13" "yr14" "yr15" "yr16" "yr17"
>> [56] "yr18" "yr19" "yr20" "yr21" "yr22"
>> [61] "yr23" "yr24" "yr25" "yr26" "yr27"
>> [66] "yr28"...
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
--
John Kane
Kingston ON Canada