[R] how to use AND in grepl

Tom Wright tom at maladmin.com
Mon May 2 21:08:20 CEST 2016


Please try to read my earlier comments.
In the absence of a proper example with expected output I think what you
are trying to achieve is:

# create a sample dataframe
df <- data.frame(Command=c("_localize_PD", "_localize_tre_t2",
"_abdomen_t1_seq", "knee_pd_t1_localize", "pd_local_abdomen_t2"))

# identify which rows of the data frame match each pattern
# note: PD, T2 and PDT2 are logical vectors indicating whether each row
# matched
PD <- grepl("pd", df$Command)
T2 <- grepl('t2', df$Command)
PDT2 <- grepl("(.*t2.*pd.*)|(.*pd.*t2.*)", df$Command)
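
On the AND itself: the alternation pattern is one way to require both substrings, and combining two separate grepl() calls with `&` should give the same logical vector. A minimal sketch on the sample data (the names `cmd`, `both_regex` and `both_and` are made up for this illustration):

```r
# Sample data from this thread.
cmd <- c("_localize_PD", "_localize_tre_t2", "_abdomen_t1_seq",
         "knee_pd_t1_localize", "pd_local_abdomen_t2")

# AND expressed inside a single regular expression (either order).
both_regex <- grepl("(.*t2.*pd.*)|(.*pd.*t2.*)", cmd)

# AND expressed with the element-wise logical operator `&`.
both_and <- grepl("pd", cmd) & grepl("t2", cmd)

# Check whether the two approaches agree on every element of this sample.
print(identical(both_regex, both_and))
```

The `&` form is usually easier to read and extends naturally to three or more conditions, where the alternation would need every ordering spelled out.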

# create the new column to hold the new names
df$new_name <- NA

df[PD,'new_name'] <- 'pd'
df[T2,'new_name'] <- 't2'
df[PDT2,'new_name'] <- 'pdt2'


# note 1: the order of these commands is important; if the last command is
# run first, all of its matches will be overwritten by the single matches
# for 't2' and 'pd'.
# note 2: there is no match for row 1 because "PD" != "pd". As suggested by
# John McKown, the ignore.case parameter of grepl can be used to change this
# behaviour.
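
To illustrate note 2, here is a sketch of the same labelling with ignore.case = TRUE, so that "_localize_PD" is also picked up. It assumes case really should be ignored in your data, and the names `has_pd` and `has_t2` are only for illustration:

```r
# Sample data frame from this thread, kept as character rather than factor.
df <- data.frame(Command = c("_localize_PD", "_localize_tre_t2",
                             "_abdomen_t1_seq", "knee_pd_t1_localize",
                             "pd_local_abdomen_t2"),
                 stringsAsFactors = FALSE)

# Case-insensitive matches for each substring.
has_pd <- grepl("pd", df$Command, ignore.case = TRUE)
has_t2 <- grepl("t2", df$Command, ignore.case = TRUE)

# Label single matches first, then overwrite rows that match both.
df$new_name <- NA_character_
df$new_name[has_pd] <- "pd"
df$new_name[has_t2] <- "t2"
df$new_name[has_pd & has_t2] <- "pdt2"

print(df)
```

Rows that match neither pattern keep NA in new_name, so they are easy to find afterwards with is.na().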

On Mon, May 2, 2016 at 11:01 AM, <chalabi.elahe at yahoo.de> wrote:

> I just changed all the names in Command to lowercase, then this
> str_extract works fine for "pd" and "t2", but not for "PDT2". Do you have
> any idea how I can also get PDT2 from str_extract?
>
>
> On Monday, May 2, 2016 9:16 AM, Tom Wright <tom at maladmin.com> wrote:
>
>
>
> The first thing I notice here is that your first two subset statements are
> searching in an object named Command, not the column df$Command. I'm not at
> all sure what you are trying to achieve with the str_extract process, but it
> is looking for the exact string 'PDT2'; the vectors / data frames formed in
> your previous commands are not being used at all.
> Moving forward, I think you need to pay attention to case: "PD" != "pd".
> Also, the set PDT2 is going to be a subset of both sets PD and t2; I don't
> think this is what you are after.
>
> On Mon, May 2, 2016, 8:49 AM  <chalabi.elahe at yahoo.de> wrote:
>
> Yes it works, but let me explain what I am going to do. I extract all the
> names I want and then create a new column out of them for my plot. This is
> the whole thing I do:
> >  PD=subset(df,grepl("pd",Command)) //extract names in Command with only
> "pd"
> >  t2=subset(df,grepl("t2",Command)) //extract names with only "t2"
> >  PDT2=subset(df,grepl("(.*t2.*pd.*)|(.*pd.*t2.*)",df$Command) // extract
> names which contain both "pd" and "t2"
> >  v1=c('PD','t2','PDT2')// I create a vector with these conditions
> >  str_extract(df$Command,paste(v1,collaps='|')) //returning patterns,
> using stringr library
> >
> >here I see no pattern named PDT2 but there are only PD and t2 patterns.
> >On Monday, May 2, 2016 8:18 AM, Tom Wright <tom at maladmin.com> wrote:
> >
> >
> >
> >Sorry for the missed braces earlier. I was typing on a phone, not the
> best place to conjugate regular expressions.
> >Using the example you provided:
> >
> >> df=data.frame(Command=c("_localize_PD", "_localize_tre_t2",
> "_abdomen_t1_seq", "knee_pd_t1_localize", "pd_local_abdomen_t2"))
> >
> >> grepl("(.*t2.*pd.*)|(.*pd.*t2.*)",df$Command)
> >[1] FALSE FALSE FALSE FALSE  TRUE
> >
> >> subset(df,grepl("(.*t2.*pd.*)|(.*pd.*t2.*)",df$Command))
> >              Command
> >5 pd_local_abdomen_t2
> >
> >
> >
> >On Mon, May 2, 2016 at 7:42 AM, <chalabi.elahe at yahoo.de> wrote:
> >
> >Thanks Peter, you were right, the exact grepl is
> grepl("(.*t2.*pd.*)|(.*pd.*t2.*)",df$Command), but it does not change
> anything in Command, when I check the size of it by
> sum(grepl("(.*t2.*pd.*)|(.*pd.*t2.*)",df$Command))  the result is 0, but I
> am sure that the size is not 0. It seems that this AND does not work.
> >>
> >>
> >>
> >>On Monday, May 2, 2016 5:05 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> >>
> >>On 02 May 2016, at 12:43 , ch.elahe via R-help <r-help at r-project.org>
> wrote:
> >>
> >>> Thanks for your reply tom. After using
> Subset(df,grepl("(.*t2.*pd.*)|(.*pd.*t2.*)"),df$Command)  I get this error:
> Argument "x" is missing, with no default. Actually I don't know how to fix
> this. Do you have any idea?
> >>
> >>Tom's code was missing a ")" but not where you put one. He probably also
> didn't intend to capitalize "subset".
> >>
> >>
> >>-pd
> >>
> >>> Thanks,
> >>> Elahe
> >>>
> >>>
> >>> On Saturday, April 30, 2016 7:35 PM, Tom Wright <tom at maladmin.com>
> wrote:
> >>>
> >>>
> >>>
> >>> Actually not sure my previous answer does what you wanted. Using your
> approach:
> >>> t2pd=subset(df,grepl("t2",df$Command) & grepl("pd",df$Command))
> >>> Should work.
> >>> I think the regex pattern you are looking for is:
> >>> Subset(df,grepl("(.* t2.*pd.* )|(.* pd.* t2.*)",df$Command)
> >>>
> >>> On Sat, Apr 30, 2016, 7:07 PM Tom Wright <tom at maladmin.com> wrote:
> >>>
> >>> subset(df,grepl("t2|pd",x$Command))
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Sat, Apr 30, 2016 at 2:38 PM, ch.elahe via R-help <
> r-help at r-project.org> wrote:
> >>>>
> >>>> Hi all,
> >>>>>
> >>>>> I have one factor variable in my df and I want to extract the names
> from it which contain both "t2" and "pd":
> >>>>>
> >>>>> 'data.frame': 36919 obs. of 162 variables
> >>>>>  $TE                :int 38,41,11,52,48,75,.....
> >>>>>  $TR                :int 100,210,548,546,.....
> >>>>>  $Command          :factor W/2229 levels
> "_localize_PD","_localize_tre_t2","_abdomen_t1_seq","knee_pd_t1_localize","pd_local_abdomen_t2"...
> >>>>>
> >>>>> I have tried this but I did not get result:
> >>>>>
> >>>>> t2pd=subset(df,grepl("t2",Command) & grepl("pd",Command))
> >>>>>
> >>>>>
> >>>>> does anyone know how to apply AND in grepl?
> >>>>>
> >>>>> Thanks
> >>>>> Elahe
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>> .
> >>>
> >>
> >>--
> >>Peter Dalgaard, Professor,
> >>Center for Statistics, Copenhagen Business School
> >>Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> >>Phone: (+45)38153501
> >>Office: A 4.23
> >>Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
> >>
>
