[R] glob2rx() {was: no bug in R2.1.0's list.files()}

Gabor Grothendieck ggrothendieck at gmail.com
Thu May 12 20:32:04 CEST 2005


I think glob2rx is of sufficient interest and sufficiently small
that it would be nice to have in the core of R without having to 
install and load sfsmisc.

On 5/12/05, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> >>>>> "BaRow" == Barry Rowlingson <B.Rowlingson at lancaster.ac.uk>
> >>>>>     on Thu, 12 May 2005 11:05:43 +0100 writes:
> 
>    BaRow> Uwe Ligges wrote:
>    >> Please read about regular expressions (!!!) and try to
>    >> understand that ".txt" also finds "Not_a_txt_file.xls"
>    >> ....
> 
>    BaRow>   The confusion here is between regular expressions
>    BaRow> and wildcard expansion known as 'globbing'. The two
>    BaRow> things are very different, and use characters such as
>    BaRow> '*' '.' and '?' in different ways.
> 
> Exactly. I devised a "glob" to "regexp" function many
> years ago in order to help newbies make the transition.
> 
> That function, nowadays called 'glob2rx', is part of our
> (CRAN) package "sfsmisc" and hence available to all via
> 
>       install.packages("sfsmisc")
>       library("sfsmisc")
> 
> But it's quite simple (though not trivial to read for the
> inexperienced because of the many escapes ("\") needed)
> and it may be helpful to see its code on R-help, below.
> This topic has also led me to add two (obvious in hindsight)
> optional logical arguments to the function, so that it now looks like
> 
> glob2rx <- function(pattern, trim.head = FALSE, trim.tail = TRUE)
> {
>    ## Purpose: Change "ls" aka "wildcard" aka "globbing" _pattern_ to
>    ##        Regular Expression (as in grep, perl, emacs, ...)
>    ## -------------------------------------------------------------------------
>    ## Author: Martin Maechler ETH Zurich, ~ 1991
>    ##         New version using [g]sub() : 2004
>    p <- gsub('\\.','\\\\.', paste('^', pattern, '$', sep=''))
>    p <- gsub('\\?',     '.',  gsub('\\*',  '.*', p))
>    ## these are trimming '.*$' and '^.*' - in most cases only for esthetics
>    if(trim.tail) p <- sub("\\.\\*\\$$", '', p)
>    if(trim.head) p <- sub("\\^\\.\\*",  '', p)
>    p
> }
> 
> So those confused newbies (and long-time DOS users!)
> could use
> 
>      list.files(myloc, pattern = glob2rx("*.zip"), full.names = TRUE)
> 
>            ## (yes, make a habit of using 'TRUE', not 'T' ..)
> 
> The current example code, BTW, has
> 
>    stopifnot(glob2rx("abc.*") == "^abc\\.",
>               glob2rx("a?b.*") == "^a.b\\.",
>               glob2rx("a?b.*", trim.tail=FALSE) == "^a.b\\..*$",
>               glob2rx("*.doc") == "^.*\\.doc$",
>               glob2rx("*.doc", trim.head=TRUE) == "\\.doc$",
>               glob2rx("*.t*")  == "^.*\\.t",
>               glob2rx("*.t??") == "^.*\\.t..$"
>     )
> 
> Martin Maechler,
> ETH Zurich
> 
>    BaRow>   There's added confusion when people come from a DOS
>    BaRow> background, where commands did their own thing when
>    BaRow> given '*' as a parameter. The DOS command:
> 
>    BaRow>   RENAME *.FOO *.BAR
> 
>    BaRow>   did what seems obvious, renaming all the .FOO files
>    BaRow> to .BAR, but on a unix machine doing this with 'mv'
>    BaRow> can be destructive!
> 
>    BaRow>   In short (and slightly simplified), a '*' when
>    BaRow> expanded as a wildcard in a glob matches any string,
>    BaRow> whereas a '*' in a regular expression (regexp),
>    BaRow> matches the previous character 0 or more times. This
>    BaRow> is why "*.zip" is flagged as invalid now - there's no
>    BaRow> character before the "*".
> 
>    BaRow>   That should be enough clues to send you on your
>    BaRow> way.
> 
>    BaRow>   Baz
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
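
To make the glob-versus-regexp difference discussed above concrete, here is a minimal sketch. It assumes the glob2rx() definition exactly as quoted in the message; the file names in `files` are invented purely for illustration and do not come from the thread.

```r
## glob2rx() as quoted in the message above (comments trimmed)
glob2rx <- function(pattern, trim.head = FALSE, trim.tail = TRUE)
{
    p <- gsub('\\.', '\\\\.', paste('^', pattern, '$', sep = ''))
    p <- gsub('\\?', '.', gsub('\\*', '.*', p))
    if (trim.tail) p <- sub("\\.\\*\\$$", '', p)
    if (trim.head) p <- sub("\\^\\.\\*", '', p)
    p
}

## Hypothetical file names, made up for this example
files <- c("report.txt", "Not_a_txt_file.xls", "data.zip", "zipper.doc")

## As a regular expression, "." matches ANY single character and the
## pattern is not anchored, so ".txt" also hits "Not_a_txt_file.xls"
as.regexp <- grep(".txt", files, value = TRUE)

## Translating the glob first gives the match a shell user expects
as.glob <- grep(glob2rx("*.txt"), files, value = TRUE)

## Same idea for the "*.zip" case from the original question
zips <- grep(glob2rx("*.zip"), files, value = TRUE)

print(as.regexp)  # "report.txt" "Not_a_txt_file.xls"
print(as.glob)    # "report.txt"
print(zips)       # "data.zip"
```

The same translated pattern can be passed as the 'pattern' argument of list.files(), as in the quoted message.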



