[R] glob2rx() {was: no bug in R2.1.0's list.files()}
Gabor Grothendieck
ggrothendieck at gmail.com
Thu May 12 20:32:04 CEST 2005
I think glob2rx is of sufficient interest and sufficiently small
that it would be nice to have in the core of R without having to
install and load sfsmisc.
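
As a rough illustration of why it is handy, here is a minimal sketch. It
assumes the glob2rx() definition quoted below is copied verbatim, minus
its comments, under the hypothetical name glob2rx_sketch (so it does not
clash with sfsmisc); the file names are made up.

```r
## Copy of the function body posted in the quoted message below;
## the name 'glob2rx_sketch' is hypothetical, used only for this example.
glob2rx_sketch <- function(pattern, trim.head = FALSE, trim.tail = TRUE)
{
    p <- gsub('\\.', '\\\\.', paste('^', pattern, '$', sep = ''))
    p <- gsub('\\?', '.', gsub('\\*', '.*', p))
    if (trim.tail) p <- sub("\\.\\*\\$$", '', p)
    if (trim.head) p <- sub("\\^\\.\\*", '', p)
    p
}

## Made-up file names, including the one from Uwe's remark.
files <- c("notes.txt", "Not_a_txt_file.xls", "data.txt.bak")

## As a regular expression, ".txt" means "any character followed by txt",
## so it matches all three names.
as_regex <- grepl(".txt", files)

## The glob "*.txt" translated to a regular expression is anchored
## and has the dot escaped, so only the first name matches.
rx <- glob2rx_sketch("*.txt")
as_glob <- grepl(rx, files)

print(rx)
print(data.frame(files, as_regex, as_glob))
```

The same translated pattern can be passed as the 'pattern' argument of
list.files(), as in Martin's example below.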
On 5/12/05, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> >>>>> "BaRow" == Barry Rowlingson <B.Rowlingson at lancaster.ac.uk>
> >>>>> on Thu, 12 May 2005 11:05:43 +0100 writes:
>
> BaRow> Uwe Ligges wrote:
> >> Please read about regular expressions (!!!) and try to
> >> understand that ".txt" also finds "Not_a_txt_file.xls"
> >> ....
>
> BaRow> The confusion here is between regular expressions
> BaRow> and wildcard expansion known as 'globbing'. The two
> BaRow> things are very different, and use characters such as
> BaRow> '*' '.' and '?' in different ways.
>
> Exactly. I devised a "glob" to "regexp" function many
> years ago in order to help newbies make the transition.
>
> That function, nowadays called 'glob2rx', is part of our
> (CRAN) package "sfsmisc" and hence available to all via
>
> install.packages("sfsmisc")
> library("sfsmisc")
>
> But it's quite simple (though not trivial to read for the
> inexperienced because of the many escapes ("\") needed),
> and it may be helpful to see its code on R-help, below.
> This topic has also led me to add two optional logical arguments
> (obvious in hindsight) to the function, so that it now looks like
>
> glob2rx <- function(pattern, trim.head = FALSE, trim.tail = TRUE)
> {
>     ## Purpose: Change "ls" aka "wildcard" aka "globbing" _pattern_ to
>     ##          Regular Expression (as in grep, perl, emacs, ...)
>     ## -------------------------------------------------------------------------
>     ## Author: Martin Maechler ETH Zurich, ~ 1991
>     ##         New version using [g]sub() : 2004
>     p <- gsub('\\.','\\\\.', paste('^', pattern, '$', sep=''))
>     p <- gsub('\\?', '.', gsub('\\*', '.*', p))
>     ## these are trimming '.*$' and '^.*' - in most cases only for esthetics
>     if(trim.tail) p <- sub("\\.\\*\\$$", '', p)
>     if(trim.head) p <- sub("\\^\\.\\*", '', p)
>     p
> }
>
> So those confused newbies (and DOS long timers!)
> could use
>
> list.files(myloc, glob2rx("*.zip"), full=TRUE)
>
> ## (yes, make a habit of using 'TRUE', not 'T' ..)
>
> The current example code, BTW, has
>
> stopifnot(glob2rx("abc.*") == "^abc\\.",
>           glob2rx("a?b.*") == "^a.b\\.",
>           glob2rx("a?b.*", trim.tail=FALSE) == "^a.b\\..*$",
>           glob2rx("*.doc") == "^.*\\.doc$",
>           glob2rx("*.doc", trim.head=TRUE) == "\\.doc$",
>           glob2rx("*.t*") == "^.*\\.t",
>           glob2rx("*.t??") == "^.*\\.t..$"
>           )
>
> Martin Maechler,
> ETH Zurich
>
> BaRow> There's added confusion when people come from a DOS
> BaRow> background, where commands did their own thing when
> BaRow> given '*' as parameter. The DOS command:
>
> BaRow> RENAME *.FOO *.BAR
>
> BaRow> did what seems obvious, renaming all the .FOO files
> BaRow> to .BAR, but on a unix machine doing this with 'mv'
> BaRow> can be destructive!
>
> BaRow> In short (and slightly simplified), a '*' when
> BaRow> expanded as a wildcard in a glob matches any string,
> BaRow> whereas a '*' in a regular expression (regexp)
> BaRow> matches the previous character 0 or more times. This
> BaRow> is why "*.zip" is flagged as invalid now - there's no
> BaRow> character before the "*".
>
> BaRow> That should be enough clues to send you on your
> BaRow> way.
>
> BaRow> Baz
>
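
Barry's description of '*' quoted above can be checked directly. This is
only a sketch: the test strings are made up, and the second pattern is a
hand-written translation of the glob "ab*", not output of glob2rx().

```r
## Made-up test strings.
x <- c("a", "ab", "abbb", "abc")

## Regular expression: '*' repeats the preceding character ('b')
## zero or more times, so "abc" does not match.
star_as_regex <- grepl("^ab*$", x)

## Glob "ab*" means "ab followed by any string"; written by hand
## as a regular expression that is "^ab.*$", so "a" does not match.
star_as_glob <- grepl("^ab.*$", x)

print(data.frame(x, star_as_regex, star_as_glob))
```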