[Rd] Question about Unix file paths

Martin Maechler maechler at stat.math.ethz.ch
Wed Nov 26 13:52:44 MET 2003


>>>>> " Kurt" == Kurt Hornik <Kurt.Hornik at wu-wien.ac.at>
>>>>>     on Wed, 26 Nov 2003 10:05:42 +0100 writes:

>>>>> Prof Brian Ripley writes:
    >> On Mon, 24 Nov 2003, Duncan Murdoch wrote:
    >>> >Duncan Murdoch <dmurdoch at pair.com> writes:
    >>> >
    >>> >> Gabor Grothendieck pointed out a bug to me in
    >>> >> list.files(..., full.name=TRUE), that essentially
    >>> >> comes down to the fact that in Windows it's not
    >>> >> always valid to add a path separator (slash or
    >>> >> backslash) between a path specifier and a filename.
    >>> >> For example,
    >>> >> 
    >>> >> c:foo
    >>> >> 
    >>> >> is different from
    >>> >> 
    >>> >> c:\foo
    >>> >> 
    >>> >> and there are other examples.
    >>> 
    >>> I've committed a change to r-patched to fix this in
    >>> Windows only.  Sounds like it's not an issue elsewhere.

    >> I think there are some potential issues with doubling
    >> separators and final separators on dirs.  On Unix file
    >> systems /part1//part2 and /path/to/dir/ are valid.
    >> However, file systems on Unix may not be Unix file
    >> systems: examples are earlier MacOS systems on MacOS X
    >> and mounted Windows and Novell systems on Linux.  I would
    >> not want to assume that all of these combinations worked.

    >>> Gabor also suggested an option to use shell globbing
    >>> instead of regular expressions to select the files in
    >>> the list, e.g.
    >>> 
    >>> list.files(dir="/", pattern="a*.dat", glob=T)
    >>> 
    >>> This would be easy to do in Windows, but from the little
    >>> I know about Unix programming, would not be so easy
    >>> there, so I haven't done anything about it.

    >> It would be shell-dependent and OS-dependent as well as a
    >> retrograde step, as those who wanted to use regular
    >> expressions no longer would be able to.

     Kurt> Right.  In any case, an explicit glob() function
     Kurt> seems preferable to me ...

Good idea!

More than 12 years ago I had a similar idea and wrote a function
pat2grep() {pattern to grep regular expression}, for S-plus on Unix,
which I have now renamed to glob2regexp().  It is still not really
usable outside Unix (or Windows with the 'sed' tool in the path), nor
perfect, but maybe a good start:

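## Paste the pieces into one shell command, run it, and return its output
## (intern = TRUE, so that the result can be passed on as an R string).
sys <- function(...) system(paste(..., sep = ""), intern = TRUE)

glob2regexp <- function(pattern)
{
  ## Purpose: Change "ls pattern" to "grep regular expression" pattern.
  ## -------------------------------------------------------------------------
  ## Author: Martin Maechler ETH Zurich, ~ 1991
  ## sed steps: escape '.', '*' -> '.*', '?' -> '.', anchor with ^ and $,
  ## then drop a redundant trailing '.*$'.
  sys("echo '", pattern, "'| sed ",
      "'s/\\./\\\\./g;s/*/.*/g;s/?/./g; s/^/^/;s/$/$/; s/\\.\\*\\$$//'")
}

E.g.,

  > glob2regexp("a*.dat")
  [1] "^a.*\\.dat$"

  > glob2regexp("a?bc*.t??")
  [1] "^a.bc.*\\.t..$"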

and one could use it as

     list.files(...., pattern = glob2regexp("a*.dat"))

Of course, the function needs to be changed to simply use things like
sub() and gsub() --- another minor exercise for our audience ...
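
One way that exercise might go, as a minimal sketch only: the name
glob2regexp_r() is hypothetical, and the substitutions simply mirror the
sed script above step by step.  Like the sed version, it leaves other
regular expression metacharacters (brackets, parentheses, '+', ...)
untouched, so it is not a complete glob translator.

```r
## Hypothetical pure-R counterpart of glob2regexp(); needs no shell or sed.
glob2regexp_r <- function(pattern)
{
  p <- gsub("\\.", "\\\\.", pattern)  # escape literal dots:  .  ->  \.
  p <- gsub("\\*", ".*", p)           # glob '*'  ->  regex '.*'
  p <- gsub("\\?", ".", p)            # glob '?'  ->  regex '.'
  p <- paste("^", p, "$", sep = "")   # anchor at both ends
  sub("\\.\\*\\$$", "", p)            # drop a redundant trailing '.*$'
}

## The pattern from this thread, used to select matching file names:
rx <- glob2regexp_r("a*.dat")
grep(rx, c("abc.dat", "b.dat", "a.data"), value = TRUE)
```

It could then be used in the same way, e.g.
list.files(...., pattern = glob2regexp_r("a*.dat")).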

Martin


