[Rd] Question about Unix file paths
Martin Maechler
maechler at stat.math.ethz.ch
Wed Nov 26 13:52:44 MET 2003
>>>>> "Kurt" == Kurt Hornik <Kurt.Hornik at wu-wien.ac.at>
>>>>> on Wed, 26 Nov 2003 10:05:42 +0100 writes:
>>>>> Prof Brian Ripley writes:
>> On Mon, 24 Nov 2003, Duncan Murdoch wrote:
>>> >Duncan Murdoch <dmurdoch at pair.com> writes:
>>> >
>>> >> Gabor Grothendieck pointed out a bug to me in
>>> >> list.files(..., full.name=TRUE), that essentially comes
>>> >> down to the fact that in Windows it's not always valid
>>> >> to add a path separator (slash or backslash) between a
>>> >> path specifier and a filename. For example,
>>> >>
>>> >> c:foo
>>> >>
>>> >> is different from
>>> >>
>>> >> c:\foo
>>> >>
>>> >> and there are other examples.
>>>
>>> I've committed a change to r-patched to fix this in
>>> Windows only. Sounds like it's not an issue elsewhere.
>> I think there are some potential issues with doubling
>> separators and final separators on dirs. On Unix file
>> systems /part1//part2 and /path/to/dir/ are valid.
>> However, file systems on Unix may not be Unix file
>> systems: examples are earlier MacOS systems on MacOS X
>> and mounted Windows and Novell systems on Linux. I would
>> not want to assume that all of these combinations worked.
>>> Gabor also suggested an option to use shell globbing
>>> instead of regular expressions to select the files in
>>> the list, e.g.
>>>
>>> list.files(dir="/", pattern="a*.dat", glob=T)
>>>
>>> This would be easy to do in Windows, but from the little
>>> I know about Unix programming, would not be so easy
>>> there, so I haven't done anything about it.
>> It would be shell-dependent and OS-dependent as well as a
>> retrograde step, as those who wanted to use regular
>> expressions no longer would be able to.
Kurt> Right. In any case, an explicit glob() function
Kurt> seems preferable to me ...
Good idea!
More than 12 years ago, I had a similar idea, and wrote a
"pat2grep()" {pattern to grep regular expression} function
--- for S-plus on Unix --- which I have now renamed to glob2regexp().
It is still not really usable outside Unix (or Windows with the
'sed' tool in the path), nor perfect, but maybe a good start:
sys <- function(...) system(paste(..., sep = ""))

glob2regexp <- function(pattern)
{
    ## Purpose: Change "ls pattern" to "grep regular expression" pattern.
    ## -------------------------------------------------------------------------
    ## Author: Martin Maechler ETH Zurich, ~ 1991
    sys("echo '", pattern, "'| sed ",
        "'s/\\./\\\\./g;s/*/.*/g;s/?/./g; s/^/^/;s/$/$/; s/\\.\\*\\$$//'")
}
E.g.,
> glob2regexp("a*.dat")
^a.*\.dat$
> glob2regexp("a?bc*.t??")
^a.bc.*\.t..$
and one could use it as
list.files(...., pattern = glob2regexp("a*.dat"))
Of course, the function needs to be changed to simply use things like
sub() and gsub() --- another minor exercise for our audience ...
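As one possible take on that exercise, here is a minimal sketch that mirrors
the sed script step by step using only sub() and gsub(), so that no external
tool is needed. The name glob2regexp_r is hypothetical (chosen only to keep it
apart from the sed-based version above), and it returns the regular expression
as a character string instead of printing it. Like the original, it makes no
attempt to escape other regular expression metacharacters such as '(' or '['.

```r
## Hypothetical pure-R variant of glob2regexp(); a sketch, not a tested tool.
glob2regexp_r <- function(pattern)
{
    ## 1. escape literal dots:            "."  ->  "\."
    rx <- gsub("\\.", "\\\\.", pattern)
    ## 2. glob "*" (any run of chars):    "*"  ->  ".*"
    rx <- gsub("\\*", ".*", rx)
    ## 3. glob "?" (any single char):     "?"  ->  "."
    rx <- gsub("\\?", ".", rx)
    ## 4. anchor at both ends, as the sed script does
    rx <- paste("^", rx, "$", sep = "")
    ## 5. a trailing ".*$" is redundant, so drop it
    sub("\\.\\*\\$$", "", rx)
}

## Patterns from the examples above, plus one with a trailing "*"
cat(glob2regexp_r("a*.dat"), "\n")
cat(glob2regexp_r("a?bc*.t??"), "\n")
cat(glob2regexp_r("abc*"), "\n")
```

The order of the steps matters: the dots are escaped first, so that the dots
introduced by the "*" and "?" translations stay unescaped.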
Martin