[Rd] should Sys.glob() cope with a UNC windows path beginning with backslashes?

Tony Plate tplate at acm.org
Mon Jun 29 23:47:41 CEST 2009


Prof Brian Ripley wrote:
> On Fri, 26 Jun 2009, Tony Plate wrote:
>
>> I find that Sys.glob() doesn't like UNC paths where the initial 
>> slashes are backslashes.  The help page for Sys.glob() doesn't 
>> specifically mention UNC paths, but does say: "File paths in Windows 
>> are interpreted with separator \ or /."  Is the failure to treat a 
>> path beginning with a double-backslash as a UNC network drive path 
>> the intended behavior?
>
> Yes.  There are general warnings about non-POSIX Windows paths in 
> several of the help files.
>
> The following comments should alert you to possible restrictions:
>
>   The \code{glob} system call is not part of Windows, and we supply an
>   emulation.
>
>   File paths in Windows are interpreted with separator \code{\\} or
>   \code{/}.  Paths with a drive but relative (such as \code{c:foo\\bar})
>   are tricky, but an attempt is made to handle them correctly.
>
> If you want to submit a well-tested patch, it will be considered.
The problem seems to be in the function dos_wglob() in 
src/gnuwin32/dos_wglob.c.  This function treats a backslash as an escape 
character when it precedes one of the metacharacters []-{}~\.  So, an 
initial double backslash is changed to an initial single backslash. 
Consequently, with a backslash prefix, the existing code sees a network 
drive only when that prefix is three or four backslashes.
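
To make the collapsing concrete, here is a minimal sketch of the escape rule as described above. It is a simplified model, not the code in dos_wglob.c: the names is_escapable() and collapse_escapes() are made up for illustration, and the real function also marks escaped characters as protected, which this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from dos_wglob.c): nonzero if c is one of the
 * characters listed above as escapable: [ ] - { } ~ \ */
static int is_escapable(char c)
{
    return c != '\0' && strchr("[]-{}~\\", c) != NULL;
}

/* Simplified model of the quoting step: a backslash followed by an
 * escapable character is dropped and the following character is kept;
 * any other backslash is copied unchanged.
 * Writes at most outsize-1 characters plus a terminating NUL. */
static void collapse_escapes(const char *pattern, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return;
    while (*pattern != '\0' && n + 1 < outsize) {
        char c = *pattern++;
        if (c == '\\' && is_escapable(*pattern))
            c = *pattern++;   /* escape consumed, keep the next character */
        out[n++] = c;
    }
    out[n] = '\0';
}
```

Under this model a pattern that literally starts with two backslashes comes out with one, while three or four leading backslashes both come out as two, which is what a UNC prefix needs.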

Here's a patch that adds special treatment for a prefix of exactly two 
backslashes so that Sys.glob() sees a network drive in this case:

/cygdrive/c/Rbuild/R-2.9.1/src/gnuwin32
$ diff -c dos_wglob.c~ dos_wglob.c
*** dos_wglob.c~        Sun Sep 21 16:05:28 2008
--- dos_wglob.c Mon Jun 29 12:09:47 2009
***************
*** 203,208 ****
--- 203,222 ----
        *bufnext++ = BG_SEP;
        patnext += 2;
      }
+     /* Hack to treat UNC network drive specification correctly:
+      * Without this code, '\\' (i.e., literally two backslashes in pattern)
+      * at the beginning of a path is not recognized as a network drive,
+      * because the GLOB_QUOTE loop below changes the two backslashes to one.
+      * So, in the case where there are two but not three backslashes at
+      * the beginning of the path, transfer these to the output.
+      */
+     if (patnext == pattern && bufend - bufnext > 2 &&
+       pattern[0] == BG_SEP2 && pattern[1] == BG_SEP2 &&
+       pattern[2] != BG_SEP2) {
+       *bufnext++ = pattern[0];
+       *bufnext++ = pattern[1];
+       patnext += 2;
+     }
  #endif

      if (flags & GLOB_QUOTE) {

*** end of patch

This changes behavior in just the case where the prefix is two 
backslashes.  With the fix, the behavior is:
 > Sys.glob("\\jacona\\home\\tplate")
character(0)
 > Sys.glob("\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
 > Sys.glob("\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
 > Sys.glob("\\\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"

Without the fix, the behavior is:
 > Sys.glob("\\jacona\\home\\tplate")
character(0)
 > Sys.glob("\\\\jacona\\home\\tplate")
character(0)
 > Sys.glob("\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
 > Sys.glob("\\\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
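
The two listings can be reproduced with a small stand-alone model of the prefix check plus the escape rule. This is only a sketch under stated assumptions: translate() and leading_seps() are hypothetical names, the with_fix flag stands in for applying the patch, and the model assumes a network drive is found exactly when the translated pattern starts with two backslashes.

```c
#include <stddef.h>
#include <string.h>

#define SEP '\\'

/* Hypothetical helper: number of leading backslashes in s. */
static size_t leading_seps(const char *s)
{
    size_t n = 0;
    while (s[n] == SEP)
        n++;
    return n;
}

/* Simplified model of the pattern translation, not the real dos_wglob():
 * with_fix != 0 mimics the patch, copying a prefix of exactly two
 * backslashes through untouched before the escape rule is applied. */
static void translate(const char *pattern, int with_fix,
                      char *out, size_t outsize)
{
    const char *p = pattern;
    size_t n = 0;

    if (outsize == 0)
        return;
    if (with_fix && outsize > 2 &&
        pattern[0] == SEP && pattern[1] == SEP && pattern[2] != SEP) {
        out[n++] = pattern[0];
        out[n++] = pattern[1];
        p += 2;
    }
    while (*p != '\0' && n + 1 < outsize) {
        char c = *p++;
        /* backslash before one of []-{}~\ acts as an escape */
        if (c == SEP && *p != '\0' && strchr("[]-{}~\\", *p) != NULL)
            c = *p++;
        out[n++] = c;
    }
    out[n] = '\0';
}
```

Feeding this model patterns with one to four leading backslashes gives 1, 2, 2, 2 leading backslashes with the fix and 1, 1, 2, 2 without it, matching which calls above return a path and which return character(0).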


Here is a corresponding change to the docs:

tplate at oberon /cygdrive/c/Rbuild/R-2.9.1/src/library/base/man
*** Sys.glob.Rd~        Thu Mar 19 17:05:24 2009
--- Sys.glob.Rd Mon Jun 29 13:52:57 2009
***************
*** 89,94 ****
--- 89,104 ----
    File paths in Windows are interpreted with separator \code{\\} or
    \code{/}.  Paths with a drive but relative (such as \code{c:foo\\bar})
    are tricky, but an attempt is made to handle them correctly.
+   Backslashes in paths are tricky because they can serve dual purposes:
+   meta-function remover and path separator.  As a result, single or
+   double backslashes can serve as path separators.  UNC network drive
+   paths specified with backslashes (such as \code{\\\\foo\\bar}) are
+   treated specially so that the network drive is found when the path
+   begins with two, three, or four backslashes (i.e., paths beginning
+   with \code{\\\\foo\\bar}, \code{\\\\\\foo\\bar}, and
+   \code{\\\\\\\\foo\\bar} all result in the same output).  UNC network
+   drive paths can also be specified with two forward slashes.
+
  #endif
  }
  \value{
***************
*** 117,122 ****
--- 127,138 ----
  \examples{
  \dontrun{
  Sys.glob(file.path(R.home(), "library", "*", "R", "*.rdx"))
+ # different ways of seeing the same network drive
+ Sys.glob("\\\\\\\\foo\\\\bar")
+ Sys.glob("\\\\\\\\foo\\\\\\\\bar")
+ Sys.glob("\\\\\\\\\\\\foo\\\\\\\\bar")
+ Sys.glob("\\\\\\\\\\\\\\\\foo\\\\\\\\bar")
+ Sys.glob("//foo/bar")
  }}
  \keyword{utilities}
  \keyword{file}

*** end of patch

R compiled with this fix passes 'make check-all' (or at least all the 
differences and warnings printed appear to be minor and unrelated to 
this change).

I suspect that it is a matter of taste whether or not this "fix" is 
desirable.

The argument for it is that it helps Sys.glob() recognize standard UNC 
network drive specifications, which can begin with a double backslash.  
Paths of this form can be returned by some system calls, e.g., getwd().

The argument against it would be that it is more important that 
Sys.glob() consistently treats backslashes as an escape mechanism when 
preceding any of []-{}~\.  If the latter argument is more forceful, then 
this should be documented in the help page for Sys.glob(), and callers 
of Sys.glob() should be careful not to pass it a UNC double-backslash 
prefix, such as getwd() can return (tools:::.writePkgIndices is one 
place where a path with a UNC double-backslash prefix is passed).
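
If the escape behavior is kept, one caller-side precaution would be to rewrite a leading double backslash as two forward slashes before globbing, since the forward-slash form is reported to work in the original message quoted below. The sketch is in C only to match the patch; unc_prefix_to_slashes() is a hypothetical name, and an R caller would do the equivalent string substitution.

```c
#include <stddef.h>

/* Hypothetical caller-side helper: if path starts with exactly two
 * backslashes (a UNC prefix), rewrite them in place as two forward
 * slashes.  Later separators are left alone. */
static void unc_prefix_to_slashes(char *path)
{
    if (path[0] == '\\' && path[1] == '\\' && path[2] != '\\') {
        path[0] = '/';
        path[1] = '/';
    }
}
```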

-- Tony Plate
>
>>
>> E.g., on a Windows system where \\foo is a network drive and 
>> \\foo\bar exists, I see:
>>
>>> Sys.glob("//foo/bar")
>> [1] "//foo/bar"
>>> Sys.glob("//foo\\bar")
>> [1] "//foo\\bar"
>>> Sys.glob("\\\\foo/bar")
>> character(0)
>>> Sys.glob("\\\\foo\\bar")
>>>
>> (the pattern of behavior seems to be that initial backslashes are not 
>> equivalent to forward slashes, but later backslashes are.)
>>
>> This is not a big deal, but I noticed it because it results in Rcmd 
>> check giving a spurious warning when started from a cygwin shell with 
>> a working directory that is a network drive specified as a UNC path.  
>> This happens because mandir in tools:::.writePkgIndices has the form 
>> \\foo/bar/R/packages/mypkg/man, which results in the false warning 
>> "there is a 'man' dir but no help pages in this package."  A simple 
>> workaround was to use a drive-letter mount for the network drive.
>>
>>> sessionInfo()
>> R version 2.9.1 (2009-06-26) i386-pc-mingw32 locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
>> States.1252;LC_MONETARY=English_United 
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>
>> -- Tony Plate
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>


