[R] How to convert "c:\a\b" to "c:/a/b"?
Henrik Bengtsson
hb at maths.lth.se
Mon Jun 27 20:53:58 CEST 2005
Spencer Graves wrote:
> Hi, Henrik:
>
> Several functions, e.g., "grep", "sub", "gsub", and "regexpr",
> have an argument "perl", FALSE by default. Moreover, "?regexp" has a
> section on "Perl Regular Expressions". If you can do it in perl, might
> that transfer to "gsub(..., perl=TRUE)"?
I do not know the details behind the different "dialects" of regular
expressions, but you can _not_ get the R parser to interpret the two
ASCII characters "\n", as the two characters "\" and "n". The R parser
is used when code is read by source() or when expressions are typed at
the R prompt. The parser will always read it as the newline character
(ASCII 10). The result from the parser is then passed to the R engine.
Thus, you cannot write your program such that it fools the parser,
because your program is evaluated only after the parser has run. In other
words, there is no way you can get nchar("\n") to equal 2.
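
A minimal sketch of the point above, assuming the pasted path is kept in a
plain text file rather than in sourced R code. The temporary file and the
file name "data.txt" are hypothetical, used only for illustration:

```r
# "\n" is parsed as one newline character; "\\n" is a backslash plus "n".
stopifnot(nchar("\n") == 1, nchar("\\n") == 2)

# Keep the Windows path in a text file (never seen by the R parser)
# and read it back with readLines(). The file holds single backslashes.
tmp <- tempfile(fileext = ".txt")
writeLines("D:\\spencerg\\statmtds\\R\\Rnews", tmp)
path <- readLines(tmp)
stopifnot(nchar(path) == 28)

# Three equivalent ways to convert backslashes to forward slashes.
p1 <- gsub("\\\\", "/", path)              # regular expression: escaped backslash
p2 <- gsub("\\", "/", path, fixed = TRUE)  # literal match, no regexp escaping
p3 <- chartr("\\", "/", path)              # character-for-character translation

# Append a (hypothetical) file name to the converted directory.
full <- file.path(p3, "data.txt")
```

Only the string literals typed in the script need doubled backslashes; the
text read by readLines() does not pass through the parser.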
Cheers
Henrik
> Thanks,
> spencer graves
> p.s. I skimmed the discussion of "Perl Regular Expressions", and
> experimented with "gsub(..., perl=TRUE)" without success. However,
> there may be a way to do it, and I just don't know perl and regexp well
> enough to have figured it out in the time available.
>
> Henrik Bengtsson wrote:
>
>> Spencer Graves wrote:
>>
>>> Thanks, Dirk, Gabor, Eric:
>>>
>>> You all provided appropriate solutions for the stated problem.
>>> Sadly, I oversimplified the problem I was trying to solve: I copy a
>>> character string giving a DOS path from MS Windows Explorer into an R
>>> script file, and I get something like the following:
>>>
>>> D:\spencerg\statmtds\R\Rnews
>>>
>>> I want to be able to use this in R with its non-R meaning,
>>> e.g., in readLines, count.fields, read.table, etc., after appending a
>>> file name. Your three solutions all work for my oversimplified toy
>>> example but are inadequate for the problem I really want to solve.
>>
>>
>>
>> Hmmm. It should work as long as you do not source() the file (see
>> below). There are two things to watch out for here.
>>
>> First, you have to be careful with backslashes, that is, a backslash
>> is a single character ('\') in memory, but to be typed at the R
>> prompt, you have to escape it (with a backslash), which is why we type
>> "\\", cf. nchar("\\") == 1. Consider the file foo.txt containing the
>> 28 characters (==28 bytes in plain ASCII format)
>>
>> D:\spencerg\statmtds\R\Rnews
>>
>> You can create such a file in R by
>>
>> > cat(file="foo.txt", "D:\\spencerg\\statmtds\\R\\Rnews")
>> > str(file.info("foo.txt"))
>> `data.frame': 1 obs. of 6 variables:
>> $ size : num 28
>> $ isdir: logi FALSE
>> $ mode :Class 'octmode' int 438
>> $ mtime:'POSIXct', format: chr "2005-06-27 19:14:20"
>> $ ctime:'POSIXct', format: chr "2005-06-27 19:14:20"
>> $ atime:'POSIXct', format: chr "2005-06-27 19:14:20"
>>
>> Re-read it into R:
>> > bfr <- readLines("foo.txt")
>> Warning message:
>> incomplete final line found by readLines on 'foo.txt'
>> > bfr
>> [1] "D:\\spencerg\\statmtds\\R\\Rnews"
>> > cat("bfr='", bfr, "'\n", sep="")
>> bfr='D:\spencerg\statmtds\R\Rnews'
>>
>> Now, convert backslashes to "forwardslashes":
>> > bfr2 <- gsub("\\\\", "/", bfr)
>> > bfr2
>> [1] "D:/spencerg/statmtds/R/Rnews"
>> > cat("bfr2='", bfr2, "'\n", sep="")
>> bfr2='D:/spencerg/statmtds/R/Rnews'
>>
>> Second, regular expression patterns have their own escaping rules.
>> This is why the following happens:
>>
>> > bfr3 <- gsub("\\", "/", bfr)
>> Error in gsub(pattern, replacement, x, ignore.case, extended, fixed) :
>> invalid regular expression '\'
>>
>> The pattern "\\", which is a single '\' in memory, is passed to
>> gsub(). Then gsub() tries to interpret this single backslash as a
>> pattern, but it is invalid, because gsub() uses backslashes to escape
>> some characters in patterns. So, when you think about what gsub()
>> needs, think about the characters (bytes) that are really stored in
>> memory, not what you see.
>>
>> A side comment: Wouldn't it be nice if the R parser had an alternative
>> way to quote strings such that, say, Perl strings could be used? Example:
>>
>> bfr3 <- gsub("\\\\", "/", bfr)
>> bfr3 <- gsub('\\', "/", bfr)
>>
>> would be equivalent (if single quotes were not already reserved).
>>
>> Back to your problem: You must not paste the 28 characters into an R
>> script that you source()! If you want to include your pathname (copied
>> from the command prompt), you have to escape each '\' with a '\', giving
>> '\\'. Thus, in Emacs or another text editor, you should see '\\'
>> wherever you want R(!) to interpret it as the single character '\'.
>> Note the difference between using source() and, say, readLines().
>>
>> Hope this helps
>>
>> Henrik
>>
>>
>>> Thanks,
>>> spencer graves
>>>
>>> Gabor Grothendieck wrote:
>>>
>>>
>>>> On 6/27/05, Dirk Eddelbuettel <edd at debian.org> wrote:
>>>>
>>>>
>>>>> On 26 June 2005 at 20:30, Spencer Graves wrote:
>>>>> | How can one convert back slashes to forward slashes, e.g., changing
>>>>> | "c:\a\b" to "c:/a/b"? I tried the following:
>>>>> |
>>>>> | > gsub("\\\\", "/", "c:\a\b")
>>>>> | [1] "c:\a\b"
>>>>>
>>>>> This does work, provided you remember that single backslashes
>>>>> "don't exist", as e.g. \a is a character in itself. So use doubles
>>>>> and you should be fine:
>>>>>
>>>>> > gsub("\\\\", "/", "c:\\a\\b")
>>>>> [1] "c:/a/b"
>>>>>
>>>>
>>>>
>>>> Also, if one finds four backslashes confusing one can avoid the use
>>>> of four via any of these:
>>>>
>>>> gsub("[\\]", "/", "c:\\a\\b")
>>>> gsub("\\", "/", "c:\\a\\b", fixed = TRUE)
>>>> chartr("\\", "/", "c:\\a\\b")
>>>>
>>>> ______________________________________________
>>>> R-help at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide!
>>>> http://www.R-project.org/posting-guide.html
>>>
>>>
>>>
>>>
>>
>