[R] "Special" characters in URI

Henrik Bengtsson hb at maths.lth.se
Tue May 3 15:20:31 CEST 2005

Gregor GORJANC wrote:
> Henrik Bengtsson wrote:
>>Gregor GORJANC wrote:
> ...
>>>What do you think about this sketch, which of course doesn't solve all
>>>"special" characters:
>>>fixURLchar <- function(URL,
>>>                       from = c(" ", "\"", ",", "#"),
>>>                       to = c("%20", "%22", "%2c", "%23"))
>>Just a comment. It is much safer/easier to use named vectors for
>>mapping, e.g.
>> map <- c(" "="%20", "\""="%22", ","="%2c", "#"="%23")
> ...
> Henrik, thanks. So you suggest something like
> for (i in seq(along=map)) {
>     URL <- gsub(pattern=names(map)[i], replacement=map[i], x=URL)
> }

Yes, something like that. To optimize, you might want to look up the names only once, outside the loop:

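patterns <- names(map)  # evaluated once, not on every iteration
for (i in seq_along(map)) {
   # fixed=TRUE matches each name literally, not as a regular expression
   URL <- gsub(pattern=patterns[i], replacement=map[i], x=URL, fixed=TRUE)
}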
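Putting the pieces of this thread together, a complete function might look like the minimal sketch below. It reuses the name fixURLchar from Gregor's draft, but the signature (a single named `map` argument instead of `from`/`to`) is an assumption of mine, and `fixed = TRUE` is added so that each name is matched as a literal string rather than as a regular expression.

```r
# Hypothetical helper built on the named-vector idea from this thread:
# each name is a character to escape, each value its percent-encoding.
fixURLchar <- function(URL,
                       map = c(" " = "%20", "\"" = "%22", "," = "%2c", "#" = "%23")) {
  patterns <- names(map)  # look the names up once, outside the loop
  for (i in seq_along(map)) {
    # fixed = TRUE treats each name literally, so regex metacharacters
    # such as "." or "+" could be added to 'map' without extra escaping
    URL <- gsub(patterns[i], map[[i]], URL, fixed = TRUE)
  }
  URL
}

fixURLchar("my file, #2.txt")  # "my%20file%2c%20%232.txt"
```

Note that this map does not contain "%" itself, so it says nothing yet about the "%" problem.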

More important is that you treat a literal "%" differently from a "%" 
used in an escape sequence, e.g. how do you want to convert the string "100% %20"? 
You probably have to use more "fancy" regular expressions to detect 
a literal "%". Maybe "%[^0-9a-fA-F]" will do. There should be many more 
details in the document Brian Ripley referred you to.
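One way to make the "literal %" test a bit stricter than "%[^0-9a-fA-F]" is a Perl-style negative lookahead: treat a "%" as literal unless it is followed by two hexadecimal digits. Unlike the simpler pattern, this also catches a "%" at the very end of the string and does not consume the character after it. This is only a sketch; the name escapePercent is hypothetical.

```r
# Hypothetical helper: escape every "%" that does not already start a
# %XX escape sequence (XX = two hexadecimal digits).
escapePercent <- function(x) {
  # (?!...) is a negative lookahead, which requires perl = TRUE
  gsub("%(?![0-9A-Fa-f]{2})", "%25", x, perl = TRUE)
}

escapePercent("100% %20")  # "100%25 %20"
escapePercent("50%")       # "50%25"
```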

In other words, you have to be careful and think through all the cases 
in which your function may be called. A good test is to call it twice: once 
on your original string and then again on the escaped result; the second call 
should not change anything. How far you go depends on how complete you want 
your function to be.
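The "call it twice" test can be automated. The sketch below assumes a hypothetical escapeURL() that first escapes literal "%" signs with a lookahead pattern and then encodes the other characters literally; isIdempotent() and naiveEscape() are likewise illustrative names. The naive version, which blindly turns every "%" into "%25", fails the test as expected.

```r
# Hypothetical encoder: literal "%" first, then the other characters.
escapeURL <- function(x,
                      map = c(" " = "%20", "\"" = "%22", "," = "%2c", "#" = "%23")) {
  # 1. A "%" not followed by two hex digits is literal, so encode it
  x <- gsub("%(?![0-9A-Fa-f]{2})", "%25", x, perl = TRUE)
  # 2. Encode the remaining characters as literal strings (no regex)
  for (ch in names(map)) {
    x <- gsub(ch, map[[ch]], x, fixed = TRUE)
  }
  x
}

# Escaping an already escaped string should change nothing
isIdempotent <- function(x) identical(escapeURL(escapeURL(x)), escapeURL(x))

isIdempotent("100% done #1")  # TRUE

# A naive encoder that escapes every "%" is not idempotent
naiveEscape <- function(x) gsub("%", "%25", x, fixed = TRUE)
identical(naiveEscape(naiveEscape("%20")), naiveEscape("%20"))  # FALSE
```

Passing this test only shows that escaping is stable; it cannot tell whether a "%20" in the input was meant literally, which is exactly the ambiguity of the "100% %20" example.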

Good luck

