[R] "Special" characters in URI
Henrik Bengtsson
hb at maths.lth.se
Tue May 3 15:20:31 CEST 2005
Gregor GORJANC wrote:
> Henrik Bengtsson wrote:
>
>>Gregor GORJANC wrote:
>
> ...
>
>>>What do you think about this scratch, which of course doesn't solve all
>>>"special" characters:
>>>
>>>fixURLchar <- function(URL,
>>> from = c(" ", "\"", ",", "#"),
>>> to = c("%20", "%22", "%2c", "%23"))
>>
>>
>>Just a comment. It is much safer/easier to use named vectors for
>>mapping, e.g.
>>
>> map <- c(" "="%20", "\""="%22", ","="%2c", "#"="%23")
>>
>
> ...
>
> Henrik, thanks. So you suggest something like
>
> for (i in seq(along=map)) {
> URL <- gsub(pattern=names(map)[i], replacement=map[i], x=URL)
> }
>
Yes, something like that. To optimize, you might want to do

  patterns <- names(map);
  for (i in seq(along=map)) {
    URL <- gsub(pattern=patterns[i], replacement=map[i], x=URL)
  }
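
As a minimal sketch of the named-vector mapping discussed above, the loop
can be wrapped in a function. The name escapeChars is hypothetical and not
from this thread. The sketch assumes the characters should be matched
literally, so it passes fixed=TRUE to gsub(); that is an addition to the
code above, and it only matters once the map contains regular-expression
metacharacters such as "." or "+".

```r
# Hypothetical helper (name not from the thread): replace each character
# named in 'map' by its percent-encoded value.
escapeChars <- function(URL,
                        map = c(" "="%20", "\""="%22", ","="%2c", "#"="%23")) {
  patterns <- names(map)   # look the names up once, outside the loop
  for (i in seq_along(map)) {
    # fixed=TRUE: treat the pattern as a literal string, not as a regex
    URL <- gsub(pattern=patterns[i], replacement=map[[i]], x=URL, fixed=TRUE)
  }
  URL
}

escapeChars("a b,c#d")
```

If "%" itself were ever added to the map, it would have to be handled
before the other entries, because every replacement introduces new "%"
characters.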

More important is that you treat a literal "%" differently from a "%"
used in an encoding, e.g. how do you want to convert the string "100% %20"?
You will probably need more "fancy" regular expressions to detect
a literal "%". Maybe "%[^0-9a-fA-F]" will do. There should be many more
details in the document Brian Ripley referred you to.
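
One possible way to detect a literal "%" is sketched below. It uses a
variant of the pattern suggested above: a Perl-style negative lookahead,
so that the character after the "%" is not consumed and a "%" at the very
end of the string is also caught. The function name escapePercent is
hypothetical, and the rule "a % followed by two hex digits is already an
escape" is an assumption, not something the thread settles.

```r
# Hypothetical helper: encode a "%" as "%25" unless it is already
# followed by two hexadecimal digits (i.e. looks like an escape).
escapePercent <- function(x) {
  # (?!...) is a negative lookahead; it needs perl=TRUE in gsub()
  gsub("%(?![0-9a-fA-F]{2})", "%25", x, perl=TRUE)
}

escapePercent("100% %20")
```

Under this rule the first "%" in "100% %20" is encoded and the "%20" is
left alone. The heuristic cannot tell a literal "%" that happens to be
followed by two hex digits (say "50%ab") from a real escape, so it is only
as good as that assumption.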

In other words, you have to be careful and try to think through all the
cases in which your function may be called. A good test is to call it
twice, once on your original string and then on the escaped one; you
should get the same result. It depends on how complete you want your
function to be.
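
The call-it-twice test can be sketched as follows. The function body here
is an assumption put together for illustration (the complete fixURLchar
was never posted in this thread): it first encodes any "%" that is not
followed by two hex digits, then applies the named map literally.

```r
# Illustrative body only; the original fixURLchar body is not shown in
# this thread, so everything inside the function is an assumption.
fixURLchar <- function(URL,
                       map = c(" "="%20", "\""="%22", ","="%2c", "#"="%23")) {
  # Encode a bare "%" first, so existing escapes are left untouched
  URL <- gsub("%(?![0-9a-fA-F]{2})", "%25", URL, perl=TRUE)
  patterns <- names(map)
  for (i in seq_along(map)) {
    URL <- gsub(pattern=patterns[i], replacement=map[[i]], x=URL, fixed=TRUE)
  }
  URL
}

once  <- fixURLchar("100% \"done\", #1 %20")
twice <- fixURLchar(once)

# Escaping an already escaped string should change nothing
stopifnot(identical(once, twice))
```

A check like this only covers the strings you feed it, so it is worth
repeating with every kind of input the function is expected to see.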
Good luck
Henrik