[R] URLdecode problems

Oliver Keyes okeyes at wikimedia.org
Mon Sep 1 18:02:33 CEST 2014


Hey all,

So, I'm attempting to decode some (and I don't know why anyone did this)
URL-encoded user agents. Running URLdecode over them generates the error:

"Error in rawToChar(out) : embedded nul in string"

Okay, so there's an embedded nul - fair enough. Presumably decoding the URL
is exposing it in a format R doesn't like. Except when I try to dig down
and work out what an encoded nul looks like, in order to simply remove them
with something like gsub(), I end up with several different strings, all of
which apparently resolve to an embedded nul:

> URLdecode("0;%20@%gIL")
Error in rawToChar(out) : embedded nul in string: '0; @\0L'
In addition: Warning message:
In URLdecode("0;%20@%gIL") :
  out-of-range values treated as 0 in coercion to raw
> URLdecode("%20%use")
Error in rawToChar(out) : embedded nul in string: ' \0e'
In addition: Warning message:
In URLdecode("%20%use") :
  out-of-range values treated as 0 in coercion to raw
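
The warning in both transcripts points at the likely cause: neither "%gI" nor "%us" is a well-formed percent escape, because g, I, u and s are not hexadecimal digits. A minimal sketch of that reading, assuming the malformed pair ends up as a byte value outside 0-255 (the value 300 below is an arbitrary stand-in, not taken from the URLdecode source):

```r
# The two characters following each "%" in the examples above.
hex_pairs <- c("20", "gI", "us")

# Only pairs made of hexadecimal digits are valid percent escapes.
is_hex <- grepl("^[0-9A-Fa-f]{2}$", hex_pairs)
is_hex

# as.raw() cannot store a value above 255; it warns with
# "out-of-range values treated as 0 in coercion to raw" and yields byte 00,
# which is the nul that rawToChar() then refuses to put in a string.
out <- suppressWarnings(as.raw(300L))
out
```

On this reading the strings do not contain encoded nuls at all; any malformed escape collapses to the same zero byte, which would explain why different inputs produce the same error.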

I'm a relative newb to encodings, so maybe the fault is simply in my
understanding of how this should work, but - why are both strings being
read as including nuls, despite having different values? And how would I go
about removing said nuls?
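
One possible workaround, sketched under the assumption that a "%" not followed by two hex digits should be kept as a literal percent sign: rewrite such characters as "%25" before decoding, and drop any genuine "%00". The helper name safe_url_decode is hypothetical and not part of base R.

```r
# Hypothetical helper (not part of base R): neutralise malformed escapes
# before handing the string to URLdecode().
safe_url_decode <- function(x) {
  # A "%" that is not followed by two hex digits is not a valid escape;
  # re-encode it as "%25" so it decodes back to a literal percent sign.
  x <- gsub("%(?![0-9A-Fa-f]{2})", "%25", x, perl = TRUE)
  # "%00" is a genuinely encoded nul, which R strings cannot hold; drop it.
  x <- gsub("%00", "", x, fixed = TRUE)
  # Decode element by element so the helper also works on vectors.
  vapply(x, utils::URLdecode, character(1), USE.NAMES = FALSE)
}

safe_url_decode("0;%20@%gIL")  # "0; @%gIL"
safe_url_decode("%20%use")     # " %use"
```

Whether a stray "%" should be kept, dropped, or flagged depends on how the user agents were mangled in the first place, so treat the replacement rule as a design choice rather than the correct answer.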

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



