[R] more on paste and bug
Thomas Lumley
tlumley at u.washington.edu
Wed Oct 10 21:45:18 CEST 2001
On 10 Oct 2001, Saikat DebRoy wrote:
> As it happens, I think the problem is in the read.dta code. The relevant
> piece of code is in foreign/src/stataread.c (lines 317-324):
>
> default:
>   charlen=INTEGER(types)[j]-STATA_STRINGOFFSET;
>   PROTECT(tmp=allocString(charlen+1));
>   InStringBinary(fp,charlen,CHAR(tmp));
>   CHAR(tmp)[charlen]=0;
>   SET_STRING_ELT(VECTOR_ELT(df,j),i,tmp);
>   UNPROTECT(1);
>   break;
>
> As it happens, in this case the string "A" is written in the file
> as two bytes (I do not know why), with the second byte being '\0'.
> So the above code creates a CHARSXP of length 3 with the last two
> bytes being '\0'.
>
It happens because Stata treats strings as a fixed-length type, padded on
the right with nulls. I didn't realise that R would incorporate trailing
nulls into the string.
It's easily fixed by reading the field into a buffer first and using
strlen() to get the real length before calling allocString().
It might be a bug, though, that the LENGTH() of a string can be longer
than its strlen.
-thomas
More information about the R-help mailing list