[Rd] readchar() bug or feature? was Re: Clarification for readChar man page

Jeffrey Horner jeff.horner at vanderbilt.edu
Thu Jun 14 23:05:25 CEST 2007


Jeffrey Horner wrote:
> Duncan Murdoch wrote:
>> On 6/14/2007 10:49 AM, Jeffrey Horner wrote:
>>> Hi,
>>>
>>> Here's a patch to the readChar manual page (R-trunk as of today) that 
>>> better clarifies readChar's return value. 
>> Your update is not right.  For example:
>>
>> x <- as.raw(32:96)
>> readChar(x, nchars=rep(2,100))
>>
>> This returns a character vector of length 100, of which the first 32 
>> elements have 2 chars, the next one has 1, and the rest are "".
>>
>> So the length of nchars really does affect the length of the value.
>>
>> Now, I haven't looked at the code, but it's possible we could delete the 
>> "(which might be less than \code{length(nchars)})" remark, and if not, 
>> it would be useful to explain the situations in which the return value 
>> could be shorter than the nchars vector.
> 
> Well, this is rather a misunderstanding on my part; I completely forgot 
> about vectorization. The manual page makes sense to me now.
> 
> But it still isn't clear when the return value can be shorter than 
> length(nchars). Consider a 101-byte text file in a 
> non-multibyte character locale:
> 
> f <- tempfile()
> writeChar(paste(rep(seq(0,9),10),collapse=''),con=f)
> 
> and calling readChar() to read 100 bytes with length(nchar)=10:
> 
>  > readChar(f,nchar=rep(10,10))
>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> 
> and readChar() reading the entire file with length(nchar)=11:
> 
>  > readChar(f,nchar=rep(10,11))
>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> [11] "\0"
> 
> but the following two outputs are confusing. readChar() with 
> length(nchar)>=12 returns a character vector of length 12:
> 
>  > readChar(f,nchar=rep(10,12))
>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> [11] "\0"         ""
>  > readChar(f,nchar=rep(10,13))
>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
> [11] "\0"         ""
> 
> It seems that the first time EOF is encountered on a read operation, an 
> empty string is returned, but on subsequent reads nothing is returned. 
> Is this intended behavior?

I believe this is an off-by-one bug in do_readchar(). The following fix to 
R-trunk (revision 41946) makes the readChar() calls above cap the returned 
vector length at 11:

Index: src/main/connections.c
===================================================================
--- src/main/connections.c      (revision 41946)
+++ src/main/connections.c      (working copy)
@@ -3286,7 +3286,7 @@
             if(!con->open(con)) error(_("cannot open the connection"));
      }
      PROTECT(ans = allocVector(STRSXP, n));
-    for(i = 0, m = i+1; i < n; i++) {
+    for(i = 0, m = 0; i < n; i++) {
         len = INTEGER(nchars)[i];
         if(len == NA_INTEGER || len < 0)
             error(_("invalid value for '%s'"), "nchar");
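
To illustrate why the initial value of m matters, here is a minimal 
stand-alone sketch of the counting logic. It is not the R source: it 
assumes that m is incremented after each successful read, that the loop 
stops at the first read that hits EOF, and that the result is truncated 
to m elements when m < n. Those parts of the loop are not shown in the 
hunk above. read_fixed() and result_length() are hypothetical names used 
only for this model.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fixed-length read: consumes up to len bytes
   from a stream that has *remaining bytes left. Returns the number of
   bytes consumed, or -1 if the stream was already at end of file. */
static int read_fixed(size_t *remaining, size_t len)
{
    if (*remaining == 0) return -1;
    size_t got = len < *remaining ? len : *remaining;
    *remaining -= got;
    return (int) got;
}

/* Length of the result vector for n requests of len bytes each against a
   stream of nbytes bytes. m_init is the starting value of the counter m:
   1 models "m = i+1" with i == 0, 0 models the proposed "m = 0".
   Assumption: the result is shortened to m elements only when m < n. */
static size_t result_length(size_t nbytes, size_t n, size_t len, size_t m_init)
{
    size_t remaining = nbytes, i, m;
    for (i = 0, m = m_init; i < n; i++) {
        if (read_fixed(&remaining, len) < 0) break;  /* EOF: stop reading */
        m++;                                         /* one more item read */
    }
    return m < n ? m : n;
}
```

Under these assumptions, result_length(101, 12, 10, 1) and 
result_length(101, 13, 10, 1) both give 12, matching the output above, 
while result_length(101, 12, 10, 0) and result_length(101, 13, 10, 0) 
both give 11.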


Jeff

> 
> Jeff
> 
>> Duncan Murdoch
>>
>>
>>> It could use some work as I'd
>>> also like to add some text about using nchar() to find the length of 
>>> the string that readChar() returns, but I'm unsure which of 
>>> type="bytes" or type="chars" to mention. Is it type="chars"?
>>>
>>> Index: src/library/base/man/readChar.Rd
>>> ===================================================================
>>> --- src/library/base/man/readChar.Rd    (revision 41943)
>>> +++ src/library/base/man/readChar.Rd    (working copy)
>>> @@ -57,8 +57,8 @@
>>>   }
>>>
>>>   \value{
>>> -  For \code{readChar}, a character vector of length the number of
>>> -  items read (which might be less than \code{length(nchars)}).
>>> +  For \code{readChar}, a character vector of length 1 with the number
>>> +  of characters less than or equal to nchars.
>>>
>>>     For \code{writeChar}, a raw vector (if \code{con} is a raw vector) or
>>>     invisibly \code{NULL}.
>>>
>>>
>>> Jeff
> 
> 


-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner


