[Rd] R Bug: write.table for matrix of more than 2,147,483,648 elements

Tue May 22 15:13:05 CEST 2018

Fixed in R-devel 74754.
Tomas

On 04/19/2018 12:15 PM, Tomas Kalibera wrote:
> On 04/19/2018 11:47 AM, Serguei Sokol wrote:
>> On 19/04/2018 at 09:30, Tomas Kalibera wrote:
>>> On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
>>>> On 18/04/2018 5:08 PM, Tousey, Colton wrote:
>>>>> Hello,
>>>>>
>>>>> I want to report a bug in R that prevents me from exporting a 
>>>>> matrix with more than 2,147,483,648 elements (C's int limit) 
>>>>> using write.csv or write.table. This bug has already been reported: 
>>>>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. 
>>>>> However, there appears to be no solution or fix in upcoming R 
>>>>> releases.
>>>>>
>>>>> The error message comes from the writetable code of the utils 
>>>>> package, in the io.c source file 
>>>>> (https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
>>>>>     /* quick integrity check */
>>>>>     if(XLENGTH(x) != (R_len_t)nr * nc)
>>>>>         error(_("corrupt matrix -- dims not not match length"));
>>>>>
>>>>> The issue is that nr*nc is an integer and the size of my matrix, 
>>>>> 2.8 billion elements, exceeds C's limit, so the check forces the 
>>>>> code to fail.
>>>>
>>>> Yes, looks like a typo:  R_len_t is an int, and that's how nr was 
>>>> declared.  It should be R_xlen_t, which is bigger on machines that 
>>>> support big vectors.
>>>>
>>>> I haven't tested the change; there may be something else in that 
>>>> function that assumes short vectors.
>>> Indeed, I think the function won't work for long vectors because of 
>>> EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be 
>>> changed, including their signatures.
>>
>> That would be a definitive fix, but before such a deep rewrite is 
>> undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr 
>> * nc") will be sufficient for cases where nr and nc are in int range 
>> but their product can reach the long-vector limit:
>>
>> replace
>>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
>>                     &strBuf, sdec);
>> by
>>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
>>                     quote_col[j], qmethod, &strBuf, sdec);
>
> Unfortunately we can't do that: x is a matrix of an atomic vector 
> type. VECTOR_ELT takes elements of a generic vector, so it cannot 
> be applied to "x". But even if we extracted a single element from "x" 
> (e.g. via a type switch), we would not be able to pass it to 
> EncodeElement0, which expects a full atomic vector (that is, including 
> its header). Instead we would have to call functions like 
> EncodeInteger, EncodeReal0, etc. on the individual elements, which is 
> then the same as changing EncodeElement0 or implementing a new version 
> of it. This does not seem that hard to fix; it is just not as trivial 
> as changing the cast.
>
> Tomas
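
The type-switch approach described above might look roughly like the following sketch. It is not R's implementation: matrix_view, elt_type, elt_index and encode_elt are hypothetical names, and snprintf stands in for the per-type encoders (EncodeInteger, EncodeReal0, ...). The sketch also widens j before the multiplication, since with int operands j*nr on its own would be evaluated in int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t xlen_t;  /* stand-in for R_xlen_t (assumed 64-bit) */

typedef enum { ELT_INT, ELT_REAL } elt_type;  /* hypothetical type tag */

/* Hypothetical view of a column-major matrix of an atomic type. */
typedef struct {
    elt_type type;
    int nr, nc;
    const void *data;
} matrix_view;

/* Column-major offset of element (i, j), computed in 64 bits. */
xlen_t elt_index(int i, int j, int nr)
{
    return (xlen_t)i + (xlen_t)j * nr;
}

/* Encode one element by switching on the element type, rather than
 * passing a whole vector plus an int index to a generic encoder.
 * Returns the number of characters snprintf would write, or -1. */
int encode_elt(const matrix_view *m, int i, int j, char *buf, size_t bufsize)
{
    assert(i >= 0 && i < m->nr && j >= 0 && j < m->nc);
    xlen_t k = elt_index(i, j, m->nr);
    switch (m->type) {
    case ELT_INT:
        return snprintf(buf, bufsize, "%d", ((const int *)m->data)[k]);
    case ELT_REAL:
        return snprintf(buf, bufsize, "%.15g", ((const double *)m->data)[k]);
    }
    return -1;  /* unknown type tag */
}
```

With nr = 70000, elt_index(0, 40000, 70000) gives 2800000000, an offset that does not fit in an int.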
>
>
>>
>> Serguei
>
>