[Rd] problems with truncate() with files > 2Gb under Windows (possibly (PR#7879)

Tony Plate tplate at blackmesacapital.com
Fri May 20 17:03:12 CEST 2005


A colleague tells me they are able to get the Windows function 
SetEndOfFile to truncate files > 2Gb correctly when they avoid the use 
of FILE functions (fopen etc) entirely and work solely with pure Windows 
functions.  I can look into this and see if something can be put into 
the Windows code in main/connections.c:file_truncate().  (I already 
tried just putting an fflush() in there, but that did not help.)
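
A minimal sketch of what a handle-only approach could look like, assuming the file is opened with CreateFileA and positioned with SetFilePointerEx before SetEndOfFile is called. The helper name truncate_file_at is hypothetical. This is not the colleague's code and not what is in R's sources. The non-Windows branch is only there so the sketch can be compiled and tried elsewhere.

```c
#define _FILE_OFFSET_BITS 64     /* ask for a 64-bit off_t on 32-bit POSIX systems */
#define _POSIX_C_SOURCE 200809L  /* make truncate() visible under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/types.h>
#include <unistd.h>
#endif

/* Hypothetical helper: set the length of the file at `path` to `size` bytes
   without going through FILE* functions.  Returns 0 on success, -1 on failure. */
static int truncate_file_at(const char *path, int64_t size)
{
    assert(path != NULL && size >= 0);
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;

    LARGE_INTEGER target;
    target.QuadPart = size;      /* full 64-bit position, not a long */
    int ok = SetFilePointerEx(h, target, NULL, FILE_BEGIN) && SetEndOfFile(h);
    CloseHandle(h);
    return ok ? 0 : -1;
#else
    /* Fallback for non-Windows systems (an assumption of this sketch). */
    if (truncate(path, (off_t)size) != 0) {
        perror("truncate");
        return -1;
    }
    return 0;
#endif
}
```

Calling truncate_file_at("tmp1.txt", 2200000004) would be the handle-only counterpart of the failing truncate() call in the transcript. Whether it behaves correctly on a given Windows version still has to be checked on that system.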

-- Tony Plate

Prof Brian Ripley wrote:
> On Fri, 20 May 2005, Prof Brian Ripley wrote:
> 
>> To follow up on the truncate() part of this, Windows does not use 
>> chsize directly any more, but ftruncate like all other platforms.  
>> However, truncate() was limited to files < 2Gb on all platforms.  I 
>> have changed the latter and your example now works both on 32-bit 
>> Windows and on 64-bit Linux.
> 
> 
> I am sorry, I should clarify `works', as I had not cross-checked well 
> enough.  There are no errors generated by the C call but ftruncate 
> appears not to do what is required of it, despite having off_t as an 
> argument. That is, at R level everything appears fine but the file was 
> not truncated when looked at outside R.
> 
> Not much we can do about broken OS calls.
> 
>>
>> On Thu, 19 May 2005 tplate at blackmesacapital.com wrote:
>>
>>> This message relates to handling files > 2Gb under Windows.  (I use 2Gb
>>> as shorthand for 2^31-1 -- the largest integer representable in a signed
>>> 32 bit integer.)
>>>
>>> First issue: truncate() is not able to successfully truncate files at a
>>> position > 2Gb.  This appears to be due to the use of the Windows
>>> function chsize() in file_truncate() in main/connections.c (chsize()
>>> takes a long int specification of the file size, so we would not expect
>>> it to work for positions > 2Gb).
>>>
>>> The Windows API has the function SetEndOfFile(handle) that is
>>> supposed to truncate the file to the current position.  However, this
>>> function does not seem to work correctly when the current position
>>> is beyond 2Gb, so it is not an improvement on chsize() (at least under
>>> Windows 2000).  My explorations with Windows 2000 SP2 and XP Prof SP1
>>> indicate that SetEndOfFile() DOES successfully truncate files > 2Gb to
>>> sizes < 2Gb, but cannot truncate the same file to a position beyond 2Gb.
>>> So I have no suggestions on how to get this to work.  Probably, the
>>> best thing to do would be to stop with an error in the appropriate
>>> situations.
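
One way the "stop with an error" suggestion could be sketched: after a truncation attempt, reopen the file, measure its size with 64-bit offsets, and report a failure if it differs from the requested size. The names file_size64 and check_truncated are hypothetical, and _fseeki64/_ftelli64 are assumed to be available from the Windows C runtime. This is an illustration, not code from R.

```c
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit POSIX systems */
#define _POSIX_C_SOURCE 200809L  /* make fseeko()/ftello() visible */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#ifndef _WIN32
#include <sys/types.h>
#endif

/* Hypothetical helper: size of the file at `path` in bytes, or -1 on error. */
static int64_t file_size64(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    int64_t size = -1;
    if (fp == NULL)
        return -1;
#ifdef _WIN32
    if (_fseeki64(fp, 0, SEEK_END) == 0)
        size = (int64_t)_ftelli64(fp);
#else
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = (int64_t)ftello(fp);
#endif
    fclose(fp);
    return size;
}

/* Returns 0 if the file has exactly `expected` bytes; otherwise prints a
   message to stderr and returns -1, so the caller can raise an error. */
static int check_truncated(const char *path, int64_t expected)
{
    int64_t actual = file_size64(path);
    if (actual != expected) {
        fprintf(stderr, "truncation of '%s' failed: size is %lld, expected %lld\n",
                path, (long long)actual, (long long)expected);
        return -1;
    }
    return 0;
}
```

Reopening the file for the check means the size is read afresh, rather than taken from whatever the original connection believes.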
>>>
>>> Second issue: although the R function seek() can take a seek position
>>> specified as a double, which allows it to seek to a position beyond 2Gb,
>>> the return value from seek() appears to be a 32-bit signed integer,
>>> resulting in strange (incorrect) return values from seek(), though
>>> otherwise not affecting correct operation.
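
The strange values are consistent with a 64-bit position being squeezed into a 32-bit signed integer. A small sketch of that arithmetic, with the hypothetical name as_signed32 (it only models the wraparound and is not taken from R's code):

```c
#include <assert.h>
#include <stdint.h>

/* Model what happens when only the low 32 bits of a file position are kept
   and then read back as a signed 32-bit integer. */
static int64_t as_signed32(int64_t pos)
{
    uint32_t low = (uint32_t)pos;   /* keep the low 32 bits (modulo 2^32) */
    int64_t result = (low > (uint32_t)INT32_MAX)
                         ? (int64_t)low - 4294967296LL   /* subtract 2^32 */
                         : (int64_t)low;
    assert(result >= INT32_MIN && result <= INT32_MAX);
    return result;
}
```

as_signed32(2200000008) gives -2094967288, the same value seek() reports in the transcript, while as_signed32(2000000008) is returned unchanged because it still fits below 2^31.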
>>>
>>> Inspecting the code, I wonder whether the lines
>>>
>>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>>>     off_t pos = f_tell(fp);
>>> #else
>>>     long pos = f_tell(fp);
>>> #endif
>>>
>>> in the definition of file_seek() in main/connections.c should be more
>>> along the lines of the code defining struct fileconn in
>>> include/Rconnections.h:
>>>
>>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>>>     off_t rpos, wpos;
>>> #else
>>> #ifdef Win32
>>>     off64_t rpos, wpos;
>>> #else
>>>     long rpos, wpos;
>>> #endif
>>> #endif
>>>
>>> I compiled and tested a version of R devel 2.2.0 with the appropriate
>>> simple change to file_seek() in main/connections.c, and with it, seek()
>>> correctly returned file positions beyond 2Gb.  However, I don't know
>>> the purpose of the #define __USE_LARGEFILE (and I couldn't find any
>>> information about it by googling r-project.org), so I'm hesitant to
>>> offer a patch.  Here's the new block of code I used in
>>> main/connections.c that worked OK under Windows:
>>>
>>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>>>     off_t pos = f_tell(fp);
>>> #else
>>> #ifdef Win32
>>>     off64_t pos = f_tell(fp);
>>> #else
>>>     long pos = f_tell(fp);
>>> #endif
>>> #endif
>>>
>>> I'll be happy to submit a patch that addresses these issues, if someone
>>> will explain the usage and purpose of __USE_LARGEFILE.
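
A standalone sketch of the same idea outside R: fetch the position with a 64-bit tell and hand it back as a double, as R's seek() does at the user level. A double holds integers exactly up to 2^53, so positions beyond 2Gb survive the conversion. The names current_position and seek_to are hypothetical. _ftelli64/_fseeki64 are assumed to be provided by the Windows C runtime, whereas the patch above relies on off64_t and R's own f_tell.

```c
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit POSIX systems */
#define _POSIX_C_SOURCE 200809L  /* make fseeko()/ftello() visible */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#ifndef _WIN32
#include <sys/types.h>
#endif

/* Current file position as a double, read through a 64-bit tell so that
   values past 2^31-1 are not wrapped.  Returns -1 on error. */
static double current_position(FILE *fp)
{
    assert(fp != NULL);
#ifdef _WIN32
    int64_t pos = (int64_t)_ftelli64(fp);
#else
    int64_t pos = (int64_t)ftello(fp);
#endif
    return (double)pos;
}

/* Seek to an absolute position given as a double.  Returns 0 on success. */
static int seek_to(FILE *fp, double where)
{
    assert(fp != NULL && where >= 0);
#ifdef _WIN32
    return _fseeki64(fp, (long long)where, SEEK_SET);
#else
    return fseeko(fp, (off_t)where, SEEK_SET);
#endif
}
```

The point of the sketch is only that the variable receiving the tell result must be at least as wide as the offset type. Storing it in a long on 32-bit Windows is what would produce the wrapped values.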
>>>
>>> The following transcript, which illustrates both issues (without my
>>> mods), was created from an installation based on the precompiled version
>>> of R for Windows (rw2010.exe).
>>>
>>> -- Tony Plate
>>>
>>>> options(digits=15)
>>>>
>>>> # can truncate a short file from 8 bytes to 4 bytes
>>>> # first create a file with 8 bytes
>>>> f <- file("tmp1.txt", "wb")
>>>> writeLines(c("abc", "def"), f)
>>>> close(f)
>>>> # check length then truncate to 4 bytes
>>>> f <- file("tmp1.txt", "r+b")
>>>> seek(f, 0, "end")
>>>
>>> [1] 0
>>>
>>>> seek(f, NA)
>>>
>>> [1] 8
>>>
>>>> seek(f, 4)
>>>
>>> [1] 8
>>>
>>>> truncate(f)
>>>
>>> NULL
>>>
>>>> seek(f, 0, "end")
>>>
>>> [1] 4
>>>
>>>> seek(f, NA)
>>>
>>> [1] 4
>>>
>>>> close(f)
>>>> # can truncate a long file from 2000000008 bytes to 2000000004 bytes
>>>> # first create a file with 2000000008 bytes (slightly < 2^31)
>>>> f <- file("tmp1.txt", "wb")
>>>> seek(f, 2000000000)
>>>
>>> [1] 0
>>>
>>>> writeLines(c("abc", "def"), f)
>>>> close(f)
>>>> f <- file("tmp1.txt", "r+b")
>>>> seek(f, 0, "end")
>>>
>>> [1] 0
>>>
>>>> seek(f, NA)
>>>
>>> [1] 2000000008
>>>
>>>> seek(f, 2000000004)
>>>
>>> [1] 2000000008
>>>
>>>> truncate(f)
>>>
>>> NULL
>>>
>>>> seek(f, 0, "end")
>>>
>>> [1] 2000000004
>>>
>>>> seek(f, NA)
>>>
>>> [1] 2000000004
>>>
>>>> close(f)
>>>> # cannot truncate a long file from 2200000008 bytes to 2200000004 bytes
>>>> # first create a file with 2200000008 bytes (slightly > 2^31)
>>>> f <- file("tmp1.txt", "wb")
>>>> seek(f, 2200000000)
>>>
>>> [1] 0
>>>
>>>> writeLines(c("abc", "def"), f)
>>>> close(f)
>>>> f <- file("tmp1.txt", "r+b")
>>>> seek(f, 0, "end")
>>>
>>> [1] 0
>>>
>>>> seek(f, NA) # bad reported value of the current position of "2200000008"
>>>
>>> [1] -2094967288
>>>
>>>> 2200000008 - 2^32
>>>
>>> [1] -2094967288
>>>
>>>> seek(f, 2200000004)
>>>
>>> [1] -2094967288
>>>
>>>> truncate(f) # doesn't work!
>>>
>>> NULL
>>>
>>>> seek(f, 0, "end")
>>>
>>> [1] -2094967288
>>>
>>>> # see if we successfully truncated... (no -- same length as before)
>>>> # can also verify this by watching file size with 'ls -l')
>>>> seek(f, NA) # file is same size as before the attempted truncation
>>>
>>> [1] -2094967288
>>>
>>>> close(f)
>>>> version
>>>
>>>          _
>>> platform i386-pc-mingw32
>>> arch     i386
>>> os       mingw32
>>> system   i386, mingw32
>>> status
>>> major    2
>>> minor    1.0
>>> year     2005
>>> month    04
>>> day      18
>>> language R
>>>
>>>>
>>>
>>> ______________________________________________
>>> R-devel at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>>
>> -- 
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>
>


