[Rd] problems with truncate() with files > 2Gb under Windows (possibly (PR#7879)

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri May 20 11:37:54 CEST 2005


On Fri, 20 May 2005, Prof Brian Ripley wrote:

> To follow up on the truncate() part of this: Windows no longer uses chsize
> directly but, like all other platforms, uses ftruncate.  However,
> truncate() was limited to files < 2Gb on all platforms.  I have changed the
> latter, and your example now works both on 32-bit Windows and on 64-bit Linux.

I am sorry, I should clarify `works', as I had not cross-checked well 
enough.  There are no errors generated by the C call but ftruncate appears 
not to do what is required of it, despite having off_t as an argument. 
That is, at R level everything appears fine but the file was not truncated 
when looked at outside R.
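One way to do this kind of cross-check, sketched in C under POSIX assumptions: truncate through an open descriptor with ftruncate(), then ask stat() for the size by path instead of trusting the call's return code. truncate_and_verify() is a hypothetical helper for illustration, not R's file_truncate().

```c
#define _FILE_OFFSET_BITS 64   /* ask glibc for a 64-bit off_t on 32-bit builds */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: truncate the file at `path` to `len` bytes via
   ftruncate(), then re-read the size with stat() on the path.
   Returns 0 if the size seen from outside the descriptor equals `len`,
   -1 if a system call failed, and -2 if ftruncate() reported success
   but the file size is not what was asked for. */
int truncate_and_verify(const char *path, off_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, len);   /* the call whose effect we want to confirm */
    close(fd);
    if (rc != 0)
        return -1;

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size == len ? 0 : -2;
}
```

A return of -2 corresponds to the situation described above: no error from the call, yet a file whose size is unchanged.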

Not much we can do about broken OS calls.

>
> On Thu, 19 May 2005 tplate at blackmesacapital.com wrote:
>
>> This message relates to handling files > 2Gb under Windows.  (I use 2Gb
>> as shorthand for 2^31-1, the largest value representable in a signed
>> 32-bit integer.)
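The 2Gb limit as arithmetic, in a small C sketch (LIMIT_2GB and fits_in_int32() are illustrative names). The two file sizes used in the transcript below, 2000000008 and 2200000008, fall on either side of it.

```c
#include <stdbool.h>
#include <stdint.h>

/* The "2Gb" limit in this thread: the largest value a signed 32-bit
   integer can hold, 2^31 - 1 = 2147483647. */
#define LIMIT_2GB ((int64_t)INT32_MAX)

/* True if a byte offset can be stored in a signed 32-bit integer
   without loss. */
bool fits_in_int32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= LIMIT_2GB;
}
```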
>> 
>> First issue: truncate() cannot truncate files at a position > 2Gb.
>> This appears to be due to the use of the Windows function chsize() in
>> file_truncate() in main/connections.c (chsize() takes the file size as
>> a long int, which is 32 bits on Windows, so we would not expect it to
>> work for positions > 2Gb).
>> 
>> The Windows API has the function SetEndOfFile(handle), which is
>> supposed to truncate the file at the current position.  However, this
>> function does not seem to work correctly when the current position
>> is beyond 2Gb, so it is no improvement on chsize() (at least under
>> Windows 2000).  My explorations with Windows 2000 SP2 and XP Professional
>> SP1 indicate that SetEndOfFile() DOES successfully truncate files > 2Gb
>> to sizes < 2Gb, but cannot truncate the same file to a position beyond
>> 2Gb.  So I have no suggestions on how to get this to work.  Probably the
>> best thing to do would be to stop with an error in the appropriate
>> situations.
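A minimal sketch of the "stop with an error" option suggested above, assuming the target size is checked before calling an OS routine whose size parameter is effectively limited to 32 bits (as with chsize()'s long on Windows). check_truncate_size() is a hypothetical name, and returning EFBIG for oversized targets is an assumption of this sketch, not existing R behaviour.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical guard: refuse target sizes that a 32-bit size parameter
   cannot represent, instead of letting them be silently narrowed.
   Returns 0 if truncation may proceed, EINVAL for a negative size, or
   EFBIG so that the caller can stop with an error. */
int check_truncate_size(int64_t new_size)
{
    if (new_size < 0)
        return EINVAL;
    if (new_size > INT32_MAX)
        return EFBIG;
    return 0;
}
```

This matches the observation above that shrinking a > 2Gb file to a size < 2Gb works: only the target size is checked, not the current file length.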
>> 
>> Second issue: although the R function seek() can take a seek position
>> specified as a double, which allows it to seek to a position beyond 2Gb,
>> the return value from seek() appears to be a 32-bit signed integer,
>> resulting in strange (incorrect) return values from seek(), though
>> otherwise not affecting correct operation.
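The bogus values can be reproduced arithmetically: keeping only the low 32 bits of 2200000008 and reading them as a signed number gives 2200000008 - 2^32 = -2094967288, which is exactly what seek() reports in the transcript below. A C sketch (as_signed32() is an illustrative name):

```c
#include <stdint.h>

/* Reinterpret a 64-bit file position as the signed 32-bit value it
   becomes when only the low 32 bits survive.  Uses explicit arithmetic
   (well-defined in C) rather than a cast to int32_t, whose result for
   out-of-range values is implementation-defined. */
int64_t as_signed32(int64_t pos)
{
    uint32_t low = (uint32_t)pos;          /* pos modulo 2^32 */
    return low > INT32_MAX ? (int64_t)low - ((int64_t)1 << 32)
                           : (int64_t)low;
}
```

Positions below 2^31, such as 2000000008, pass through unchanged, which is why the second test in the transcript reports correct values.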
>> 
>> Inspecting the code, I wonder whether the lines
>> 
>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>>     off_t pos = f_tell(fp);
>> #else
>>     long pos = f_tell(fp);
>> #endif
>> 
>> in the definition of file_seek() in main/connections.c should be more
>> along the lines of the code defining struct fileconn in
>> include/Rconnections.h:
>> 
>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>>     off_t rpos, wpos;
>> #else
>> #ifdef Win32
>>     off64_t rpos, wpos;
>> #else
>>     long rpos, wpos;
>> #endif
>> #endif
>> 
>> I compiled and tested a version of R devel 2.2.0 with the appropriate
>> simple change to file_seek() in main/connections.c, and with it, seek()
>> correctly returned file positions beyond 2Gb.  However, I don't know
>> the purpose of the #define __USE_LARGEFILE (and searching r-project.org
>> turned up no information about it), so I'm hesitant to offer a
>> patch.  Here's the new block of code I used in main/connections.c that
>> worked OK under Windows:
>> 
>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>>     off_t pos = f_tell(fp);
>> #else
>> #ifdef Win32
>>     off64_t pos = f_tell(fp);
>> #else
>>     long pos = f_tell(fp);
>> #endif
>> #endif
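Along the same lines, a sketch of a file-position type that is 64 bits wide on both platforms. It assumes the Microsoft C runtime's _ftelli64()/_fseeki64() on Windows and POSIX ftello()/fseeko() with _FILE_OFFSET_BITS=64 elsewhere. file_pos_t, pos_tell() and pos_seek() are hypothetical names, not R's. As far as we can tell, __USE_LARGEFILE is a glibc-internal macro set in <features.h> to signal that the large-file interfaces fseeko()/ftello() are enabled; the sketch does not rely on it.

```c
#define _FILE_OFFSET_BITS 64   /* glibc: make off_t, fseeko, ftello 64-bit */
#include <stdio.h>

/* Hypothetical portable wrappers: report and set a file position as a
   64-bit value on both Windows and POSIX systems. */
#ifdef _WIN32
typedef __int64 file_pos_t;
static file_pos_t pos_tell(FILE *fp) { return _ftelli64(fp); }
static int pos_seek(FILE *fp, file_pos_t off, int whence)
{
    return _fseeki64(fp, off, whence);
}
#else
#include <sys/types.h>
typedef off_t file_pos_t;
static file_pos_t pos_tell(FILE *fp) { return ftello(fp); }
static int pos_seek(FILE *fp, file_pos_t off, int whence)
{
    return fseeko(fp, off, whence);
}
#endif
```

Keeping the position in one typedef, as struct fileconn already does for rpos and wpos, means file_seek() cannot silently fall back to a 32-bit long.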
>> 
>> I'll be happy to submit a patch that addresses these issues, if someone
>> will explain the usage and purpose of __USE_LARGEFILE.
>> 
>> The following transcript, which illustrates both issues (without my
>> mods), was created from an installation based on the precompiled version
>> of R for Windows (rw2010.exe).
>> 
>> -- Tony Plate
>> 
>>> options(digits=15)
>>> 
>>> # can truncate a short file from 8 bytes to 4 bytes
>>> # first create a file with 8 bytes
>>> f <- file("tmp1.txt", "wb")
>>> writeLines(c("abc", "def"), f)
>>> close(f)
>>> # check length then truncate to 4 bytes
>>> f <- file("tmp1.txt", "r+b")
>>> seek(f, 0, "end")
>> [1] 0
>>> seek(f, NA)
>> [1] 8
>>> seek(f, 4)
>> [1] 8
>>> truncate(f)
>> NULL
>>> seek(f, 0, "end")
>> [1] 4
>>> seek(f, NA)
>> [1] 4
>>> close(f)
>>> # can truncate a long file from 2000000008 bytes to 2000000004 bytes
>>> # first create a file with 2000000008 bytes (slightly < 2^31)
>>> f <- file("tmp1.txt", "wb")
>>> seek(f, 2000000000)
>> [1] 0
>>> writeLines(c("abc", "def"), f)
>>> close(f)
>>> f <- file("tmp1.txt", "r+b")
>>> seek(f, 0, "end")
>> [1] 0
>>> seek(f, NA)
>> [1] 2000000008
>>> seek(f, 2000000004)
>> [1] 2000000008
>>> truncate(f)
>> NULL
>>> seek(f, 0, "end")
>> [1] 2000000004
>>> seek(f, NA)
>> [1] 2000000004
>>> close(f)
>>> # cannot truncate a long file from 2200000008 bytes to 2200000004 bytes
>>> # first create a file with 2200000008 bytes (slightly > 2^31)
>>> f <- file("tmp1.txt", "wb")
>>> seek(f, 2200000000)
>> [1] 0
>>> writeLines(c("abc", "def"), f)
>>> close(f)
>>> f <- file("tmp1.txt", "r+b")
>>> seek(f, 0, "end")
>> [1] 0
>>> seek(f, NA) # bad reported value of the current position of "2200000008"
>> [1] -2094967288
>>> 2200000008 - 2^32
>> [1] -2094967288
>>> seek(f, 2200000004)
>> [1] -2094967288
>>> truncate(f) # doesn't work!
>> NULL
>>> seek(f, 0, "end")
>> [1] -2094967288
>>> # see if we successfully truncated... (no -- same length as before
>>> # can also verify this by watching file size with 'ls -l')
>>> seek(f, NA) # file is same size as before the attempted truncation
>> [1] -2094967288
>>> close(f)
>>> version
>>          _
>> platform i386-pc-mingw32
>> arch     i386
>> os       mingw32
>> system   i386, mingw32
>> status
>> major    2
>> minor    1.0
>> year     2005
>> month    04
>> day      18
>> language R
>>> 
>> 
>> ______________________________________________
>> R-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
>
>



