[Rd] problems with truncate() with files > 2Gb under Windows (PR#7880)

ripley at stats.ox.ac.uk
Thu May 19 18:47:48 CEST 2005


__USE_LARGEFILE is a standard Unix way to allow > 2Gb files on 32-bit
OSes by using f{seek,tell}o.  Take a look at the definitions of f_seek
and f_tell:

#if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
#define f_seek fseeko
#define f_tell ftello
#else
#ifdef Win32
#define f_seek fseeko64
#define f_tell ftello64
#else
#define f_seek fseek
#define f_tell ftell
#endif
#endif
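
A minimal sketch of how macros like these might be paired with a matching
offset type, so that the value returned by the tell call is never narrowed
to a long.  The names big_off_t, big_seek, big_tell, tell_as_double and
seek_from_double are hypothetical, not R's.  The Windows branch assumes a
MinGW toolchain (which supplies off64_t, fseeko64 and ftello64) and tests
the compiler's _WIN32 rather than R's own Win32 symbol.

```c
/* Request a 64-bit off_t and the fseeko/ftello declarations on POSIX systems. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#ifdef _WIN32                  /* compiler-defined; R's build uses its own Win32 */
typedef off64_t big_off_t;     /* hypothetical name; assumes MinGW */
#define big_seek fseeko64
#define big_tell ftello64
#else
typedef off_t big_off_t;       /* 64-bit because of _FILE_OFFSET_BITS above */
#define big_seek fseeko
#define big_tell ftello
#endif

/* Report the current position as a double.  A double holds every integer
   up to 2^53 exactly, so offsets beyond 2^31-1 survive.  Returns -1 on error. */
double tell_as_double(FILE *fp)
{
    assert(fp != NULL);
    big_off_t pos = big_tell(fp);   /* keep the full-width type here */
    return (double) pos;
}

/* Seek to an absolute offset given as a double; returns 0 on success. */
int seek_from_double(FILE *fp, double where)
{
    assert(fp != NULL);
    return big_seek(fp, (big_off_t) where, SEEK_SET);
}
```

The point of the typedef is that the variable receiving the result of the
tell call has the same width as the function's return type on every branch.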

Windows support for > 2Gb files seemed flaky, but we did not think it was 
R's job to report OS deficiencies.

I've now used off64_t in file_seek under Windows.
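
To see where the negative positions in the transcript quoted below come
from: a 32-bit long (as on i386 Windows) keeps only the low 32 bits of the
offset, read as a signed value.  A minimal sketch of that arithmetic
follows; the function name wrap_to_int32 is hypothetical and this is not
R's code.

```c
#include <assert.h>
#include <stdint.h>

/* Return the value a 64-bit file offset appears to have after being
   stored in a signed 32-bit integer (two's complement assumed). */
int64_t wrap_to_int32(int64_t pos)
{
    assert(pos >= 0);                 /* file offsets are non-negative */
    uint32_t low = (uint32_t) pos;    /* low 32 bits; well-defined modulo 2^32 */
    if (low > INT32_MAX)
        return (int64_t) low - 4294967296LL;   /* subtract 2^32 */
    return (int64_t) low;
}
```

With this, wrap_to_int32(2200000008) gives -2094967288, the value seek()
reported, while 2000000008 is below 2^31 and passes through unchanged.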

On Thu, 19 May 2005 tplate at blackmesacapital.com wrote:

> This message relates to handling files > 2Gb under Windows.  (I use 2Gb
> as shorthand for 2^31-1 -- the largest value representable in a signed
> 32-bit integer.)
>
> First issue: truncate() is not able to successfully truncate files at a
> position > 2Gb.  This appears to be due to the use of the Windows
> function chsize() in file_truncate() in main/connections.c (chsize()
> takes a long int specification of the file size, so we would not expect
> it to work for positions > 2Gb).
>
> The Windows API has the function SetEndOfFile(handle) that is
> supposed to truncate the file to the current position.  However, this
> function does not seem to work correctly when the current position
> is beyond 2Gb, so it is not an improvement on chsize() (at least under
> Windows 2000).  My explorations with Windows 2000 SP2 and XP Prof SP1
> indicate that SetEndOfFile() DOES successfully truncate files > 2Gb to
> sizes < 2Gb, but cannot truncate the same file to a position beyond 2Gb.
> So I have no suggestions on how to get this to work.  Probably the
> best thing to do would be to stop with an error in the appropriate
> situations.
>
> Second issue: although the R function seek() can take a seek position
> specified as a double, which allows it to seek to a position beyond 2Gb,
> the return value from seek() appears to be a 32-bit signed integer,
> resulting in strange (incorrect) return values from seek(), though
> otherwise not affecting correct operation.
>
> Inspecting the code, I wonder whether the lines
>
> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>     off_t pos = f_tell(fp);
> #else
>     long pos = f_tell(fp);
> #endif
>
> in the definition of file_seek() in main/connections.c should be more
> along the lines of the code defining struct fileconn in
> include/Rconnections.h:
>
> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>     off_t rpos, wpos;
> #else
> #ifdef Win32
>     off64_t rpos, wpos;
> #else
>     long rpos, wpos;
> #endif
> #endif
>
> I compiled and tested a version of R devel 2.2.0 with the appropriate
> simple change to file_seek() in main/connections.c, and with it, seek()
> correctly returned file positions beyond 2Gb.  However, I don't know
> the purpose of the #define __USE_LARGEFILE (and I couldn't find any info
> about it by googling r-project.org), so I'm hesitant to offer a
> patch.  Here's the new block of code I used in main/connections.c that
> worked ok under Windows:
>
> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>     off_t pos = f_tell(fp);
> #else
> #ifdef Win32
>     off64_t pos = f_tell(fp);
> #else
>     long pos = f_tell(fp);
> #endif
> #endif
>
> I'll be happy to submit a patch that addresses these issues, if someone
> will explain the usage and purpose of __USE_LARGEFILE.
>
> The following transcript, which illustrates both issues (without my
> mods), was created from an installation based on the precompiled version
> of R for Windows (rw2010.exe).
>
> -- Tony Plate
>
>> options(digits=15)
>>
>> # can truncate a short file from 8 bytes to 4 bytes
>> # first create a file with 8 bytes
>> f <- file("tmp1.txt", "wb")
>> writeLines(c("abc", "def"), f)
>> close(f)
>> # check length then truncate to 4 bytes
>> f <- file("tmp1.txt", "r+b")
>> seek(f, 0, "end")
> [1] 0
>> seek(f, NA)
> [1] 8
>> seek(f, 4)
> [1] 8
>> truncate(f)
> NULL
>> seek(f, 0, "end")
> [1] 4
>> seek(f, NA)
> [1] 4
>> close(f)
>> # can truncate a long file from 2000000008 bytes to 2000000004 bytes
>> # first create a file with 2000000008 bytes (slightly < 2^31)
>> f <- file("tmp1.txt", "wb")
>> seek(f, 2000000000)
> [1] 0
>> writeLines(c("abc", "def"), f)
>> close(f)
>> f <- file("tmp1.txt", "r+b")
>> seek(f, 0, "end")
> [1] 0
>> seek(f, NA)
> [1] 2000000008
>> seek(f, 2000000004)
> [1] 2000000008
>> truncate(f)
> NULL
>> seek(f, 0, "end")
> [1] 2000000004
>> seek(f, NA)
> [1] 2000000004
>> close(f)
>> # cannot truncate a long file from 2200000008 bytes to 2200000004 bytes
>> # first create a file with 2200000008 bytes (slightly > 2^31)
>> f <- file("tmp1.txt", "wb")
>> seek(f, 2200000000)
> [1] 0
>> writeLines(c("abc", "def"), f)
>> close(f)
>> f <- file("tmp1.txt", "r+b")
>> seek(f, 0, "end")
> [1] 0
>> seek(f, NA) # bad reported value of the current position of "2200000008"
> [1] -2094967288
>> 2200000008 - 2^32
> [1] -2094967288
>> seek(f, 2200000004)
> [1] -2094967288
>> truncate(f) # doesn't work!
> NULL
>> seek(f, 0, "end")
> [1] -2094967288
>> # see if we successfully truncated... (no -- same length as before
>> # can also verify this by watching file size with 'ls -l')
>> seek(f, NA) # file is same size as before the attempted truncation
> [1] -2094967288
>> close(f)
>> version
>          _
> platform i386-pc-mingw32
> arch     i386
> os       mingw32
> system   i386, mingw32
> status
> major    2
> minor    1.0
> year     2005
> month    04
> day      18
> language R
>>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


