[Rd] problems with truncate() with files > 2Gb under Windows (possibly (PR#7879))
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri May 20 11:37:54 CEST 2005
On Fri, 20 May 2005, Prof Brian Ripley wrote:
> To follow up on the truncate() part of this, Windows no longer uses chsize
> directly but, like all other platforms, uses ftruncate. However,
> truncate() was limited to files < 2Gb on all platforms. I have changed the
> latter and your example now works both on 32-bit Windows and on 64-bit Linux.
I am sorry, I should clarify `works', as I had not cross-checked well
enough. The C call generates no errors, but ftruncate appears not to do
what is required of it, despite taking an off_t argument. That is, at R
level everything appears fine, but when the file is examined outside R it
has not been truncated.
Not much we can do about broken OS calls.
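The kind of cross-check described above can be made concrete with a minimal C sketch, assuming a POSIX system: it calls ftruncate() and then re-reads the file size through a separate stat() on the path, rather than trusting the return value alone. The name truncate_checked() is hypothetical and is not part of R's connections code. Defining _FILE_OFFSET_BITS as 64 is the usual way to ask glibc for a 64-bit off_t on 32-bit builds; other C libraries may not honour it.

```c
/* Hypothetical helper (not part of R): truncate a file with ftruncate()
 * and then cross-check the size with a separate stat() on the path.
 * Assumes a POSIX system. _FILE_OFFSET_BITS=64 requests a 64-bit off_t
 * from glibc on 32-bit builds; other C libraries may not honour it. */
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the file now has exactly `len` bytes, -1 if a system
 * call failed, and 1 if ftruncate() reported success but the size seen
 * afterwards differs from `len`. */
int truncate_checked(const char *path, off_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    int rc = ftruncate(fd, len);
    /* close() is evaluated first, so the descriptor is never leaked. */
    if (close(fd) != 0 || rc != 0)
        return -1;

    /* Look at the file again through its path, independently of the
     * descriptor used for the truncation. */
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size == len ? 0 : 1;
}
```

A return value of 1 corresponds to the failure mode reported here: the call itself succeeds, but the size seen afterwards is not the requested one.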
>
> On Thu, 19 May 2005 tplate at blackmesacapital.com wrote:
>
>> This message relates to handling files > 2Gb under Windows. (I use 2Gb
>> as shorthand for 2^31-1, the largest integer representable in a signed
>> 32-bit integer.)
>>
>> First issue: truncate() is not able to successfully truncate files at a
>> position > 2Gb. This appears to be due to the use of the Windows
>> function chsize() in file_truncate() in main/connections.c (chsize()
>> takes the file size as a long int, which is 32 bits on Windows, so we
>> would not expect it to work for positions > 2Gb).
>>
>> The Windows API has the function SetEndOfFile(handle), which is
>> supposed to truncate the file at the current position. However, this
>> function does not seem to work correctly when the current position
>> is beyond 2Gb, so it is no improvement on chsize() (at least under
>> Windows 2000). My explorations with Windows 2000 SP2 and XP Professional
>> SP1 indicate that SetEndOfFile() DOES successfully truncate files > 2Gb
>> to sizes < 2Gb, but cannot truncate the same file to a position beyond
>> 2Gb. So I have no suggestions on how to get this to work. Probably the
>> best thing to do would be to stop with an error in the appropriate
>> situations.
>>
>> Second issue: although the R function seek() can take a seek position
>> specified as a double, which allows it to seek to a position beyond 2Gb,
>> the return value from seek() appears to be a 32-bit signed integer, so
>> positions beyond 2Gb are reported as strange (incorrect) values, though
>> this does not otherwise affect correct operation.
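The negative positions in the transcript below are consistent with the offset passing through a 32-bit signed integer: 2200000008 - 2^32 = -2094967288. The following C sketch models that two's-complement wrap-around. wrap_to_int32() is a hypothetical helper; the arithmetic is done in int64_t so that the result is well defined instead of relying on the implementation-defined narrowing of an out-of-range value to a 32-bit type.

```c
/* Hypothetical helper: models how a file offset would be reported if it
 * were squeezed into a 32-bit two's-complement signed integer. The
 * arithmetic is done in int64_t, so nothing relies on the
 * implementation-defined result of narrowing to a 32-bit type. */
#include <stdint.h>

#define TWO_POW_31 INT64_C(2147483648)
#define TWO_POW_32 INT64_C(4294967296)

int64_t wrap_to_int32(int64_t pos)
{
    int64_t r = pos % TWO_POW_32;   /* reduce modulo 2^32 ...          */
    if (r < 0)
        r += TWO_POW_32;            /* ... into the range [0, 2^32)    */
    if (r >= TWO_POW_31)
        r -= TWO_POW_32;            /* upper half becomes negative     */
    return r;
}
```

For example, wrap_to_int32(2200000008) gives -2094967288, the value seek() reported, while wrap_to_int32(2000000008) is returned unchanged because it is below 2^31.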
>>
>> Inspecting the code, I wonder whether the lines
>>
>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>> off_t pos = f_tell(fp);
>> #else
>> long pos = f_tell(fp);
>> #endif
>>
>> in the definition of file_seek() in main/connections.c should be more
>> along the lines of the code defining struct fileconn in
>> include/Rconnections.h:
>>
>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>> off_t rpos, wpos;
>> #else
>> #ifdef Win32
>> off64_t rpos, wpos;
>> #else
>> long rpos, wpos;
>> #endif
>> #endif
>>
>> I compiled and tested a version of R devel 2.2.0 with the appropriate
>> simple change to file_seek() in main/connections.c, and with it, seek()
>> correctly returned file positions beyond 2Gb. However, I don't know
>> the purpose of the #define __USE_LARGEFILE (and googling for it on
>> r-project.org turned up no information), so I'm hesitant to offer a
>> patch. Here's the new block of code I used in main/connections.c that
>> worked OK under Windows:
>>
>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>> off_t pos = f_tell(fp);
>> #else
>> #ifdef Win32
>> off64_t pos = f_tell(fp);
>> #else
>> long pos = f_tell(fp);
>> #endif
>> #endif
>>
>> I'll be happy to submit a patch that addresses these issues, if someone
>> will explain the usage and purpose of __USE_LARGEFILE.
>>
>> The following transcript, which illustrates both issues (without my
>> mods), was created from an installation based on the precompiled version
>> of R for Windows (rw2010.exe).
>>
>> -- Tony Plate
>>
>>> options(digits=15)
>>>
>>> # can truncate a short file from 8 bytes to 4 bytes
>>> # first create a file with 8 bytes
>>> f <- file("tmp1.txt", "wb")
>>> writeLines(c("abc", "def"), f)
>>> close(f)
>>> # check length then truncate to 4 bytes
>>> f <- file("tmp1.txt", "r+b")
>>> seek(f, 0, "end")
>> [1] 0
>>> seek(f, NA)
>> [1] 8
>>> seek(f, 4)
>> [1] 8
>>> truncate(f)
>> NULL
>>> seek(f, 0, "end")
>> [1] 4
>>> seek(f, NA)
>> [1] 4
>>> close(f)
>>> # can truncate a long file from 2000000008 bytes to 2000000004 bytes
>>> # first create a file with 2000000008 bytes (slightly < 2^31)
>>> f <- file("tmp1.txt", "wb")
>>> seek(f, 2000000000)
>> [1] 0
>>> writeLines(c("abc", "def"), f)
>>> close(f)
>>> f <- file("tmp1.txt", "r+b")
>>> seek(f, 0, "end")
>> [1] 0
>>> seek(f, NA)
>> [1] 2000000008
>>> seek(f, 2000000004)
>> [1] 2000000008
>>> truncate(f)
>> NULL
>>> seek(f, 0, "end")
>> [1] 2000000004
>>> seek(f, NA)
>> [1] 2000000004
>>> close(f)
>>> # cannot truncate a long file from 2200000008 bytes to 2200000004 bytes
>>> # first create a file with 2200000008 bytes (slightly > 2^31)
>>> f <- file("tmp1.txt", "wb")
>>> seek(f, 2200000000)
>> [1] 0
>>> writeLines(c("abc", "def"), f)
>>> close(f)
>>> f <- file("tmp1.txt", "r+b")
>>> seek(f, 0, "end")
>> [1] 0
>>> seek(f, NA) # bad reported value of the current position of "2200000008"
>> [1] -2094967288
>>> 2200000008 - 2^32
>> [1] -2094967288
>>> seek(f, 2200000004)
>> [1] -2094967288
>>> truncate(f) # doesn't work!
>> NULL
>>> seek(f, 0, "end")
>> [1] -2094967288
>>> # see if we successfully truncated... (no -- same length as before
>>> # can also verify this by watching file size with 'ls -l')
>>> seek(f, NA) # file is same size as before the attempted truncation
>> [1] -2094967288
>>> close(f)
>>> version
>>          _
>> platform i386-pc-mingw32
>> arch     i386
>> os       mingw32
>> system   i386, mingw32
>> status
>> major    2
>> minor    1.0
>> year     2005
>> month    04
>> day      18
>> language R
>>>
>>
>> ______________________________________________
>> R-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>