[Rd] problems with truncate() with files > 2Gb under Windows (possibly (PR#7879)

Tony Plate tplate at blackmesacapital.com
Mon May 23 18:50:59 CEST 2005


I have some code that seems to work (in that it has the desired effect 
on files) for truncating files > 2Gb under Windows.

Here's my modified version of the file_truncate function:

static void file_truncate(Rconnection con)
{
     Rfileconn this = con->private;
     FILE *fp = this->fp;
     int fd = fileno(fp);
#ifdef Win32
     off64_t size = f_tell(fp);
     HANDLE fh; /* native Windows file handle */
#elif defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
     off_t size = f_tell(fp);
#else
     long size = f_tell(fp);
#endif

     if(!con->isopen || !con->canwrite)
	error(_("can only truncate connections open for writing"));

     /* remember the read position if the last operation was a read */
     if(!this->last_was_write) this->rpos = f_tell(this->fp);
#if defined(Win32)
     /* ftruncate() (supplied by MinGW?) does not work for files>2Gb */
     /* (returns without error, but has no effect on file size), so */
     /* use native Windows calls. */
     /* Flush for safety, especially as we are mixing calls to */
     /* different levels of file I/O. */
     fflush(fp);
     /* _get_osfhandle() returns an integer type, so cast to HANDLE */
     fh = (HANDLE) _get_osfhandle(fd);
     /* SetEndOfFile() truncates at the handle's current file position */
     if (SetEndOfFile(fh) == 0)
	/* pass the error code as an argument instead of using a */
	/* formatted buffer as the format string; it is a DWORD */
	error(_("SetEndOfFile failed: %lu"), (unsigned long) GetLastError());
#elif defined(HAVE_FTRUNCATE)
     if(ftruncate(fd, size))
	error(_("file truncation failed"));
#else
     error(_("file truncation unavailable on this platform"));
#endif
     this->last_was_write = TRUE;
     this->wpos = f_tell(this->fp);
}


This also requires putting
# include <windows.h>
within the appropriate "#ifdef Win32" at the start of the file.
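
For anyone who wants to try the idea outside R, the sketch below puts the 
same steps (flush, get the native handle, set the end of file) into a 
stand-alone helper.  The name truncate_stream_at() is hypothetical, and 
the explicit SetFilePointerEx() call is an addition that is not part of 
the patch above; it is only there so the helper can take the target size 
as an argument.  The stream is assumed to be open for writing.

```c
/* Sketch only: truncate_stream_at() is a hypothetical helper, not R code. */
#define _FILE_OFFSET_BITS 64    /* ask for a 64-bit off_t on 32-bit POSIX hosts */
#define _POSIX_C_SOURCE 200809L /* expose fileno() and ftruncate() */
#include <stdio.h>
#ifdef _WIN32
# include <windows.h>
# include <io.h>                /* _get_osfhandle() */
#else
# include <sys/types.h>
# include <unistd.h>            /* ftruncate() */
#endif

/* Cut the file behind fp to exactly `size` bytes; returns 0 on success. */
static int truncate_stream_at(FILE *fp, long long size)
{
    if (fp == NULL || size < 0) return -1;
    /* Push stdio's buffer to the OS before using lower-level calls. */
    if (fflush(fp) != 0) return -1;
#ifdef _WIN32
    {
        HANDLE fh = (HANDLE) _get_osfhandle(_fileno(fp));
        LARGE_INTEGER pos;
        if (fh == INVALID_HANDLE_VALUE) return -1;
        pos.QuadPart = size;    /* 64-bit offset, no narrowing to long */
        if (!SetFilePointerEx(fh, pos, NULL, FILE_BEGIN)) return -1;
        return SetEndOfFile(fh) ? 0 : -1;
    }
#else
    return ftruncate(fileno(fp), (off_t) size);
#endif
}
```

Calling truncate_stream_at(fp, 2200000004LL) would then cut the file at 
that offset, provided the file system supports files of that size.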

I hadn't realized until now that HAVE_FTRUNCATE is defined in the MinGW 
environment, but as far as I can see there are no other calls to 
ftruncate() in the R base code, so this should still be the only place 
that needs a change for ftruncate().  I don't know much about MinGW, but 
it appears (from reading the MinGW include/windows.h) that ftruncate() in 
MinGW just calls chsize() with the 64-bit length argument of ftruncate() 
coerced to the long int argument that chsize() accepts, so one could 
argue that it is MinGW that is broken here.  Nonetheless, I hope you will 
consider the above change so that R works regardless of whether or not 
MinGW has bugs.
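
To make the coercion concrete, here is a small sketch of what happens to 
a length just over 2Gb when it is squeezed into a 32-bit long (which is 
what long is on Win32).  The function name narrow_to_long32() is 
hypothetical and only models the arithmetic; it is not taken from MinGW.

```c
#include <stdint.h>

/* Hypothetical model of passing a 64-bit length to a parameter that is
   a 32-bit long: only the low 32 bits survive. */
static int32_t narrow_to_long32(int64_t length)
{
    /* Conversion to unsigned is well defined: it keeps the low 32 bits. */
    uint32_t low = (uint32_t) length;
    /* Reinterpret those bits as a two's-complement signed value. */
    if (low > (uint32_t) INT32_MAX)
        return (int32_t) ((int64_t) low - 4294967296LL);
    return (int32_t) low;
}
```

With the offset used in the example session, 
narrow_to_long32(2200000004LL) gives -2094967292, so a routine that takes 
a long never sees the size that was actually requested.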

This is getting to be more than just a few lines of change, so let me 
know if sending a patch is appropriate.

Here's an example of truncating a file > 2Gb.  Note that the "r" seek 
after a write after truncation does not appear to position the file 
pointer properly, but the same thing happens with small files in R 
without my modifications, so I suspect this is an independent issue 
(maybe not even involving truncation).

 > options(digits=15)
 > f <- file("tmp1.txt", "wb")
 > seek(f, 2200000000, rw="w")
[1] 0
 > writeLines(c("abc", "def"), f)
 > close(f)
 > options(digits=15)
 > f <- file("tmp1.txt", "r+b")
 > seek(f, 0, "end", rw="w")
[1] 0
 > seek(f, NA, rw="w")
[1] 2200000008
 > seek(f, 2200000004, rw="w")
[1] 2200000008
 > truncate(f)
NULL
 > # Try to read at the end of the file
 > seek(f, 0, "end", rw="r")
[1] 0
 > readLines(f, -1)
character(0)
 > seek(f, 0, "end", rw="r")
[1] 2200000004
 > # see if we successfully truncated...
 > # (can also verify this by watching file size with 'ls -l')
 > seek(f, NA, rw="r")
[1] 2200000004
 > # write something at the end of the file
 > seek(f, 0, "end", rw="w")
[1] 2200000004
 > writeLines(c("def"), f)
 > # Try to read at the end of the file -- the result is *WRONG* here
 > # (If we do the seek() twice, we get the correct result.  We also
 > #  get the correct result if we call flush(f) before the seek. The
 > #  same thing also happens with small files).
 > # flush(f) # makes the seek work correctly
 > seek(f, 0, "end", rw="r")
[1] 2200000004
 > readLines(f, -1) # should return character(0)
[1] "def"
 > seek(f, 2200000000, rw="r")
[1] 2200000008
 > readLines(f, -1)
[1] "abc" "def"
 > # Write something earlier in the file
 > seek(f, 1000000000, rw="w")
[1] 2200000008
 > writeLines(c("ghi", "jkl"), f)
 > seek(f, 1000000000, rw="r")
[1] 2200000008
 > readLines(f, 2)
[1] "ghi" "jkl"
 > seek(f, 2200000000, rw="r")
[1] 1000000008
 > readLines(f, -1)
[1] "abc" "def"
 > close(f)
 >
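
The file_truncate() code above touches three fields (rpos, wpos and 
last_was_write), which suggests one stream carrying separate read and 
write positions.  The sketch below is a minimal model of that kind of 
bookkeeping, written only to show where a stale position could come 
from.  The struct and function names are hypothetical, this is not R's 
implementation, and it is not a diagnosis of the seek behaviour seen in 
the session above.

```c
#include <stdio.h>

/* Hypothetical model: one FILE* shared by a read and a write position.
   long is used for brevity; files > 2Gb would need a 64-bit offset type. */
typedef struct {
    FILE *fp;
    long rpos, wpos;      /* saved read and write offsets */
    int last_was_write;   /* which of the two positions fp reflects now */
} dual_pos_file;

/* Make fp reflect the write position, saving the read position first. */
static int switch_to_write(dual_pos_file *f)
{
    if (f->last_was_write) return 0;
    f->rpos = ftell(f->fp);
    f->last_was_write = 1;
    return fseek(f->fp, f->wpos, SEEK_SET);
}

/* Make fp reflect the read position, saving the write position first. */
static int switch_to_read(dual_pos_file *f)
{
    if (!f->last_was_write) return 0;
    f->wpos = ftell(f->fp);
    f->last_was_write = 0;
    /* fseek() also writes out any pending buffered output. */
    return fseek(f->fp, f->rpos, SEEK_SET);
}
```

In a model like this, any code path that moves fp without going through 
the two switch functions leaves one of the saved offsets out of date.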

Let me know if there is anything else I can do to make it easy for you 
to incorporate these changes, or what else you would need to see to be 
convinced that they have a good chance of working properly overall.

-- Tony Plate

Prof Brian Ripley wrote:
> On Fri, 20 May 2005, Prof Brian Ripley wrote:
> 
>> To follow up on the truncate() part of this, Windows does not use 
>> chsize directly any more, but ftruncate like all other platforms.  
>> However, truncate() was limited to files < 2Gb on all platforms.  I 
>> have changed the latter and your example now works both on 32-bit 
>> Windows and on 64-bit Linux.
> 
> 
> I am sorry, I should clarify `works', as I had not cross-checked well 
> enough.  There are no errors generated by the C call but ftruncate 
> appears not to do what is required of it, despite having off_t as an 
> argument. That is, at R level everything appears fine but the file was 
> not truncated when looked at outside R.
> 
> Not much we can do about broken OS calls.
>


