[Rd] file.info() on file larger than 2GB
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Sep 15 17:57:39 CEST 2004
This appears to be fairly easy to solve, at least on Linux. R-devel now
has an option --enable-linux-lfs that sets up the appropriate flags (very
few other code changes were needed). Similar options work on Solaris.
I have been able to create a 2.5Gb text file, seek around in it and read
lines from various places, including near the end, both as a plain file
and as a gzip-ed file. And file.info() reports correctly.
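On glibc-based Linux, large-file support for 32-bit programs is usually switched on by compiling with `-D_FILE_OFFSET_BITS=64`, which makes `off_t`, `stat()` and `fseeko()` 64-bit without any change to the calling code. Assuming that is the kind of flag an option like --enable-linux-lfs arranges, the following C sketch repeats the experiment above in miniature: it writes a line beyond the 2Gb mark (the skipped bytes become a sparse hole on most filesystems), checks the size with `stat()`, and seeks back to read the line. The helper names are hypothetical.

```c
/* Both macros must come before any #include. _FILE_OFFSET_BITS=64 asks glibc
   for a 64-bit off_t even in 32-bit builds (it changes nothing on 64-bit
   systems); _POSIX_C_SOURCE makes fseeko() visible under strict -std= modes. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create `path` and write `line` at byte offset `where`,
   which may lie beyond 2^31; the bytes before it read back as zeros. */
int write_line_at(const char *path, off_t where, const char *line)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fseeko(fp, where, SEEK_SET) != 0 || fputs(line, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the line that starts at byte offset `where`. */
int read_line_at(const char *path, off_t where, char *buf, int n)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int ok = fseeko(fp, where, SEEK_SET) == 0 && fgets(buf, n, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}

/* File size as reported by stat(), or -1 if stat() fails. */
long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}
```

For example, `write_line_at("big.txt", (off_t) 2500000000LL, "last line\n")` followed by `file_size("big.txt")` should give 2500000010 when large-file support is in effect; a 32-bit `off_t` could not even represent that offset.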
On Tue, 31 Aug 2004, Prof Brian Ripley wrote:
> This is a purely OS issue: your OS has not been set up so that the fopen
> and stat calls handle files larger than 2Gb. There is one R issue: the
> size will overflow in file.info.
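The overflow is easy to see with the file in the report quoted below: 2271197563 bytes exceeds 2^31 - 1 = 2147483647, the largest value a 32-bit signed integer can hold, whereas a C `double` represents every integer up to 2^53 exactly. A minimal C sketch, assuming the size is narrowed to a 32-bit integer somewhere on its way into R; the helper names are hypothetical and this is not necessarily how R-devel fixed it:

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if a byte count fits in a 32-bit signed
   integer, i.e. is at most 2147483647. */
int fits_in_int32(long long size)
{
    return size >= INT32_MIN && size <= INT32_MAX;
}

/* Hypothetical helper: a double stores every integer up to 2^53 exactly, so
   carrying sizes as double (or as a 64-bit integer) avoids the overflow. */
double size_as_double(long long size)
{
    return (double) size;
}
```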
>
> For example, under Solaris 64-bit applications can handle such files
> whereas 32-bit ones need calls to stat64, fopen64 etc.
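The explicit 64-bit calls belong to the transitional large-file interface, which glibc and Solaris expose when `_LARGEFILE64_SOURCE` is defined. A minimal C sketch of code written directly against that interface, as opposed to relying on `_FILE_OFFSET_BITS=64` to remap the ordinary names; the helper names are hypothetical:

```c
/* Must precede every #include: exposes stat64(), fopen64(), off64_t etc. */
#define _LARGEFILE64_SOURCE 1
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: size via the explicit 64-bit stat call, or -1. */
long long size64(const char *path)
{
    struct stat64 st;
    if (stat64(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}

/* Hypothetical helper: open for reading through the 64-bit stdio entry
   point; the stream can then be positioned with fseeko64(). */
FILE *open_read64(const char *path)
{
    return fopen64(path, "r");
}
```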
>
> It seems a very exotic need, but if someone wants to find out how to use
> the OS-specific ways to extend stat etc. and supply patches, please do so.
>
> We don't put OS-specific limitations on help pages.
>
> On Tue, 31 Aug 2004, Roger D. Peng wrote:
>
> > I've got a file that's approximately 2.2GB and it seems to be foiling
> > file.info(). When I run `stat' from the shell I get
> >
> > zooey:> stat data.csv
> >   File: `data.csv'
> >   Size: 2271197563      Blocks: 4440280    IO Block: 4096   regular file
> > Device: 342h/834d       Inode: 9994308     Links: 1
> > Access: (0644/-rw-r--r--)  Uid: (  500/   rpeng)   Gid: (  500/   rpeng)
> > Access: 2004-08-31 09:50:04.000000000 -0400
> > Modify: 2004-08-26 19:09:42.000000000 -0400
> > Change: 2004-08-31 09:53:29.000000000 -0400
>
> Take a look at the source code for stat, in coreutils.
>
> > But, file.info() in R-devel gives me:
> >
> > > file.info("data.csv")
> >          size isdir mode mtime ctime atime uid gid uname grname
> > data.csv   NA    NA <NA>  <NA>  <NA>  <NA>  NA  NA  <NA>   <NA>
> >
> > I assume this has something to do with the underlying call to `stat'
> > in `do_fileinfo'.
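On Linux, a program built without large-file support gets `EOVERFLOW` from `stat()` for a file whose size does not fit in its 32-bit `off_t`, so the call fails outright instead of returning a wrong size. If, as suspected here, `do_fileinfo` maps any `stat()` failure to `NA`, every column would come out `NA`; and if `file.exists()` also relies on `stat()` (an assumption), it would report `FALSE`. A minimal C sketch of telling that case apart from a missing file; the enum and function names are hypothetical:

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical classification of a stat() outcome. */
enum stat_outcome { STAT_OK, STAT_MISSING, STAT_TOO_LARGE, STAT_OTHER_ERROR };

enum stat_outcome classify_stat(const char *path, struct stat *st)
{
    if (stat(path, st) == 0)
        return STAT_OK;
    switch (errno) {
    case ENOENT:
        return STAT_MISSING;
    case EOVERFLOW: /* size (or inode/block count) does not fit this build's types */
        return STAT_TOO_LARGE;
    default:
        return STAT_OTHER_ERROR;
    }
}
```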
> >
> > This alone is not much of a problem, but I also can't seem to open a
> > file connection to the same file. For example,
> >
> > > con <- file("data.csv")
> > > open(con, "r")
> > Error in open.connection(con, "r") : unable to open connection
> > In addition: Warning message:
> > cannot open file `data.csv'
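The same cause would explain this failure: Linux's `open()` also fails with `EOVERFLOW` when a build without large-file support opens a file larger than 2^31 - 1 bytes, and a generic "cannot open file" message hides that reason. A small C sketch, with a hypothetical wrapper name, that keeps the system's explanation in the message:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: open `path` for reading; on failure, write a message
   that includes strerror(errno), plus a hint when the cause is EOVERFLOW. */
FILE *open_or_explain(const char *path, char *msg, size_t n)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int err = errno; /* save before any other library call can change it */
        snprintf(msg, n, "cannot open file '%s': %s%s", path, strerror(err),
                 err == EOVERFLOW
                     ? " (file too large for a build without large-file support)"
                     : "");
    }
    return fp;
}
```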
> >
> > Also, interestingly,
> >
> > > file.exists("data.csv")
> > [1] FALSE
> >
> > I take it all these things are related.
> >
> > Is it possible to fix this within R? Or should there be a note in the
> > help pages?
> >
> > > version
> >          _
> > platform i686-pc-linux-gnu
> > arch     i686
> > os       linux-gnu
> > system   i686, linux-gnu
> > status   Under development (unstable)
> > major    2
> > minor    0.0
> > year     2004
> > month    08
> > day      31
> > language R
> >
> > -roger
> >
> > ______________________________________________
> > R-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595