[Rd] readBin differences on Windows and Linux/mac
Henrik Bengtsson
hb at stat.berkeley.edu
Tue Jan 1 17:20:28 CET 2008
On 01/01/2008, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> Also make sure the problem is not due to downloading a gzip file in
> text mode, because to the best of my understanding that is platform
> dependent. That is, use download.file(..., mode="wb") instead of the
> default, which is mode="w". (This is such a common error that I would
> like to suggest mode="wb" to become the default.)
Ok, that solves the problem with your example file. On WinXP/R v2.6.1:
> library(R.utils)
> uri <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE1/GSE1_series_matrix.txt.gz"
> download.file(uri, "test.txt.gz") # mode="w"
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE1/GSE1_series_matrix.txt.gz'
ftp data connection made, file length 918804 bytes
opened URL
downloaded 897 Kb
> file.info("test.txt.gz")$size
[1] 922243
> download.file(uri, "test2.txt.gz", mode="wb")
ftp data connection made, file length 918804 bytes
opened URL
downloaded 897 Kb
> file.info("test2.txt.gz")$size
[1] 918804
> gunzip("test.txt.gz")
Error in readBin(inn, what = raw(0), size = 1, n = BFR.SIZE) :
negative length vectors are not allowed
> gunzip("test2.txt.gz")
> file.info("test2.txt")$size
[1] 3338362
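
A minimal sketch of why the text-mode file comes out larger, assuming the cause is that a text-mode write on Windows expands every LF byte (0x0a) into CR LF (0x0d 0x0a). The helper lf_to_crlf() is hypothetical and only simulates that translation on a raw vector; it is not part of R or R.utils.

```r
# Hypothetical helper: simulate a text-mode write on Windows, where each
# LF byte (0x0a) in the data is written out as CR LF (0x0d 0x0a).
lf_to_crlf <- function(bytes) {
  is_lf <- bytes == as.raw(0x0a)
  out <- raw(length(bytes) + sum(is_lf))
  # Each input byte shifts right by the number of LF bytes up to and
  # including itself, because a CR is inserted in front of every LF.
  pos <- seq_along(bytes) + cumsum(is_lf)
  out[pos] <- bytes
  out[pos[is_lf] - 1L] <- as.raw(0x0d)
  out
}

# Toy input: the two gzip magic bytes followed by data containing two LFs.
x <- as.raw(c(0x1f, 0x8b, 0x0a, 0x00, 0x0a))
y <- lf_to_crlf(x)
length(y) - length(x)  # 2, one extra byte per LF

# The two downloads above differ by this many bytes:
922243 - 918804        # 3439
```

Under that assumption, the 3439 extra bytes in test.txt.gz would correspond to 3439 LF bytes in the original compressed stream, each of which corrupts the gzip data at that position.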
/H
>
> /Henrik
>
> On 01/01/2008, Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:
> > I see. It is either a bug or something related to the following
> > paragraph from ?seek:
> >
> > We have found so many errors in the Windows implementation of file
> > positioning that users are advised to use it only at their own
> > risk, and asked not to waste the R developers' time with bug
> > reports on Windows' deficiencies.
> >
> > I will investigate more closely when I am back in the office at the end of this week.
> >
> > Best,
> > Uwe
> >
> >
> >
> >
> > Sean Davis wrote:
> > > Sorry, Uwe. Of course:
> > >
> > > Both in relatively recent R-devel (one mac, one windows):
> > >
> > > ### gunzip pulled from R.utils to be a simple function
> > > ### In R.utils, implemented as a method
> > > gunzip <- function(filename, destname=gsub("[.]gz$", "", filename),
> > >                    overwrite=FALSE, remove=TRUE, BFR.SIZE=1e7) {
> > >   if (filename == destname)
> > >     stop(sprintf("Argument 'filename' and 'destname' are identical: %s",
> > >                  filename));
> > >   if (!overwrite && file.exists(destname))
> > >     stop(sprintf("File already exists: %s", destname));
> > >
> > >   inn <- gzfile(filename, "rb");
> > >   on.exit(if (!is.null(inn)) close(inn));
> > >
> > >   out <- file(destname, "wb");
> > >   on.exit(close(out), add=TRUE);
> > >
> > >   nbytes <- 0;
> > >   repeat {
> > >     bfr <- readBin(inn, what=raw(0), size=1, n=BFR.SIZE);
> > >     n <- length(bfr);
> > >     if (n == 0)
> > >       break;
> > >     nbytes <- nbytes + n;
> > >     writeBin(bfr, con=out, size=1);
> > >   };
> > >
> > >   if (remove) {
> > >     close(inn);
> > >     inn <- NULL;
> > >     file.remove(filename);
> > >   }
> > >
> > >   invisible(nbytes);
> > > }
> > > download.file('ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE1/GSE1_series_matrix.txt.gz', 'test.txt.gz')
> > > gunzip('test.txt.gz')
> > >
> > > Under Windows, this results in the error reported below. Under Mac and
> > > Linux, it creates test.txt in the current working directory. The actual
> > > gunzip function is pretty bare-bones, so I don't think it complicates
> > > matters much to use it in this example.
> > >
> > > Sean
> > >
> > >
> > > On Dec 31, 2007 1:24 PM, Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:
> > >
> > > Can you give a reproducible example, please?
> > >
> > > Uwe Ligges
> > >
> > >
> > > Sean Davis wrote:
> > > > I have been trying to use the gunzip function in the R.utils package. It
> > > > opens a connection to a gzfile, uses readBin to read from that connection,
> > > > and then uses writeBin to write out the raw data to a new file. This works
> > > > as expected under linux/mac, but under Windows, I get:
> > > >
> > > > Error in readBin(inn, what= raw(0), size = 1, n=BFR.SIZE) :
> > > > negative length vectors are not allowed
> > > >
> > > > A simple traceback shows the error in readBin. I wouldn't be surprised if
> > > > this is a programming issue not located in readBin, but I am confused about
> > > > the difference in behaviors on Windows versus mac/linux. Any insight into
> > > > what I can do to remedy the issue and have a cross-platform gunzip()?
> > > >
> > > > Thanks,
> > > > Sean
> > > >
> > > > ______________________________________________
> > > > R-devel at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> > >
> >
>