[R] gzfile() produces large files
Prof Brian D Ripley
ripley at stats.ox.ac.uk
Thu Jun 28 00:44:04 CEST 2001
Here are some more experiments:
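# t1: write() straight to the gzfile connection
zz <- gzfile("t1.gz", "w")
write(1:1000, zz)
close(zz)
# t2: one writeLines() call, one number per line
zz <- gzfile("t2.gz", "w")
writeLines(as.character(1:1000), zz)
close(zz)
# t3: binary output, 4 bytes per integer
zz <- gzfile("t3.gz", "w")
writeBin(1:1000, zz)
close(zz)
# t4: capture the write() output in the character vector `out` first,
# then send it to the gzfile connection in one writeLines() call
zz <- textConnection("out", "w")
write(1:1000, zz)
close(zz)
zz <- gzfile("t4.gz", "w")
writeLines(out, zz)
close(zz)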
ls -l
-rw-r--r-- 1 ripley Administ 15913 Jun 27 23:20 t1.gz
-rw-r--r-- 1 ripley Administ 1848 Jun 27 23:20 t2.gz
-rw-r--r-- 1 ripley Administ 1434 Jun 27 23:20 t3.gz
-rw-r--r-- 1 ripley Administ 1856 Jun 27 23:20 t4.gz
All are 3893 bytes uncompressed except t3, which is 4000. The problem with
the first is that write() sends its output in very small pieces,
1 \n 2 \n 3 \n 4 \n ...
and because the output aims for no latency, the compressor has too little
opportunity to compress.
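The uncompressed sizes quoted above can be checked by arithmetic. This is a minimal sketch, assuming each number written as text is followed by exactly one separator byte (a space or a newline) and that R stores integers in 4 bytes:

```r
x <- 1:1000
# Text output: the digits of each number plus one separator byte apiece.
text_bytes <- sum(nchar(as.character(x)) + 1)
# Binary output from writeBin(): 4 bytes per integer.
bin_bytes <- 4 * length(x)
c(text = text_bytes, binary = bin_bytes)
# text = 3893, binary = 4000
```

So t1, t2 and t4 all start from the same 3893 bytes of text; only the way those bytes reach the compressor differs.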
The moral seems to be to write to gzfile connections in moderately-sized
pieces. It's the one-byte newlines, each written as a separate piece, that
really do the damage here.
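A sketch of that advice: build the complete text in memory and hand it to the gzfile connection in a single call. The helper name write_gz_batched and the temporary file are illustrative only, not part of R, and whether the size difference seen above reproduces will depend on how a given R version buffers gzfile output.

```r
# Hypothetical helper: format the data as write() would, but send the
# result to the compressed file in a single writeLines() call.
write_gz_batched <- function(x, path, ncolumns = 5) {
  # Capture write()'s output as a character vector, one element per line.
  lines <- capture.output(write(x, file = "", ncolumns = ncolumns))
  con <- gzfile(path, "w")
  on.exit(close(con))
  writeLines(lines, con)
  invisible(path)
}

path <- tempfile(fileext = ".gz")
write_gz_batched(1:1000, path)

# Reading the file back recovers the original values.
con <- gzfile(path, "r")
y <- scan(con, quiet = TRUE)
close(con)
identical(as.integer(y), 1:1000)
file.size(path)  # compressed size in bytes
```

This is the same idea as the t4 experiment, with capture.output() standing in for the text connection.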
On Wed, 27 Jun 2001, Prof Brian Ripley wrote:
> On Wed, 27 Jun 2001, Uwe Ligges wrote:
>
> > I observed some strange results playing around with gzfile() [R-1.3.0,
> > WinNT 4.0]:
> >
> > At first
> >
> > x <- 1:1000
> > write(x, file = "c:/temp.txt")
> >
> > results in a file of about 4 kB. But
> >
> > my.con <- gzfile("c:/temp.gz", open = "w")
> > write(x, file = my.con)
> > close(my.con)
> >
> > results in a file of about 16 kB.
> >
> > I expected a reduction of the size. Anyone who can tell me what went
> > wrong?
>
> My experiments concur: I do get a 15913 byte file and it is a valid gzip
> file.
>
> I've used this much more to read compressed files than write them.
> I will take a closer look at the zlib specs when I have time.
>
> Brian
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272860 (secr)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>