[R] reading and parsing gzipped files
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Sep 5 09:25:55 CEST 2008
On Thu, 4 Sep 2008, Dmitriy Skvortsov wrote:
> Hi all, I have large compressed tab-delimited text files,
> and I am trying to write an efficient function to read them.
> I am using gzfile() and readLines():
> zz <- gzfile("exampl.txt.gz", "r")  # compressed file
> system.time(temp1 <- readLines(zz))
> which works fast and creates a vector of strings.
> The problem is parsing the result: if I use strsplit it takes longer than
> decompressing the file manually, reading it with scan, and erasing it.
> Can anybody recommend an efficient way of parsing a large vector of ~200,000
'parse'? What is wrong with using read.delim (reading 'tab delimited
files' is its job)? Both it and scan work with gzfile connections, so
there is no need to decompress manually.
See the 'R Data Import/Export Manual' for how to use read.delim efficiently.
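A minimal sketch of the one-step approach (the file name and toy data are illustrative only): read.delim accepts a gzfile() connection directly, so the decompress-then-scan detour is unnecessary.

```r
## Make a small tab-delimited example and compress it (illustrative data).
tf <- tempfile(fileext = ".txt.gz")
con <- gzfile(tf, "w")
write.table(data.frame(x = 1:5, y = letters[1:5]),
            con, sep = "\t", row.names = FALSE, quote = FALSE)
close(con)

## Read it back in one step from the compressed file.
## Supplying colClasses (and nrows, if known) speeds this up considerably
## on large files -- see the R Data Import/Export Manual.
dat <- read.delim(gzfile(tf, "r"),
                  colClasses = c("integer", "character"))
str(dat)
```

The same connection works with scan() if a vector rather than a data frame is wanted.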
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595