[R] How long does skipping in read.table take
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Oct 23 16:19:58 CEST 2010
On Sat, Oct 23, 2010 at 10:07 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> I just tried it:
>
> for(i in 11:16){ #i<-11
> start<-Sys.time()
> print(start)
> flush.console()
> filename<-paste("skipped millions- ",i,".txt",sep="")
> mydata<-read.csv.sql("myfilel.txt", sep="|", eol="\r\n", sql =
> "select * from file limit 1000000, (1000000*i-1)")
The SQL statement does not know anything about R variables. You would
need something like this:
> i <- 1
> s <- sprintf("select * from file limit 10, %d", 10*i-1)
> s
[1] "select * from file limit 10, 9"
> read.csv.sql(..., sql = s, ...)
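Applied to the loop in the quoted message, the same idea might look like the sketch below. The helper name make_chunk_sql and the chunk size of one million rows are assumptions made for illustration, not something taken from the original code. Note that in SQLite "limit A, B" skips A rows and then returns at most B rows, so the offset comes first.

```r
# Hypothetical helper: build a SQLite query that skips the first
# (i - 1) * chunk rows and then returns at most `chunk` rows.
make_chunk_sql <- function(i, chunk = 1000000) {
  offset <- (i - 1) * chunk
  # %.0f keeps large numbers out of scientific notation (no "1e+07")
  sprintf("select * from file limit %.0f, %.0f", offset, chunk)
}

# One statement per pass of the loop (i = 11, ..., 16)
queries <- vapply(11:16, make_chunk_sql, character(1))
print(queries[1])
# [1] "select * from file limit 10000000, 1000000"
```

Each element of queries could then be passed on as read.csv.sql(..., sql = queries[j], ...), with the remaining arguments as in the original call.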
Also, if you just want to read the file in chunks, reading from a
connection in R would be sufficient:
k <- 5000 # number of rows per chunk
con <- file("myfile.csv", "r")
# read the header line once, before the loop
hdgs <- readLines(con, 1)
repeat {
  x <- readLines(con, k)
  if (length(x) == 0) break
  # parse the chunk; text = avoids leaving a textConnection open
  DF <- read.csv(text = x, header = FALSE)
  # process chunk -- we just print last row here
  print(tail(DF, 1))
}
close(con)
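For reuse, the chunked-reading loop above could be wrapped in a function. This is only a sketch: the name read_in_chunks, its arguments, and the small temporary file used to try it out are all made up for illustration, and it assumes a plain comma-separated file with a single header line.

```r
# Hypothetical helper: apply `fun` to successive chunks of `k` data rows
# and collect the results in a list.
read_in_chunks <- function(path, k, fun) {
  con <- file(path, "r")
  on.exit(close(con))  # close the connection even if `fun` fails
  # read the header line once and split it into column names
  hdgs <- scan(text = readLines(con, 1), what = "", sep = ",", quiet = TRUE)
  results <- list()
  repeat {
    x <- readLines(con, k)
    if (length(x) == 0) break
    DF <- read.csv(text = x, header = FALSE, col.names = hdgs)
    results[[length(results) + 1]] <- fun(DF)
  }
  results
}

# Try it on a small temporary file with made-up data (12 rows, chunks of 5)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:12, value = (1:12)^2), tmp, row.names = FALSE)
chunk_rows <- read_in_chunks(tmp, k = 5, fun = nrow)
print(unlist(chunk_rows))
# [1] 5 5 2
unlink(tmp)
```

Passing the column names to each chunk keeps the data frames consistent, since only the first line of the file carries the headings.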
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com