Yes, Sean is correct: scan() is the way to go. Here is a code snippet that
works out the dimensions of the matrix by reading the first line of the file,
then reads the rest with scan() and reshapes the result. You may need to
change the "sep" arguments to match your file (e.g. "," for CSV).



file <- "input"
options(show.error.messages = TRUE)

## Read only the header line of input.TXT to get the number of columns
## and their names.
chromo <- try(read.delim(paste(file, ".TXT", sep = ""),
                         header = TRUE, nrows = 1, sep = "\t", fill = TRUE))
num.vars <- ncol(chromo)
vars.names <- colnames(chromo)

header.lines <- 1

## Read the body as one long character vector, skipping the header.
chromo <- try(scan(paste(file, ".TXT", sep = ""), what = character(),
                   skip = header.lines, sep = "\t", fill = TRUE))
num.lines <- length(chromo) / num.vars

## scan() returns values row by row; filling a num.vars x num.lines matrix
## column-wise and then transposing recovers the original layout.
dim(chromo) <- c(num.vars, num.lines)
chromo <- t(chromo)
colnames(chromo) <- vars.names
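To see why the dim()/t() step recovers the table, here is a minimal sketch on a made-up 3-row, 2-column table (the values and column names are invented for illustration):

## Toy example of the reshape-and-transpose trick. scan() would return a
## 3-row, 2-column table as this flat, row-by-row vector:
flat <- c("a1", "b1", "a2", "b2", "a3", "b3")
num.vars <- 2
num.lines <- length(flat) / num.vars  # 3 rows

## Filling column-wise puts each input row into one column...
dim(flat) <- c(num.vars, num.lines)
## ...so transposing turns those columns back into rows.
mat <- t(flat)
colnames(mat) <- c("A", "B")
##      A    B
## [1,] "a1" "b1"
## [2,] "a2" "b2"
## [3,] "a3" "b3"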

A few million lines takes < 20 secs, typically.




-----Original Message-----
From: Gaston Fiore <gaston.fiore@gmail.com>
To: bioconductor@stat.math.ethz.ch
Subject: [BioC] Fastest way to read CSV files
Date: Thu, 19 Aug 2010 17:29:53 -0400


Hello everyone,

Is there a faster method to read CSV files than the read.csv function? I have CSV files containing a rectangular array with about 17 rows and 1.5 million columns with integer entries, and read.csv is too slow for my needs.

Thanks for your help,

-Gaston

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


