[BioC] Fastest way to read CSV files

Stijn van Dongen stijn at ebi.ac.uk
Fri Aug 20 01:31:24 CEST 2010


This piqued my interest, because for really large datasets it can in general
speed things up greatly to use a binary format (though 1.5 million does not
sound *that* big to me). I have no experience with this in R, but a quick
search turned up, e.g., readBin(). So it might be possible, especially if your
data are quite simple (all integers), to first convert your data externally to
a binary format (using Perl, Python, or ...) and then read it with readBin().
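Here is a minimal sketch of that external conversion step in Python. It assumes the CSV holds only comma-separated integers with no header row, that every value fits in a signed 32-bit integer, and that little-endian output is wanted. The function name csv_to_int32_bin and the file names are hypothetical. The script streams the file one row at a time, so a 1.5-million-column row is the largest thing held in memory.

```python
import array
import csv
import sys


def csv_to_int32_bin(csv_path, bin_path):
    """Write every integer in csv_path to bin_path as little-endian int32, row by row.

    Hypothetical helper: assumes no header row and all values fit in int32.
    Returns (n_rows, n_cols) so the reader knows how to reshape the flat vector.
    """
    n_rows = 0
    n_cols = None
    with open(csv_path, newline="") as src, open(bin_path, "wb") as dst:
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            if n_cols is None:
                n_cols = len(row)
            elif len(row) != n_cols:
                raise ValueError(
                    f"row {n_rows + 1} has {len(row)} fields, expected {n_cols}"
                )
            # array("i") stores C ints; int() raises on non-integer fields and
            # array raises OverflowError for values outside the C int range.
            values = array.array("i", (int(field) for field in row))
            if values.itemsize != 4:
                raise RuntimeError("C int is not 32 bits on this platform")
            if sys.byteorder != "little":
                values.byteswap()  # force little-endian on disk
            values.tofile(dst)
            n_rows += 1
    return n_rows, n_cols
```

On the R side, the flat vector could then be read back with something like x <- readBin("data.bin", what = "integer", n = n_rows * n_cols, size = 4, endian = "little") and reshaped with matrix(x, nrow = n_rows, byrow = TRUE). The byrow = TRUE is needed because the file is written row by row.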

Disclaimer: Quite likely a random thought from an ill-informed bystander.

best,
Stijn




On Thu, Aug 19, 2010 at 05:43:22PM -0400, Sean Davis wrote:
> Try using scan and then rearrange the resulting vector.
> 
> Sean
> 
> On Aug 19, 2010 5:32 PM, "Gaston Fiore" <gaston.fiore at gmail.com> wrote:
> 
> Hello everyone,
> 
> Is there a faster method to read CSV files than the read.csv function? I have
> CSV files containing a rectangular array with about 17 rows and 1.5 million
> columns with integer entries, and read.csv is too slow for my needs.
> 
> Thanks for your help,
> 
> -Gaston
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
Stijn van Dongen         >8<        -o)   O<  forename pronunciation: [Stan]
EMBL-EBI                            /\\   Tel: +44-(0)1223-492675
Hinxton, Cambridge, CB10 1SD, UK   _\_/   http://micans.org/stijn
