[R] Optimized File Reading with R
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Tue May 15 20:07:38 CEST 2007
Prof Brian Ripley wrote:
> On Tue, 15 May 2007, Lorenzo Isella wrote:
>
>
>> Dear All,
>> Hope I am not bumping into a FAQ, but so far my online search has been fruitless.
>> I need to read some data file using R. I am using the (I think)
>> standard command:
>>
>> data_150<-read.table("y_complete06000", header=FALSE)
>>
>> where y_complete06000 is a 6000 by 40 table of numbers.
>> I am puzzled at the fact that R is taking several minutes to read this file.
>> First I thought it may have been due to its shape, but even
>> re-expressing and saving the matrix as a 1D array does not help.
>> It is not a small file, but not even huge (it amounts to about 5Mb of
>> text file).
>> Is there anything I can do to speed up the file reading?
>>
>
> You could try reading the help page or the 'R Data Import/Export' manual.
> Both point out things like
>
> 'read.table' is not the right tool for reading large matrices,
> especially those with many columns: it is designed to read _data
> frames_ which may have columns of very different classes. Use
> 'scan' instead.
>
> On the other hand I am surprised at several minutes, but as you haven't
> even told us your OS, it is hard to know what to expect. My Linux box
> took 3 secs for a 6000x40 matrix with read.table, 0.8 sec with scan.
>
>
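[Editorial note: Ripley's suggestion above can be sketched as follows. This is a minimal illustration using a small synthetic file in place of the poster's 6000-by-40 "y_complete06000"; `scan` reads the numbers in file order, and `byrow = TRUE` reassembles them row by row.]

```r
# Write a small whitespace-separated numeric file to stand in for the
# poster's data (the real file is 6000 rows by 40 columns).
tmp <- tempfile()
write.table(matrix(rnorm(10 * 4), 10, 4), tmp,
            row.names = FALSE, col.names = FALSE)

# scan() returns one flat numeric vector; matrix() restores the shape.
# ncol must be known in advance (40 in the original post).
m <- matrix(scan(tmp, quiet = TRUE), ncol = 4, byrow = TRUE)
dim(m)
```

Because `scan` assumes a single storage type for everything it reads, it skips the per-column class deduction that makes `read.table` comparatively slow on large numeric matrices.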
If the file is actually 40 rows by 6000 columns, that might explain it:
> x <- as.data.frame(matrix(rnorm(40*6000),6000))
> write.table(x,file="xx.txt")
> system.time(y <- read.table("xx.txt"))
   user  system elapsed
  1.229   0.007   1.250
> write.table(t(x),file="xx.txt")
> system.time(y <- read.table("xx.txt"))
   user  system elapsed
 92.986   0.188  93.912
However, this is still not _several_ minutes, and it is on my laptop
which is not particularly fast.
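[Editorial note: if one does stay with `read.table`, the 'R Data Import/Export' manual's standard advice is to pre-declare the column classes and row count so the reader need not deduce them. A sketch, again on a small synthetic wide file rather than the poster's data:]

```r
# A "wide" table: 40 rows, 200 columns, all numeric.
tmp <- tempfile()
write.table(matrix(rnorm(40 * 200), 40), tmp,
            row.names = FALSE, col.names = FALSE)

# Telling read.table the classes (recycled across all columns) and the
# number of rows avoids type deduction and repeated reallocation.
y <- read.table(tmp, colClasses = "numeric", nrows = 40)
dim(y)
```

On genuinely wide files this typically narrows, though it does not eliminate, the gap between `read.table` and `scan`, since a data frame with thousands of columns still carries per-column overhead.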