[R] How to more efficiently read in a big matrix

jim holtman jholtman at gmail.com
Sat Nov 10 06:18:18 CET 2007


Here is an example of reading in a file of 3M numbers (an 11MB text
file) on my laptop:

> system.time(x <- scan('/tempyy', what=0))
Read 3000000 items
   user  system elapsed
   6.22    0.16    6.53
> str(x)
 num [1:3000000] 1 2 3 4 5 6 7 8 9 10 ...
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  169954  4.6     350000  9.4   350000  9.4
Vcells 3102277 23.7    7803840 59.6  7200206 55.0
> object.size(x)
[1] 24000024

This took about 7 seconds.  You have about 40X more data (487 columns
by 238,305 rows is roughly 116M values), so it should be interesting
to see how it scales up.  The object size is 24MB, so 40X more is
about 1GB.
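
The estimate above can be checked with a few lines of arithmetic.  This
is only a rough sketch: the dimensions come from the original question,
8 bytes per value is the size of an R double, and the elapsed-time figure
assumes linear scaling, which is exactly the open question here.

```r
# Dimensions quoted in the original question.
n_rows <- 238305
n_cols <- 487

# Total number of values, and how that compares with the 3M-item test above.
n_values <- n_rows * n_cols
scale    <- n_values / 3e6

# R stores each numeric value as an 8-byte double.
size_mb <- n_values * 8 / 2^20

# Elapsed time if scan() scaled linearly from the 6.53 s measured above
# (an assumption, not a measurement).
est_seconds <- 6.53 * scale

cat(sprintf("%d values, %.1fX the test, about %.0f MB, about %.0f s\n",
            as.integer(n_values), scale, size_mb, est_seconds))
```

That works out to roughly 39X the test data and a little under 1GB for
the single object, before any copies made during processing.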

On Nov 9, 2007 11:52 PM, affy snp <affysnp at gmail.com> wrote:
> Hi Jim,
>
> Thanks a lot! I am currently running it on my laptop but without any
> success. I could upload it to a server with 8GB of memory,
> and it might be better to go from there.
>
> Actually, I could have the whole file split into two parts,
> one with the 2nd through 95th columns, the other with
> the rest of the columns. However, I need all rows for
> both parts.
>
> The file is in txt format and around 480MB, so it is very large.
> Yes, it contains numeric values.
>
> I appreciate it!
>
> Allen
>
>
> On Nov 9, 2007 11:46 PM, jim holtman <jholtman at gmail.com> wrote:
> > If they are all numeric, you can use 'scan' to read them in.  With
> > that amount of data, you will need almost 1GB to contain the single
> > object.  If you want to do any processing, you will probably need a
> > machine with at least 3-4GB of physical memory, preferably running a
> > 64-bit version of R.  What type of computer are you using?  Do you really
> > need all the data in at once, or can you process it in smaller batches
> > (e.g., 20,000 rows at a time)?  So a little more detail on what you
> > actually want to do with the data would be useful, since it does
> > create a very large object.  BTW how large is the file you are reading
> > and what is its format?  Have you considered a database with this
> > amount of data?
> >
> >
> > On Nov 9, 2007 11:39 PM, affy snp <affysnp at gmail.com> wrote:
> > > Dear list,
> > >
> > > I need to read in a big table with 487 columns and 238,305 rows (row names
> > > and column names are supplied). Is there a fast way to read in the
> > > table? I tried read.table() but it seems to take forever :(
> > >
> > > Thanks a lot!
> > >
> > > Best,
> > >    Allen
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem you are trying to solve?
> >
>
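
The batch idea quoted above (e.g., 20,000 rows at a time) could look
roughly like the sketch below.  It is only an illustration under stated
assumptions: the file is whitespace-delimited, the first line holds one
label per column (including a label for the row-name column), and the
first field on each line is the row name.  The function name
read_in_batches() and the process callback are hypothetical, not part of
R or of this thread.

```r
# Read a whitespace-delimited numeric table in batches of rows.
# Assumes a header line with one label per column and row names in field 1.
read_in_batches <- function(path, batch_rows, process) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Column labels from the first line.
  header <- scan(con, what = "", nlines = 1, quiet = TRUE)

  # One character field (row name) followed by numeric fields.
  what <- c(list(""), rep(list(0), length(header) - 1))

  repeat {
    # Because the connection stays open, each call continues where the
    # previous one stopped.
    cols <- scan(con, what = what, nlines = batch_rows, quiet = TRUE)
    if (length(cols[[1]]) == 0) break   # nothing left to read

    batch <- do.call(cbind, cols[-1])   # numeric matrix for this batch
    dimnames(batch) <- list(cols[[1]], header[-1])
    process(batch)
  }
  invisible(NULL)
}

# Tiny made-up file standing in for the real data.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id c1 c2",
             "r1 1 2", "r2 3 4", "r3 5 6", "r4 7 8", "r5 9 10"), tmp)

# Accumulate column sums without holding the whole table in memory.
col_sums    <- c(c1 = 0, c2 = 0)
batch_sizes <- integer(0)
read_in_batches(tmp, batch_rows = 2, process = function(batch) {
  col_sums    <<- col_sums + colSums(batch)
  batch_sizes <<- c(batch_sizes, nrow(batch))
})
print(col_sums)
print(batch_sizes)
```

Only one batch is held in memory at a time, so this fits a laptop as long
as the processing itself can be done batch by batch; it does not help if
every row is needed at once.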



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


