[R] How to import the large data into R

Jun Chen chenjuncau at gmail.com
Mon Oct 19 20:15:14 CEST 2009

Yes, it is on Windows. Thank you very much for such a careful reply and for
the very good suggestions; I will try them. Thanks again.
Best regards
jun chen

On Mon, Oct 19, 2009 at 5:04 PM,  <tlumley at u.washington.edu> wrote:
> On Mon, 19 Oct 2009, Jun Chen wrote:
>> Dear all,
>> I would like to work with microarray data. My code runs when I use a
>> small data set. However, the full data set has 45181 SNPs and 3081
>> animals, and R cannot allocate 1000 Mb of memory when I import it.
>> The command is:
>> m<-matrix(scan("D:/SNPdata.txt"),ncol=nmarkers,byrow=TRUE)
>> The error is:
>> Error: cannot allocate vector of size 1000.0 Mb
> It says you don't have enough memory.  When stored as floating point numbers
> (8 bytes each), the 45181 x 3081 SNP values will take up about 1 GB, which is
> quite a lot -- more than you can conveniently analyze in a 32-bit version of
> R[*]. You probably have more than 1 GB of memory, but R does need to make
> copies of things.
> In  my experience with SNP data, there are two strategies: storing the data
> more efficiently (1 byte/SNP), as the Bioconductor package snpMatrix does,
> or reading in just part of the data at a time (what I have usually done).
>  My approach is to read the data in chunks and store it in a netCDF file
> with the ncdf package, and then at analysis time to read data as needed from
> netCDF.  This also works well for parallel processing -- many R sessions can
> read efficiently from the netCDF file.
> [*] You didn't provide the requested information about your system, but "D:"
> looks like Windows.
>       -thomas
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlumley at u.washington.edu        University of Washington, Seattle
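
The two strategies suggested above (1 byte per SNP, and reading only part of the data at a time) can be combined in base R. What follows is only a minimal sketch, not the snpMatrix or ncdf approach itself. The helper name read_snp_chunks() is hypothetical, and the sketch assumes a whitespace-separated file with one animal per line, no missing values, and genotypes coded as small integers such as 0, 1, 2.

```r
# Dimensions taken from the question: 3081 animals x 45181 SNPs.
n_animals <- 3081
n_markers <- 45181

# Storage estimate in MiB: a double needs 8 bytes per value, a raw needs 1.
round(c(double = n_animals * n_markers * 8,
        raw    = n_animals * n_markers * 1) / 2^20)

# read_snp_chunks() is a hypothetical helper, not part of any package.
# Assumption: genotypes are integers in 0..255 with no missing values,
# because as.raw() cannot represent NA or values outside that range.
read_snp_chunks <- function(path, n_rows, n_cols, chunk_rows = 100) {
  # Preallocate the result as raw, so each genotype occupies one byte.
  out <- matrix(as.raw(0), nrow = n_rows, ncol = n_cols)
  con <- file(path, open = "r")
  on.exit(close(con))
  done <- 0
  while (done < n_rows) {
    take <- min(chunk_rows, n_rows - done)
    # scan() on an open connection continues where the last call stopped.
    vals <- scan(con, what = integer(), n = take * n_cols, quiet = TRUE)
    if (length(vals) != take * n_cols) stop("file is shorter than expected")
    # Reshape the chunk by row, then store it in the matching rows of 'out'.
    out[(done + 1):(done + take), ] <-
      as.raw(matrix(vals, nrow = take, byrow = TRUE))
    done <- done + take
  }
  out
}
```

With the file from the question, a call might look like m <- read_snp_chunks("D:/SNPdata.txt", n_animals, n_markers). Only one chunk of integers is held in memory at a time, in addition to the raw matrix, so the peak memory use stays far below that of a single scan() of the whole file into doubles.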

Jun Chen
Department of Animal Breeding and Genetics
Göttingen University
Albrecht-Thaer-Weg 3
37075 Göttingen
