[R] Reading large files

jim holtman jholtman at gmail.com
Fri Feb 5 21:07:01 CET 2010


What you need to do is take a smaller sample of your data (e.g.
50-100MB), load it, and determine how big the resulting object is.
It depends a lot on how you are loading it.  Are you using 'scan' or
'read.table'?  If 'read.table', have you defined the classes of the
columns (the 'colClasses' argument)?  I typically read in files of
40MB in about 15 seconds (300K rows with 16 columns).  The resulting
object is about 24MB.  I would expect you to be able to read in 100MB
in under a minute.

The other part of the question is how much of the data you really need
to read in and process at once.  I assume that it is not all of it.
You might structure your data so that you only read in the data that
you need to analyze.  Just because you have a file that large does not
mean you need all the data.
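
As a rough sketch of the sampling idea above: read a limited number of
rows with the column classes declared, measure the object, and
extrapolate.  The file, the column names and the classes below are
made up for illustration (a small temporary file stands in for the
real one), and the linear extrapolation is only a crude estimate,
since character columns in particular do not scale exactly linearly.

```r
# Stand-in for the large file: a small synthetic CSV (hypothetical columns).
path <- tempfile(fileext = ".csv")
n <- 5000
set.seed(1)
write.csv(data.frame(id = seq_len(n),
                     x  = rnorm(n),
                     g  = sample(letters, n, replace = TRUE)),
          path, row.names = FALSE)

# Read only a sample of the rows, declaring the column classes so that
# read.table does not have to guess them.
sample_rows <- 1000
smp <- read.table(path, header = TRUE, sep = ",", nrows = sample_rows,
                  colClasses = c("integer", "numeric", "character"))

# Size of the sample in memory, then a crude linear extrapolation
# to the full row count (here known; for a real file, estimate it).
bytes_sample  <- as.numeric(object.size(smp))
bytes_per_row <- bytes_sample / nrow(smp)
est_total_mb  <- bytes_per_row * n / 2^20

print(object.size(smp), units = "Kb")
cat("estimated size of full object (MB):", est_total_mb, "\n")
```

Wrapping the read.table call in system.time() on the same sample
gives a matching estimate of the load time.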

I have 2GB on my Windows box and try to keep the maximum object I
process to under 400MB since I know copies will be made at different
stages.  There are packages that let you do some of the analysis on
data that is larger than can fit in memory.  I would also suggest you
use a database so that you do not have to continually read in the
data.
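
If neither a package nor a database is an option, one plain base-R
alternative (not one of the packages referred to above, just a sketch)
is to read the file in chunks from an open connection and keep only a
running summary.  The file, the column names and the chunk size here
are assumptions for illustration.

```r
# Stand-in for the large file: a small synthetic CSV (hypothetical columns).
path <- tempfile(fileext = ".csv")
n <- 5000
write.csv(data.frame(id = seq_len(n), x = as.numeric(seq_len(n))),
          path, row.names = FALSE)

con <- file(path, open = "r")
header <- readLines(con, n = 1)   # consume the header line once

chunk_size <- 1000
total     <- 0                    # running sum of column x
rows_seen <- 0

repeat {
  # read.table continues from the current position of an open connection;
  # it signals an error when no lines remain, which we treat as end of file.
  chunk <- tryCatch(
    read.table(con, sep = ",", nrows = chunk_size,
               col.names  = c("id", "x"),
               colClasses = c("integer", "numeric")),
    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break

  total     <- total + sum(chunk$x)
  rows_seen <- rows_seen + nrow(chunk)

  if (nrow(chunk) < chunk_size) break   # short chunk: end of file reached
}
close(con)

cat("rows processed:", rows_seen, " sum of x:", total, "\n")
```

Only one chunk is held in memory at a time, so the peak object size is
set by chunk_size rather than by the size of the file.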

If your pockets are deep, go for a 64-bit version with 64GB if you
want to process files that are 10-15GB.  Otherwise rethink the problem
you are trying to solve with respect to some of the
boundaries/constraints that are imposed by most systems.

On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
<SATISH.VADLAMANI at fritolay.com> wrote:
>
> Matthew:
> If it is going to help, here is the explanation. I have an end state in
> mind. It is given below under "End State" header. In order to get there, I
> need to start somewhere right? I started with a 850 MB file and could not
> load in what I think is reasonable time (I waited for an hour).
>
> There are references to 64 bit. How will that help? It is a 4GB RAM machine
> and there is no paging activity when loading the 850 MB file.
>
> I have seen other threads on the same types of questions. I did not see any
> clear cut answers or errors that I could have been making in the process. If
> I am missing something, please let me know. Thanks.
> Satish
>
>
> End State
>> Satish wrote: "at one time I will need to load say 15GB into R"
>
>
> -----
> Satish Vadlamani



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


