[R] Reading large files
jholtman at gmail.com
Fri Feb 5 21:07:01 CET 2010
What you need to do is take a smaller sample of your data (e.g.
50-100MB), load it, and determine how big the resulting object is.
Load time depends a lot on how you are loading the data. Are you using
'scan' or 'read.table'? If 'read.table', have you defined the classes
of the columns (the 'colClasses' argument)? I typically read in files
of 40MB in about 15 seconds (300K rows with 16 columns), and the
resulting object is about 24MB. I would expect you to be able to read
in 100MB in under a minute. The other part of the question is how much
of the data you really need to read in and process at once. I assume
it is not all of it, so you might structure your data so that you only
read in the data you need to analyze. Just because you have a file
that large does not mean you need all of the data.
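The sample-and-extrapolate step is just proportional scaling. Below is a minimal sketch of the arithmetic, written in Python purely for illustration. It assumes that load time and object size grow linearly with file size, which is exactly what a sample load should confirm or refute. The function name `extrapolate` is hypothetical.

```python
def extrapolate(sample_file_mb: float, sample_value: float, full_file_mb: float) -> float:
    """Scale a measurement taken on a sample file up to the full file size.

    Assumes the measured quantity (load time, object size) grows linearly
    with file size, which only holds if the sample is representative.
    """
    if sample_file_mb <= 0:
        raise ValueError("sample_file_mb must be positive")
    return sample_value * (full_file_mb / sample_file_mb)


# Figures quoted above: a 40MB file loads in about 15 seconds into a ~24MB object.
SAMPLE_MB = 40.0
SAMPLE_SECONDS = 15.0
SAMPLE_OBJECT_MB = 24.0

for full_mb in (100.0, 850.0, 15_000.0):
    seconds = extrapolate(SAMPLE_MB, SAMPLE_SECONDS, full_mb)
    object_mb = extrapolate(SAMPLE_MB, SAMPLE_OBJECT_MB, full_mb)
    print(f"{full_mb:>8.0f}MB file: ~{seconds:.0f} s to load, ~{object_mb:.0f}MB in memory")
```

With those figures, an 850MB file works out to roughly 319 seconds (a little over 5 minutes) and about 510MB in memory. If the linear assumption holds, a load that takes an hour points at how the file is being read rather than at its size.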
I have 2GB of RAM on my Windows box and try to keep the largest object
I process under 400MB, since I know copies will be made at different
stages. There are packages that let you do some of the analysis on
data that is larger than can fit in memory. I would also suggest you
use a database so that you do not have to continually read in the data.
If your pockets are deep, go for a 64-bit version of R with 64GB of
RAM if you want to process files that are 10-15GB. Otherwise, rethink
the problem you are trying to solve with respect to the boundaries and
constraints that most systems impose.
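The 400MB-on-2GB rule of thumb amounts to leaving room for about five copies of the largest object. Below is a rough sketch of what that implies for bigger files, in Python for illustration only. It assumes that the 24MB-per-40MB object-to-file ratio from the sample above carries over to other data, which is an assumption to check with your own sample load. The names `HEADROOM`, `OBJECT_PER_FILE`, and `ram_needed_gb` are hypothetical.

```python
# Rough RAM budget from the rules of thumb above (assumptions, not guarantees).

# 2GB of RAM with a self-imposed 400MB cap on the largest object implies
# leaving room for roughly five working copies of that object.
RAM_MB = 2048
MAX_OBJECT_MB = 400
HEADROOM = RAM_MB / MAX_OBJECT_MB  # about 5.1 copies' worth of RAM per object

# Observed in-memory object size relative to file size (24MB object from a
# 40MB file); assumed to carry over to other files.
OBJECT_PER_FILE = 24 / 40


def ram_needed_gb(file_gb: float) -> float:
    """Estimate the RAM (GB) needed to load and process a file of file_gb GB."""
    return file_gb * OBJECT_PER_FILE * HEADROOM


for file_gb in (0.85, 10, 15):
    print(f"{file_gb:>5} GB file -> ~{ram_needed_gb(file_gb):.0f} GB RAM")
```

For a 15GB file this gives about 46GB, so 64GB leaves some margin. Both constants are rules of thumb, not measurements of your data.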
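The point of putting the data in a database is to pay the parsing cost once, then pull out only the rows and columns each analysis needs. Below is a minimal sketch of that pattern using Python's standard-library sqlite3 and csv modules. This is a swapped-in illustration, not something from this thread; in R the same idea would go through a database package. The table name "sales", its columns, the sample rows, and the helper names are all hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_into_sqlite(conn, table, lines, batch_size=10_000):
    """Stream CSV rows into a SQLite table in batches and return the row count.

    Only one batch is held in memory at a time, so the CSV can be much larger
    than RAM. `table` and the header names are interpolated into SQL, so they
    are assumed to come from a trusted source.
    """
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)  # untyped SQLite columns
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    total = 0
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', batch)
        total += len(batch)
    conn.commit()
    return total


def region_totals(conn, table):
    """Query only the two columns needed instead of re-reading the whole file."""
    return conn.execute(
        f'SELECT "region", SUM(CAST("qty" AS INTEGER)) FROM "{table}" '
        'GROUP BY "region" ORDER BY "region"'
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical stand-in for a large file; with a real file, pass open(path, newline="").
    data = io.StringIO("region,qty,note\neast,3,a\nwest,5,b\neast,4,c\n")
    conn = sqlite3.connect(":memory:")  # a file path here would persist across sessions
    print(load_csv_into_sqlite(conn, "sales", data, batch_size=2))
    print(region_totals(conn, "sales"))
```

The columns are left untyped, so values are stored as text and the query casts "qty" explicitly. Declaring column types up front would be the database counterpart of giving 'read.table' its column classes.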
On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
<SATISH.VADLAMANI at fritolay.com> wrote:
> If it is going to help, here is the explanation. I have an end state in
> mind; it is given below under the "End State" header. In order to get
> there, I need to start somewhere, right? I started with an 850 MB file and
> could not load it in what I think is a reasonable time (I waited for an hour).
> There are references to 64-bit. How will that help? It is a 4GB RAM machine
> and there is no paging activity when loading the 850 MB file.
> I have seen other threads on the same types of questions. I did not see any
> clear-cut answers or errors that I could have been making in the process. If
> I am missing something, please let me know. Thanks.
> End State
>> Satish wrote: "at one time I will need to load say 15GB into R"
> Satish Vadlamani
> Sent from the R help mailing list archive at Nabble.com.
> R-help at r-project.org mailing list
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
+1 513 646 9390
What is the problem that you are trying to solve?