[R] First time R user

Steve Lianoglou lianoglou.steve at gene.com
Sun Aug 18 10:11:40 CEST 2013


Hi Paul,

On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal <paulbernal07 at gmail.com> wrote:
> Thanks a lot for the valuable information.
>
> Now my question is: how many columns can R handle, given that I have
> millions of rows? And, in general, what's the maximum number of rows
> and columns that R can handle effortlessly?

This is all determined by your RAM.

Prior to R-3.0, R could only handle vectors of length up to 2^31 - 1. If
you were working with a matrix, that meant you could only have that many
elements in the entire matrix, since a matrix is just a single vector
with dimensions attached.

If you were working with a data.frame, you could have up to 2^31 - 1
rows and, I'd guess, as many columns: since a data.frame is really a
list of vectors, the entire thing doesn't have to sit in one contiguous
(and contiguously addressable) block.
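
For instance (a toy sketch; the data is made up):

    df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
    is.list(df)   # TRUE -- a data.frame is a list of column vectors
    length(df)    # 2 -- one list element (one vector) per column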

R-3.0 introduced "Long Vectors" (search for that section in the release notes):

https://stat.ethz.ch/pipermail/r-announce/2013/000561.html

It raises the maximum vector length from 2^31 - 1 to 2^52 elements
(assuming you are running 64-bit R). So, if you've got the RAM, you can
have a data.frame/data.table with billions of rows, in theory.
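
For example (a sketch only -- don't actually run this unless you have
roughly 24 GB of free RAM; the size is just for illustration):

    x <- numeric(3e9)   # 3 billion doubles = 24e9 bytes: a "long vector"
    length(x)           # 3e+09 -- needs 64-bit R >= 3.0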

To figure out how much data you can handle on your machine, you need to
know the size of each element type (a double is 8 bytes, an integer 4)
and how many elements you will have, so you can calculate the amount of
RAM needed to load it all up.
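
Back-of-the-envelope, it looks like this (the row/column counts are made
up for illustration):

    n_rows <- 10e6                   # 10 million rows
    n_cols <- 20                     # say, all numeric columns
    n_rows * n_cols * 8 / 1024^3     # doubles are 8 bytes -> ~1.5 GB

and you'd want comfortably more RAM than that, since R often copies
objects during computation.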

Lastly, I should mention there are packages that let you work with
"out of memory" data, such as bigmemory, biglm, and ff. Look at the
High-Performance Computing task view for more info along those lines:

http://cran.r-project.org/web/views/HighPerformanceComputing.html
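
For instance, a file-backed matrix with bigmemory looks roughly like
this (a minimal sketch, assuming the package is installed; the file
names are arbitrary):

    library(bigmemory)
    x <- filebacked.big.matrix(nrow = 1e8, ncol = 3, type = "double",
                               backingfile = "big.bin",
                               descriptorfile = "big.desc")
    x[1, ]   # elements live on disk and are pulled in on demand

so the data lives on disk and only the pieces you touch occupy RAM.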


>
> Best regards and again thank you for the help,
>
> Paul
> On 18/08/2013 02:35, "Steve Lianoglou" <lianoglou.steve at gene.com> wrote:
>
>> Hi Paul,
>>
>> First: please keep your replies on the list (use reply-all when
>> replying to R-help) so that others can help, and so the list archives
>> remain a resource for everyone.
>>
>> Now:
>>
>> On Aug 18, 2013, at 12:20 AM, Paul Bernal <paulbernal07 at gmail.com> wrote:
>>
>> > Can R really handle millions of rows of data?
>>
>> Yup.
>>
>> > I thought it was not possible.
>>
>> Surprise :-)
>>
>> As I type, I'm working with a ~5.5 million row data.table pretty
>> effortlessly.
>>
>> Columns matter too, of course -- RAM is RAM, after all, and you've got
>> to be able to fit the whole thing into it if you want to use
>> data.table. Once loaded, though, data.table enables you to do
>> split/apply/combine calculations over these data quite efficiently.
>> The first time I used it, I was honestly blown away.
>>
>> If you find yourself wanting to work with such data, you could do
>> worse than read through data.table's vignette and FAQ and give it a
>> spin.
>>
>> HTH,
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech
>>
>
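
P.S. To make the split/apply/combine point above concrete, here's a
minimal data.table sketch (made-up data; assumes the data.table package
is installed):

    library(data.table)
    dt <- data.table(grp = sample(letters, 1e7, replace = TRUE),
                     val = rnorm(1e7))
    dt[, .(mean_val = mean(val)), by = grp]   # grouped mean, done fast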



-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


