[Rd] working with huge memory: single precision?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Aug 27 16:00:27 CEST 2014
On 27/08/2014 14:43, Simon Urbanek wrote:
> Mario,
>
> On Aug 27, 2014, at 4:03 AM, Mario Emmenlauer <mario at emmenlauer.de> wrote:
>
>>
>> Hello,
>>
>> I'm very new to R and don't know much about it yet. I would like
>> to develop R-programs that work with data of sizes of 10^10 - 10^11
>> data points. We have very-high-memory machines with ~256 GB, but it
>> would significantly help if I could store the data points in single
>> precision in RAM instead of double precision. Is that possible?
>>
>
> You can (e.g. in raw vectors), but it may not help much since you can't operate on them directly: no functions in R know how to deal with single-precision floats, and all arithmetic is done on double-precision vectors. If you want to load the data into memory but only work on small pieces, that would work, since you could extract a piece, convert it to doubles and carry on.
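[A minimal sketch of the raw-vector approach described above (an editorial illustration, not code from the thread): doubles can be packed as 4-byte floats with writeBin(size = 4), and a chunk can be recovered as doubles with readBin.]

```r
## Pack doubles as 4-byte (single-precision) floats in a raw vector,
## then recover a chunk as doubles for computation.
x <- runif(10)                                 # example data
raw_store <- writeBin(x, raw(), size = 4)      # single-precision bytes
length(raw_store)                              # 40 bytes, half of 8 * 10

## Extract the first 5 values (5 * 4 = 20 bytes) back as doubles:
chunk <- readBin(raw_store[1:20], what = "double", size = 4, n = 5)
all.equal(x[1:5], chunk, tolerance = 1e-6)     # TRUE within float precision
```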
We have almost no idea what you want to do with the data, but in my
experience datasets of a billion cases are best divided into homogeneous
groups for analysis followed by a meta-analysis. I've yet to see an
example where storing the data in an efficient RDBMS and loading
sections into multiple R sessions did not make a better workflow. They
may exist: they are not the norm.
And BTW, 256GB is not really a lot of RAM, and storing as floats would
only halve the footprint.
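[A quick back-of-envelope check of that point, using the 10^11 data points mentioned at the top of the thread (editorial illustration):]

```r
## Memory needed for 1e11 numeric values, in GiB:
n <- 1e11
n * 8 / 2^30   # as doubles: ~745 GiB
n * 4 / 2^30   # as floats:  ~373 GiB -- still well above 256 GB
```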
>
>> In the documentation I found a sentence saying it's not supported,
>> at least not out of the box. But I am quite desperate and would also
>> consider working with an alpha version or with extension packages?
>>
>> Ideally I would like type promotion to work, i.e. that when using
>> the data in math operations they should be promoted to double.
>>
>
> That won't work automatically that way, but you could write methods for operators on your new type class and implement them as coercion plus a call to the regular operators. You may take a hint from the 64-bit integer packages, and I dimly recall that some of the memory-mapping packages (bigmemory, ff, ...) may also support single-precision storage.
>
> Cheers,
> Simon
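[A sketch of the coercion-plus-operators idea above, using a hypothetical S3 class name "float32" (editorial illustration, not from the thread): the Ops group generic promotes operands to double before dispatching to the regular operators.]

```r
## Hypothetical "float32" class: raw bytes underneath, doubles on demand.
as_float32 <- function(x)
  structure(writeBin(x, raw(), size = 4), class = "float32")

as.double.float32 <- function(x, ...)
  readBin(unclass(x), what = "double", size = 4,
          n = length(unclass(x)) / 4)

## Group generic: coerce float32 operands to double, then call the
## ordinary operator (promotion to double, as asked for above).
Ops.float32 <- function(e1, e2) {
  if (inherits(e1, "float32")) e1 <- as.double(e1)
  if (inherits(e2, "float32")) e2 <- as.double(e2)
  get(.Generic)(e1, e2)
}

y <- as_float32(c(1, 2, 3))
y * 2                          # works via coercion: 2 4 6
```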
>
>
>
>> Any help is greatly appreciated! All the best,
>>
>> Mario
>>
>>
>>
>> --
>> Mario Emmenlauer BioDataAnalysis Mobil: +49-(0)151-68108489
>> Balanstrasse 43 mailto: mario.emmenlauer * unibas.ch
>> D-81669 München http://www.marioemmenlauer.de/
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK