[R] integer and floating-point storage

Matt Shotwell Matt.Shotwell at Vanderbilt.Edu
Thu Apr 14 21:32:03 CEST 2011


Hi Mike,

There are some facilities for storing and manipulating small (2-bit)
integers in the ff package. See here:

http://cran.r-project.org/web/packages/ff/index.html
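For illustration only, here is a minimal sketch of what four-codes-per-byte storage involves. It is written in Python rather than R and makes no claim about how ff or PLINK lay out their bits. pack_2bit and unpack_2bit are hypothetical helper names, and the codes 0, 1, 2 and 3 follow the convention in the quoted question (0/1/2 minor alleles, 3 = missing).

```python
# Sketch only: pack codes 0..3 four per byte, two bits each.
# The bit layout below is an assumption; it is NOT PLINK's .bed encoding.


def pack_2bit(codes):
    """Pack a sequence of ints in 0..3 into bytes, four codes per byte.

    Code i goes into bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4,
    so the lowest two bits of each byte hold the first code in that byte.
    Unused slots in the final byte are left as 0.
    """
    out = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        if not 0 <= c <= 3:
            raise ValueError(f"code {c} at position {i} is outside 0..3")
        out[i // 4] |= c << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, n):
    """Recover the first n 2-bit codes from bytes made by pack_2bit."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


genotypes = [0, 1, 2, 3, 2, 0, 1]  # 3 marks a missing genotype
packed = pack_2bit(genotypes)
print(len(genotypes), "codes ->", len(packed), "bytes")
print(unpack_2bit(packed, len(genotypes)))
```

The caller has to remember n, because the padding slots in the last byte are indistinguishable from real 0 codes.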

-Matt

On 04/14/2011 01:20 PM, Mike Miller wrote:
> I note that "current implementations of R use 32-bit integers for
> integer vectors," but I am working with large arrays that contain
> integers from 0 to 3, so they could be stored as unsigned 8-bit
> integers. Can R do this? (FYI: this is for storing minor-allele counts
> for genetic studies. There are 0, 1, or 2 minor alleles, and 3 would
> represent a missing value.)
>
> It is theoretically possible to store such data with four integers per
> byte. This is what PLINK (GPL-licensed) does in its binary (.bed)
> pedigree format:
>
> http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped
>
> That might be too much to hope for. ;-)
>
> I think that R uses double-precision floating-point numbers
> by default. When I impute minor-allele counts, I get posterior expected
> values ranging from 0 to 2 (called dosages). The imputation isn't very
> precise, so it would be fine to store such data using one or two bytes.
> (The values are used as regressors and small changes would have minimal
> impact on results.) I could use unsigned 8-bit integers (0 to 255),
> probably using only 0 to 254 so that 1 and 2 could be represented with
> perfect precision as 127/127 and 254/127 (but I would do regression on
> the integer values). Or I could use 16 bits, doubling memory load and
> improving precision. It would be convenient if R could work with
> half-precision floating-point numbers (binary16):
>
> http://en.wikipedia.org/wiki/Half_precision_floating-point_format
>
> Can R do that?
>
> If not, is anyone interested in working on developing some of these
> features in R? We have GPL code from PLINK and Octave that might help a
> lot.
>
> http://www.gnu.org/software/octave/doc/interpreter/Integer-Data-Types.html
>
> Best,
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> Bioinformatics Specialist
> Minnesota Center for Twin and Family Research
> Department of Psychology
> University of Minnesota
>
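The one-byte dosage scheme described in the quoted message (store 127 times the dosage, so that 1 and 2 map exactly to 127 and 254) can be sketched as follows, again in Python for illustration. encode_dosage and decode_dosage are hypothetical names, and rounding to the nearest integer is an assumption; with it, the reconstruction error is at most 0.5/127, about 0.0039.

```python
SCALE = 127  # 1.0 -> 127 and 2.0 -> 254 exactly; byte value 255 is left unused


def encode_dosage(d):
    """Map a dosage in [0, 2] to an integer in 0..254 (fits in one byte)."""
    if not 0.0 <= d <= 2.0:
        raise ValueError(f"dosage {d} is outside [0, 2]")
    return round(d * SCALE)  # nearest integer (assumed rounding rule)


def decode_dosage(q):
    """Map a stored byte value back to an approximate dosage."""
    return q / SCALE


dosages = [0.0, 0.37, 1.0, 1.3, 1.999, 2.0]
stored = bytes(encode_dosage(d) for d in dosages)  # one byte per dosage
restored = [decode_dosage(q) for q in stored]  # iterating bytes yields ints
print(list(stored))
print(max(abs(a - b) for a, b in zip(dosages, restored)))
```

Because each stored integer is exactly 127 times its decoded dosage, regressing on the integers instead of the decoded values only rescales the slope by 1/127; fitted values and test statistics are unchanged.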

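For the 16-bit option, Python's struct module can round-trip IEEE 754 binary16 values through its "e" format code (Python 3.6 and later), which gives a quick way to check how much precision half precision keeps on the 0 to 2 dosage range. This is a Python illustration, not an R facility; to_half_bytes and from_half_bytes are hypothetical helper names.

```python
import struct


def to_half_bytes(values):
    """Pack floats as little-endian IEEE 754 binary16, two bytes each.

    struct rounds each value to the nearest representable half-precision
    number and raises OverflowError if a value is too large for binary16.
    """
    return struct.pack(f"<{len(values)}e", *values)


def from_half_bytes(buf):
    """Unpack little-endian binary16 values back into Python floats."""
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))


dosages = [0.0, 0.37, 1.0, 1.3, 1.999, 2.0]
buf = to_half_bytes(dosages)
restored = from_half_bytes(buf)
print(len(buf), "bytes for", len(dosages), "values")
print(max(abs(a - b) for a, b in zip(dosages, restored)))
```

binary16 has an 11-bit significand, so on [1, 2] consecutive values are 2^-10 apart and rounding errors are at most 2^-11 (about 0.00049) anywhere in [0, 2]. That is roughly eight times finer than the one-byte scheme, at twice the storage.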

-- 
Matthew S Shotwell   Assistant Professor           School of Medicine
                      Department of Biostatistics   Vanderbilt University



More information about the R-help mailing list