[R] Manage huge database

José E. Lozano lozalojo at jcyl.es
Mon Sep 22 12:26:29 CEST 2008

> Maybe you've not lurked on R-help for long enough :) Apologies!


> So, how much "design" is in this data? If none, and what you've
> basically got is a 2000x500000 grid of numbers, then maybe a more raw

Exactly, it is raw data, but a little more complex: all 500,000 variables
are stored as text, so each line is around 2,500,000 characters wide.

> http://cran.r-project.org/web/packages/RNetCDF/index.html
> http://cran.r-project.org/web/packages/hdf5/index.html

Thanks, I will check them. Right now I am reading the file line by line.
It's time-consuming, but since I will only do it once, just to rearrange the
data into smaller tables to query, it's OK.
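
For reference, a minimal sketch of that line-by-line approach, not the exact
script I am running. It assumes a comma-separated file with no quoted fields,
no blank lines and the same number of fields on every line. The function name
split_wide_file, the block_size argument and the block_NNN.csv file names are
made up for illustration.

```r
# Sketch: split a very wide delimited text file into narrower column-block
# files, reading one line at a time so only a single row is held in memory.
# split_wide_file(), block_size and the output file names are hypothetical.
split_wide_file <- function(infile, outdir, block_size = 10000L, sep = ",") {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  con <- file(infile, open = "r")
  outs <- list()
  # Close the input and every output connection, even if an error occurs.
  on.exit({
    close(con)
    for (o in outs) close(o)
  })
  starts <- NULL
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) break  # end of file
    fields <- strsplit(line, sep, fixed = TRUE)[[1]]
    if (is.null(starts)) {
      # First line: fix the column blocks and open one output file per block.
      starts <- seq(1L, length(fields), by = block_size)
      for (i in seq_along(starts)) {
        path <- file.path(outdir, sprintf("block_%03d.csv", i))
        outs[[i]] <- file(path, open = "w")
      }
    }
    # Write this row's slice of columns to each block file.
    for (i in seq_along(starts)) {
      idx <- starts[i]:min(starts[i] + block_size - 1L, length(fields))
      writeLines(paste(fields[idx], collapse = sep), outs[[i]])
    }
  }
  invisible(length(outs))  # number of block files written
}
```

With 500,000 columns and the default block_size this opens 50 output files at
once, which stays under R's limit on simultaneously open connections; a much
smaller block_size would need several passes over the input instead.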

> Thinking back to your 4GB file with 1,000,000,000 entries, that's
> only 3 bytes per entry (+1 for the comma). What is this data? There
> may be more efficient ways to handle it.

It is genetic DNA data (genotyped individuals), hence the large number of
columns to analyze.

Best Regards,
Jose Lozano
Jose E. Lozano Alonso
Observatorio de Salud Pública.
Direccion General de Salud Pública e I+D+I.
Junta de Castilla y León.
Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.
