[R] label storage and conversions: DBMS and R
Frank E Harrell Jr
fharrell at virginia.edu
Sun Feb 9 23:16:07 CET 2003
On Sun, 9 Feb 2003 16:29:51 EST
TyagiAnupam at aol.com wrote:
> Hi R users,
> I am new to using DBMS with R for large datasets. Thanks to all who responded
> with useful suggestions to my earlier postings about using large datasets and
> DBMS with R. I am writing to get some help about how to design good tables in
> DBMS to take full advantage of the wonderful built-in facilities in R, like
> I am using RMySQL client. Because R makes good use of variable and value
> labels and data (column) types, I would like to create tables with
> appropriate design in terms of,
> (1) datatype (char, varchar, int, etc.) in DBMS such that it corresponds
> with the appropriate datatype in R (factor, numeric, etc.) when converted,
> (2) How best to store variable and value labels and formats in DBMS, so they
> are correctly included in the data.frame that DBMS clients like RMySQL create
> for use in R.
> If I had only a few variables and values this would not be a problem; I can
> use meaningful variable names or create labels directly in R. But with 1600
> variables, many with about 10 categorical values, this approach does not look
> promising. Is there a document somewhere that addresses this issue? What
> would be a good way to solve this problem?
> Prediction is very difficult, especially about the future.
> -- Niels Bohr
We are working on a PostgreSQL-based system in which all metadata are defined in XML. Ultimately I will interpret the XML metadata in R to fetch the variable labels. The Hmisc library has a label function that makes it easy to assign a 'label' attribute to an individual variable, and a function upData that makes it easy to assign many labels at once. I will use the same 'label' attribute these functions use when fetching labels from XML.
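As a rough illustration of what a 'label' attribute on a variable looks like, here is a minimal sketch in base R only. Hmisc is deliberately not loaded; the data frame, the column names, the label text and the helper get_labels are all made up for this example and are not part of Hmisc.

```r
# Hypothetical example data; column names and label text are invented.
d <- data.frame(age = c(34, 51, 47), sbp = c(120, 135, 128))

# Attach a 'label' attribute to individual variables using base R.
attr(d$age, "label") <- "Age in years"
attr(d$sbp, "label") <- "Systolic blood pressure (mmHg)"

# Hypothetical helper: collect the labels of all columns into a named
# character vector, with NA for columns that carry no label.
get_labels <- function(df) {
  vapply(df, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
}

labs <- get_labels(d)
```

Because the label travels with the column as an ordinary attribute, any code that knows the attribute name can pick it up later, for example when annotating plots or tables.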
Another possibility is to make a table defining variable-specific metadata. Then you could just read in the table and write a short function to pull out labels after matching on variable names, assigning the labels to an attribute of your choosing.
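The metadata-table idea could look roughly like the sketch below. It assumes a metadata table with one row per variable and two columns, here called variable and label; those column names, the example data and the function apply_labels are hypothetical. The table is built in memory so the example runs on its own; in practice it would presumably be read from the DBMS through whatever client is in use.

```r
# Hypothetical metadata table: one row per variable.
# In real use this would be read from the database instead.
meta <- data.frame(
  variable = c("sex", "age", "not_in_data"),
  label    = c("Sex of respondent", "Age in years", "Unused entry"),
  stringsAsFactors = FALSE
)

# Hypothetical data, standing in for a data frame fetched from the DBMS.
d <- data.frame(age = c(34, 51), sex = c(1L, 2L), id = c(101L, 102L))

# Match data frame columns to metadata rows by variable name and store
# each matching label in an attribute of the caller's choosing.
apply_labels <- function(df, meta, attribute = "label") {
  idx <- match(names(df), meta$variable)  # NA where a column has no metadata row
  for (j in which(!is.na(idx))) {
    attr(df[[j]], attribute) <- meta$label[idx[j]]
  }
  df
}

d <- apply_labels(d, meta)
```

Columns without a metadata row (id here) are left untouched, and metadata rows without a matching column are ignored, so the same table can serve several extracts of the data.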
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat