[R] Ordered factors and DBMS
David James
dj at research.bell-labs.com
Thu Oct 4 22:07:39 CEST 2001
Frank E Harrell Jr wrote:
> As someone mentioned in a previous posting, the inherent ENUM and
> SET data types in MySQL give it an advantage in this area.
Unfortunately this is highly non-portable. Moreover, the conversion
is not 1-1, for both factors and ordered factors get mapped into
MySQL's ENUM creating an ambiguity when we fetch a previously
exported R/S object. XML, as you point out, is more amenable for
this type of higher-level specification.
> I am just starting to work on a project in which XML
> Metadata specifies a MySQL database (including factor
> levels for ENUM and SET (multiple choice instead of single)
> fields). This metadata will later drive some "smart"
> import functions in R that will make such variables
> immediately analyzable and self-documenting. Some
> mocked-up examples are at the end of the slides from
> a talk I recently gave, available at
> http://hesweb1.med.virginia.edu/biostat/dm
>
> I prefer to use the inherent data types rather than
> making factor labels reside in a separate SQL table.
No problem (I think) for self-contained applications whose
data will not need to migrate from one RDBMS to another.
But I second Tim's suggestion that we should be thinking
about this problem in general terms, both beyond factors and
RDBMS.
>
> I have only just begun to look at RMySQL documentation.
> Does RMySQL fetch SET variables as binary integers
> such that selected choices can be sensed once
> the bit mapping is known?
No, currently we (naively) fetched these as character vectors,
following the practice in Paul DuBois' "MySQL". I'm happy to
hear any suggestion...
>
> Frank Harrell
>
>
> Jason Turner wrote:
> >
> > Additions to Doug Bates' suggestions
> >
> > > ... it could be expressed naturally as two tables and a
> > > join. The first table would contain the data as factor levels and the
> > > second table would give the correspondence between levels and labels.
> >
> > I think this is probably the most clear and elegant way to do it.
> >
> > > Another possibility is to define a PostgreSQL data type for this.
> >
> > See the PosgreSQL documentation for "CREATE TYPE".
> >
> > Cheers
> >
> > Jason
> > --
> > Indigo Industrial Controls Ltd.
> > 64-21-343-545
> > jasont at indigoindustrial.co.nz
> > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > Send "info", "help", or "[un]subscribe"
> > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
> --
> Frank E Harrell Jr Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
David A. James
Statistics Research, Room 2C-253 Phone: (908) 582-3082
Bell Labs, Lucent Technologies Fax: (908) 582-3340
Murray Hill, NJ 09794-0636
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list