[R] Ordered factors and DBMS

David James dj at research.bell-labs.com
Thu Oct 4 22:07:39 CEST 2001


Frank E Harrell Jr wrote:
> As someone mentioned in a previous posting, the inherent ENUM and
> SET data types in MySQL give it an advantage in this area.

Unfortunately this is highly non-portable.  Moreover, the conversion
is not 1-1, for both factors and ordered factors get mapped into
MySQL's ENUM creating an ambiguity when we fetch a previously
exported R/S object.  XML, as you point out, is more amenable for
this type of higher-level specification.

> I am just starting to work on a project in which XML
> Metadata specifies a MySQL database (including factor
> levels for ENUM and SET (multiple choice instead of single)
> fields).  This metadata will later drive some "smart"
> import functions in R that will make such variables
> immediately analyzable and self-documenting.  Some 
> mocked-up examples are at the end of the slides from
> a talk I recently gave, available at
> http://hesweb1.med.virginia.edu/biostat/dm
> 
> I prefer to use the inherent data types rather than
> making factor labels reside in a separate SQL table.

No problem (I think) for self-contained applications whose 
data will not need to migrate from one RDBMS to another.
But I second Tim's suggestion that we should be thinking
about this problem in general terms, both beyond factors and 
RDBMS.

> 
> I have only just begun to look at RMySQL documentation.
> Does RMySQL fetch SET variables as binary integers 
> such that selected choices can be sensed once
> the bit mapping is known?

No, currently we (naively) fetched these as character vectors,
following the practice in Paul DuBois' "MySQL". I'm happy to 
hear any suggestion...

> 
> Frank Harrell
> 
> 
> Jason Turner wrote:
> > 
> > Additions to Doug Bates' suggestions
> > 
> > > ... it could be expressed naturally as two tables and a
> > > join.  The first table would contain the data as factor levels and the
> > > second table would give the correspondence between levels and labels.
> > 
> > I think this is probably the most clear and elegant way to do it.
> > 
> > > Another possibility is to define a PostgreSQL data type for this.
> > 
> > See the PosgreSQL documentation for "CREATE TYPE".
> > 
> > Cheers
> > 
> > Jason
> > --
> > Indigo Industrial Controls Ltd.
> > 64-21-343-545
> > jasont at indigoindustrial.co.nz
> > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > Send "info", "help", or "[un]subscribe"
> > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> 
> -- 
> Frank E Harrell Jr              Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

-- 
David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list