[R-sig-DB] RMySQL and factors

Don MacQueen m@cq @end|ng |rom ||n|@gov
Thu Feb 3 21:38:46 CET 2005


At 10:03 PM +0200 2/3/05, mikko.virtanen using ktl.fi wrote:
>Don MacQueen <macq using llnl.gov> wrote:
>>  How about:
>>
>>  getrace <- function(connection) {
>>     sql <- "SELECT race FROM PATIENT"
>>     rs <- dbGetQuery(connection, sql)
>>
>>     rs$trace <- factor(rs$trace)
>>
>>     class(rs$race)
>>  }
>
>This has the potential problem that if the data changes or the sql 
>statement has
>a WHERE in it, each run might get slightly different levels().


I don't think that presents a problem. Either you know from the 
experimental design what all of the possible levels are, or you don't.

If you don't, there is no way to make sure that all will be present 
every time the function is used., no matter whether a 'where' clause 
is added or not, or whether or not the table changes.

If you do know what the possible levels are, you can explicitly list 
them in the call to factor(). See ?factor.

If at any given point in time the database table is considered to 
have all the possible levels present, and you want to have all of 
them present in the factor variable in R, even when not all levels 
are retrieved from the database, it's still easy:

Add another query, a select distinct on the field that is to become a 
factor. Use the results to specify the levels in the call to 
factor(). Then you will always have the same factors, regardless of 
what subset is selected.

>I guess the MySQL does have the ENUM data type, and the RSDBI does have the
>hooks for using this information, but nothing seems to be implemented at the
>moment.
>
>    MJ;

-Don
-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA




More information about the R-sig-DB mailing list