[R-sig-DB] RMySQL and factors
Don MacQueen
m@cq @end|ng |rom ||n|@gov
Thu Feb 3 21:38:46 CET 2005
At 10:03 PM +0200 2/3/05, mikko.virtanen using ktl.fi wrote:
>Don MacQueen <macq using llnl.gov> wrote:
>> How about:
>>
>> getrace <- function(connection) {
>> sql <- "SELECT race FROM PATIENT"
>> rs <- dbGetQuery(connection, sql)
>>
>> rs$trace <- factor(rs$trace)
>>
>> class(rs$race)
>> }
>
>This has the potential problem that if the data changes or the sql
>statement has
>a WHERE in it, each run might get slightly different levels().
I don't think that presents a problem. Either you know from the
experimental design what all of the possible levels are, or you don't.
If you don't, there is no way to make sure that all will be present
every time the function is used., no matter whether a 'where' clause
is added or not, or whether or not the table changes.
If you do know what the possible levels are, you can explicitly list
them in the call to factor(). See ?factor.
If at any given point in time the database table is considered to
have all the possible levels present, and you want to have all of
them present in the factor variable in R, even when not all levels
are retrieved from the database, it's still easy:
Add another query, a select distinct on the field that is to become a
factor. Use the results to specify the levels in the call to
factor(). Then you will always have the same factors, regardless of
what subset is selected.
>I guess the MySQL does have the ENUM data type, and the RSDBI does have the
>hooks for using this information, but nothing seems to be implemented at the
>moment.
>
> MJ;
-Don
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
More information about the R-sig-DB
mailing list