[R] Getting frustrated with RMySQL

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 15 00:29:36 CEST 2008


Its been pointed out to me that there is a bug in gsubfn that
occurs when a variable named x is used so the example below
will not work correctly.  It seems it clashes with an internal
variable also called x.  In the meantime use a different variable
name such as X.

library(gsubfn)
X <- 3
fn$dbGetQuery(con, "select * from myTable where myColumnA = $X and
  MyColumnB = `2*X` ")

On Tue, Oct 14, 2008 at 5:52 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> The gsubfn package can do quasi perl-style interpolation by
> prefacing any function call with fn$.
>
> library(gsubfn)
> x <- 3
> fn$dbGetQuery(con, "select * from myTable where myColumnA = $x and
> MyColumnB = `2*x` ")
>
> See http://gsubfn.googlecode.com
>
>
> On Tue, Oct 14, 2008 at 5:32 PM, Jeffrey Horner
> <jeff.horner at vanderbilt.edu> wrote:
>> Ted Byers wrote on 10/14/2008 02:33 PM:
>>>
>>> Getting the basic stuff to work is trivially simple.  I can connect, and,
>>> for
>>> example, get everything in any given table.
>>>
>>> What I have yet to find is how to deal with parameterized queries or how
>>> to
>>> do a simple insert (but not of a value known at the time the script is
>>> written - I ultimately want to put my script into a scheduled task, so the
>>> analysis can be repeated on updated data either daily or weekly).
>>> Using "INSERT INTO myTable (a) VALUES (1)" is simple enough, but what if I
>>> want to insert a sample number (using, e.g. WEEK(sample_date) as a sample
>>> identifier) along with the rate parameter estimated using fitdistr to fit
>>> an
>>> exponential distribution to a dataset, along with its sd?  If I were using
>>> Perl or Java, I'd set up the query similar to "INSERT INTO myTable (a,b,c)
>>> VALUES (?,?,?)", and then use function calls to set each of the query
>>> parameters.  I am having an aweful time finding the corresponding
>>> functions
>>> in RMySQL.
>>
>> I've found the best way to parameterize is using R's sprintf function. For
>> instance, the following query not only parameterizes the variable position,
>> but also the table name:
>>
>> fields  <- dbGetQuery(con,sprintf("select field,elem_label from %s_meta
>> where field='%s'",inp$pnid,inp$field))
>>
>> Best,
>>
>> Jeff
>>
>>>
>>> And for the data, the simplest, and most efficient, way to get the data is
>>> to use a statement like:
>>>
>>> SELECT a,b,c FROM myTable GROUP BY g_id, WEEK(sdate);
>>>
>>> The data is in MySQL, and my analysis needs to be applied independantly to
>>> each group obtained from a query like this.  It appears I can't use a data
>>> frame since none of the samples are of the same size (lets say the
>>> probability of the samples being the same size in indistinguishable from
>>> 0). Is it possible to put the resultset from such a query into a list of
>>> vectors
>>> that I can iterate over, passing each vector to fitdistr in turn?  If so,
>>> how?
>>>
>>> I know I can get this using Perl (by getting each sample individually and
>>> writing it to a file, then having R read the file, do the analysis and
>>> write
>>> the output to another file, and then have Perl parse the output file to
>>> insert the parameter estimates I need into the appropriate table), but
>>> that
>>> seems inefficient.
>>>
>>> Is it possible to do all I need with R working directly with MySQL?  If
>>> so,
>>> can someone fill in the apparent gaps left in the RMySQL documentation?
>>>
>>> Thanks.
>>>
>>> Ted
>>
>>
>> --
>> http://biostat.mc.vanderbilt.edu/JeffreyHorner
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



More information about the R-help mailing list