[R] Avoiding multiple outputs using RODBC package

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Dec 15 11:30:48 CET 2008


On Sun, 14 Dec 2008, Mark Wardle wrote:

> Hi.
>
> You don't need to download the whole of the output database table to
> look for an already generated answer. You can write a SQL query to do
> that instead. ie. give me any rows with these parameters...  Get the
> database to do the work - it is what they are designed to do.
>
> So the procedure is:
>
> 1. Get input parameters
> 2. Query the output database to see whether analysis has already been
> done (select * from output_table where...)
> 3. If not already done, do the calculation and insert result into output table
>
> Note: you don't have to use sqlSave to save data. One can add single
> rows by running arbitrary SQL.

Or sqlUpdate.

>
> Mark
>
> 2008/12/12 Brigid Mooney <bkmooney at gmail.com>:
>> I am using R as a data manipulation tool for a SQL database.  So in some of
>> my R scripts I use the RODBC package to retreive data, then run analysis,
>> and use the sqlSave function in the RODBC package to store the results in a
>> database.
>>
>> There are two problems I want to avoid, and they are highly related: (1)
>> having R rerun analysis which has already been done and saved into output
>> database table, and (2) ending up with more than one identical row in
>> my output database table.
>>
>> -------------------------------------
>> The analysis I am running allows the user to input a large number of
>> variables, for example:
>> date, version, a, b, c, d, e, f, g, ...
>>
>> After R completes its analysis, I write the results to a database table in
>> the format:
>> Value, date, version, a, b, c, d, e, f, g, ...
>>
>> where Value is the result of the R analysis, and the rest of the columns are
>> the criteria that was used to get that value.
>> --------------------------------------
>>
>> Can anyone think of a way to address these problems?  The only thing I can
>> think of so far is to run an sqlQuery to get a table of all the variable
>> combinations that are saved at the start, and then simply avoid computing
>> and re-outputing those results.  However, my results database table
>> currently has over 200K rows (and will grow very quickly as I keep going
>> with this project), so I think that would not be the most expeditious answer
>> as I think just the SQL query to download 200K rows x 10+ columns is going
>> to be time consuming in and of itself.
>>
>> I know this is kindof a weird problem, and am open to all sorts of ideas...
>>
>> Thanks!
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ______________________________________________________________________
>> This email has been scanned by the MessageLabs Email Security System.
>> For more information please visit http://www.messagelabs.com/email
>> ______________________________________________________________________
>>
>
>
>
> -- 
> Dr. Mark Wardle
> Specialist registrar, Neurology
> Cardiff, UK
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list