[R-sig-DB] Bulk editing of mySQL tables

Sean Davis @d@v|@2 @end|ng |rom m@||@n|h@gov
Wed Mar 24 16:29:04 CET 2010


On Wed, Mar 24, 2010 at 11:11 AM, Daniel Brewer <daniel.brewer using icr.ac.uk> wrote:
> On 24/03/2010 2:52 PM, Sean Davis wrote:
>> On Wed, Mar 24, 2010 at 10:31 AM, Daniel Brewer <daniel.brewer using icr.ac.uk> wrote:
>>> It was suggested that this would be a better place to post this than the
>>> main R support list
>>>
>>> -----
>>>
>>> Hello,
>>>
>>> I have started to use RMySQL and I would like to use R to make batch
>>> changes to data.  What it the best way to do this?  Is it to download
>>> the table using dbGetQuery, manipulate the data in R and then
>>> dbWriteTable to delete the existing table and replace it with the local
>>> data.
>>>
>>> What I am concerned about this is that it might lose some mySQL table
>>> configuration options and it isn't a very effective way to do it.
>>>
>>> An example of the sort of thing I am doing is to try and update a column
>>> based on a CSV file stored locally.
>>
>> Hi, Dan.  Perhaps you could be a bit more specific?  I think the usual
>> way to do this is with a SQL update.  Is there a reason not to do
>> things this way?
>>
>> Sean
>
> Hi Sean,

I'm no expert here, so these are just my opinions....

> Well I was thinking of a number of different use cases:
> 1) I have got some additional information for example gender for a
> particular subset of patients that I would like to merge in with a
> complex ID link between the two.

For this case, a temp table certainly makes sense, but this could be
done with multiple update statements as well, I think.

> 2) Performing batch operations on the data for example taking a subset
> that meets a  certain criteria and say, capitalising a particular field.

For this, SQL vendor extensions can often be useful.  In other
databases (not MySQL), you can create stored procedures written in R,
python, perl, etc. to do your complex manipulations within the
database.

> For the process I am doing at the moment I am downloading the table,
> manipulating the fields as required, then uploading those fields to a
> temporary table and then doing an update i.e.

Keep in mind that this breaks ACID compliance unless you do something
more to lock the database against changes or if you aren't every
worried about the data changing underneath you.

> dbWriteTable(conn, "clintmp", unique(yoda[yoda$HospitalNum !=
> 999999,c(4:5,8)]))
> dbSendQuery(conn, "UPDATE ClinicalData cd, clintmp t SET
> cd.Number=t.HospitalNum WHERE cd.ID=t.ID AND cd.Anonymised_ID =
> t.Anonymised_ID")
>
> Maybe R is the tool for the job but I much prefer its data manipulation
> to raw mySQL, so that is why I am exploring the area.

Yep.  R is quite nice for data munging.

Sean




More information about the R-sig-DB mailing list