[R-sig-DB] Bulk editing of mySQL tables

Daniel Brewer d@n|e|@brewer @end|ng |rom |cr@@c@uk
Wed Mar 24 17:41:53 CET 2010



On 24/03/2010 3:29 PM, Sean Davis wrote:
> On Wed, Mar 24, 2010 at 11:11 AM, Daniel Brewer <daniel.brewer using icr.ac.uk> wrote:
>> On 24/03/2010 2:52 PM, Sean Davis wrote:
>>> On Wed, Mar 24, 2010 at 10:31 AM, Daniel Brewer <daniel.brewer using icr.ac.uk> wrote:
>>>> It was suggested that this would be a better place to post this than the
>>>> main R support list
>>>>
>>>> -----
>>>>
>>>> Hello,
>>>>
>>>> I have started to use RMySQL and I would like to use R to make batch
>>>> changes to data.  What it the best way to do this?  Is it to download
>>>> the table using dbGetQuery, manipulate the data in R and then
>>>> dbWriteTable to delete the existing table and replace it with the local
>>>> data.
>>>>
>>>> What I am concerned about this is that it might lose some mySQL table
>>>> configuration options and it isn't a very effective way to do it.
>>>>
>>>> An example of the sort of thing I am doing is to try and update a column
>>>> based on a CSV file stored locally.
>>>
>>> Hi, Dan.  Perhaps you could be a bit more specific?  I think the usual
>>> way to do this is with a SQL update.  Is there a reason not to do
>>> things this way?
>>>
>>> Sean
>>
>> Hi Sean,
> 
> I'm no expert here, so these are just my opinions....
> 
>> Well I was thinking of a number of different use cases:
>> 1) I have got some additional information for example gender for a
>> particular subset of patients that I would like to merge in with a
>> complex ID link between the two.
> 
> For this case, a temp table certainly makes sense, but this could be
> done with multiple update statements as well, I think.
> 
>> 2) Performing batch operations on the data for example taking a subset
>> that meets a  certain criteria and say, capitalising a particular field.
> 
> For this, SQL vendor extensions can often be useful.  In other
> databases (not MySQL), you can create stored procedures written in R,
> python, perl, etc. to do your complex manipulations within the
> database.
> 
>> For the process I am doing at the moment I am downloading the table,
>> manipulating the fields as required, then uploading those fields to a
>> temporary table and then doing an update i.e.
> 
> Keep in mind that this breaks ACID compliance unless you do something
> more to lock the database against changes or if you aren't every
> worried about the data changing underneath you.
> 
>> dbWriteTable(conn, "clintmp", unique(yoda[yoda$HospitalNum !=
>> 999999,c(4:5,8)]))
>> dbSendQuery(conn, "UPDATE ClinicalData cd, clintmp t SET
>> cd.Number=t.HospitalNum WHERE cd.ID=t.ID AND cd.Anonymised_ID =
>> t.Anonymised_ID")
>>
>> Maybe R is the tool for the job but I much prefer its data manipulation
>> to raw mySQL, so that is why I am exploring the area.
> 
> Yep.  R is quite nice for data munging.
> 
> Sean

Hi Sean,

Well I was thinking of a number of different use cases:
1) I have got some additional information for example gender for a
particular subset of patients that I would like to merge in with a
complex ID link between the two.
2) Performing batch operations on the data for example taking a subset
that meets a  certain criteria and say, capitalising a particular field.

For the process I am doing at the moment I am downloading the table,
manipulating the fields as required, then uploading those fields to a
temporary table and then doing an update i.e.
dbWriteTable(conn, "clintmp", unique(yoda[yoda$HospitalNum !=
999999,c(4:5,8)]))
dbSendQuery(conn, "UPDATE ClinicalData cd, clintmp t SET
cd.Number=t.HospitalNum WHERE cd.ID=t.ID AND cd.Anonymised_ID =
t.Anonymised_ID")

Maybe R is the tool for the job but I much prefer its data manipulation
to raw mySQL, so that is why I am exploring the area.

Thanks

Dan

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP.

This e-mail message is confidential and for use by the a...{{dropped:2}}




More information about the R-sig-DB mailing list