[R] Matching 2 SQL tables

Dieter Menne dieter.menne at menne-biomed.de
Wed Dec 22 09:57:52 CET 2010

mathijsdevaan wrote:
> I have a postgresql and a mysql database and I would like to combine the
> info from two different tables in R. Both databases contain a table with
> three columns: project_name, release_id and release_date. So each project
> output could be released multiple times (I am interested in the first
> release_date). However, some of the data is missing. 
> Basically, what I want to do is to try and fill the missing data in 1
> table with the data from the other table. The difficulty here is that
> table1$project_name IS NOT table2$project_name. Example: green-tree and
> green tree, new(Jacket) and newJacket.

If there is a general matching rule, for example "Remove all special
characters", you could read in both tables with separate RODC queries, add a
new column "projectcore" to both tables that is generated by 

a$projectcore = gsub("[\(\)-]","",a$project_name) 

# not tested, this might require some of Dalgaard's "if you think you have
escaped enough, double it".

 and use a join (called merge() in R, or with package sqldf).

If there is no general matching rule, I would create special translation
table with two columns, e.g

project_name projectcore
green-treee     greentree
new(Jacket)     newJacket

and retrieve projectcore instead of project_name in the query. I this case,
you could also use an SQL join directly on both tables.


View this message in context: http://r.789695.n4.nabble.com/Matching-2-SQL-tables-tp3159678p3160306.html
Sent from the R help mailing list archive at Nabble.com.

More information about the R-help mailing list