[R] Matching 2 SQL tables
dieter.menne at menne-biomed.de
Wed Dec 22 09:57:52 CET 2010
> I have a postgresql and a mysql database and I would like to combine the
> info from two different tables in R. Both databases contain a table with
> three columns: project_name, release_id and release_date. So each project
> output could be released multiple times (I am interested in the first
> release_date). However, some of the data is missing.
> Basically, what I want to do is to try and fill the missing data in 1
> table with the data from the other table. The difficulty here is that
> table1$project_name IS NOT table2$project_name. Example: green-tree and
> green tree, new(Jacket) and newJacket.
If there is a general matching rule, for example "Remove all special
characters", you could read in both tables with separate RODC queries, add a
new column "projectcore" to both tables that is generated by
a$projectcore = gsub("[\(\)-]","",a$project_name)
# not tested, this might require some of Dalgaard's "if you think you have
escaped enough, double it".
and use a join (called merge() in R, or with package sqldf).
If there is no general matching rule, I would create special translation
table with two columns, e.g
and retrieve projectcore instead of project_name in the query. I this case,
you could also use an SQL join directly on both tables.
View this message in context: http://r.789695.n4.nabble.com/Matching-2-SQL-tables-tp3159678p3160306.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help