[R] database table merging tips with R

Moshe Olshansky m_olshansky at yahoo.com
Fri Sep 12 01:14:11 CEST 2008


Just a small correction:

start with 

s <- paste(r$userid,collapse=",")

and not 

s <- paste(r$userid,sep=",")

--- On Fri, 12/9/08, Moshe Olshansky <m_olshansky at yahoo.com> wrote:

> From: Moshe Olshansky <m_olshansky at yahoo.com>
> Subject: Re: [R] database table merging tips with R
> To: r-help at stat.math.ethz.ch, "Avram Aelony" <aavram at mac.com>
> Received: Friday, 12 September, 2008, 8:59 AM
> One possibility is as follows:
> 
> If r$userid is your array of (2000) ID's then
> s <- paste(r$userid,sep=",")
> s<- paste("select t.userid, x, y, z from largetable
> t where t.serid in (",s,")",sep="")
> and finally
> d <- sqlQuery(connection,s)
> 
> Regards,
> 
> Moshe.
> 
> 
> --- On Fri, 12/9/08, Avram Aelony <aavram at mac.com>
> wrote:
> 
> > From: Avram Aelony <aavram at mac.com>
> > Subject: [R] database table merging tips with R
> > To: r-help at stat.math.ethz.ch
> > Received: Friday, 12 September, 2008, 4:33 AM
> > Dear R list,
> > 
> > What is the best way to efficiently marry an R dataset
> with
> > a very large (Oracle) database table?  
> > 
> > The goal is to only return Oracle table rows that
> match IDs
> > present in the R dataset.  
> > I have an R data frame with 2000 user IDs analogous
> to: r =
> > data.frame(userid=round(runif(2000)*100000,0))
> > 
> > ...and I need to pull data from an Oracle table only
> for
> > these 2000 IDs.  The Oracle table is quite large.
> > Additionally, the sql query may need to join to other
> tables
> > to bring in ancillary fields.
> > 
> > I currently connect to Oracle via odbc: 
> > 
> > library(RODBC)
> > connection <- odbcConnect("****",
> > uid="****", pwd="****")
> > d = sqlQuery(connection, "select userid, x, y, z
> from
> > largetable where timestamp > sysdate -7")
> > 
> > ...allowing me to pull data from the database table
> into
> > the R object "d" and then use the R merge
> > function.  The problem however is that if
> "d" is
> > too large it may fail due to memory limitations or be
> > inefficient.  I would like to push the merge portion
> to the
> > database and it would be very convenient if it were
> possible
> > to request that the query look to the R object for the
> > ID's to which it should restrict the output.  
> > 
> > Is there a way to do this?
> > Something like the following fictional code:
> > d = sqlQuery(connection, "select t.userid, x, y,
> z
> > from largetable t where r$userid=t.userid")
> > 
> > Would sqldf (http://code.google.com/p/sqldf/) help me
> out
> > here? If so, how?   This would be convenient and help
> me
> > avoid needing to create a temporary table to store the
> R
> > data, join via sql, then return the data back to R.
> > 
> > I am using R version 2.7.2 (2008-08-25) /
> i386-pc-mingw32 .
> > 
> > Thanks for your comments, ideas, recommendations.
> > 
> > 
> > -Avram
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.



More information about the R-help mailing list