[R] database table merging tips with R
m_olshansky at yahoo.com
Fri Sep 12 01:14:11 CEST 2008
Just a small correction:
s <- paste(r$userid,collapse=",")
s <- paste(r$userid,sep=",")
--- On Fri, 12/9/08, Moshe Olshansky <m_olshansky at yahoo.com> wrote:
> From: Moshe Olshansky <m_olshansky at yahoo.com>
> Subject: Re: [R] database table merging tips with R
> To: r-help at stat.math.ethz.ch, "Avram Aelony" <aavram at mac.com>
> Received: Friday, 12 September, 2008, 8:59 AM
> One possibility is as follows:
> If r$userid is your array of (2000) ID's then
> s <- paste(r$userid,sep=",")
> s<- paste("select t.userid, x, y, z from largetable
> t where t.serid in (",s,")",sep="")
> and finally
> d <- sqlQuery(connection,s)
> --- On Fri, 12/9/08, Avram Aelony <aavram at mac.com>
> > From: Avram Aelony <aavram at mac.com>
> > Subject: [R] database table merging tips with R
> > To: r-help at stat.math.ethz.ch
> > Received: Friday, 12 September, 2008, 4:33 AM
> > Dear R list,
> > What is the best way to efficiently marry an R dataset
> > a very large (Oracle) database table?
> > The goal is to only return Oracle table rows that
> match IDs
> > present in the R dataset.
> > I have an R data frame with 2000 user IDs analogous
> to: r =
> > data.frame(userid=round(runif(2000)*100000,0))
> > ...and I need to pull data from an Oracle table only
> > these 2000 IDs. The Oracle table is quite large.
> > Additionally, the sql query may need to join to other
> > to bring in ancillary fields.
> > I currently connect to Oracle via odbc:
> > library(RODBC)
> > connection <- odbcConnect("****",
> > uid="****", pwd="****")
> > d = sqlQuery(connection, "select userid, x, y, z
> > largetable where timestamp > sysdate -7")
> > ...allowing me to pull data from the database table
> > the R object "d" and then use the R merge
> > function. The problem however is that if
> "d" is
> > too large it may fail due to memory limitations or be
> > inefficient. I would like to push the merge portion
> to the
> > database and it would be very convenient if it were
> > to request that the query look to the R object for the
> > ID's to which it should restrict the output.
> > Is there a way to do this?
> > Something like the following fictional code:
> > d = sqlQuery(connection, "select t.userid, x, y,
> > from largetable t where r$userid=t.userid")
> > Would sqldf (http://code.google.com/p/sqldf/) help me
> > here? If so, how? This would be convenient and help
> > avoid needing to create a temporary table to store the
> > data, join via sql, then return the data back to R.
> > I am using R version 2.7.2 (2008-08-25) /
> i386-pc-mingw32 .
> > Thanks for your comments, ideas, recommendations.
> > -Avram
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> R-help at r-project.org mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained,
> reproducible code.
More information about the R-help