[R] RJDBC vs RMySQL vs ???

James W. MacDonald jmacdon at med.umich.edu
Wed Jun 23 22:36:29 CEST 2010


Hi Ralf,

Ralf B wrote:
> I am running a simple SQL SELECT statement that involves 50k+ data
> points using R and the RJDBC interface. I am facing very slow response
> times in both the RGui and the R console. When I run this SQL
> statement directly in a SQL client, the processing times are a
> lot faster (which means that the SQL statement itself is not the
> problem).
> 
> Have any of you compared RJDBC and RMySQL, or is there a better, more
> efficient way to extract large amounts of data from databases using R?
> Would you recommend dumping the data out completely into flat files and
> working with flat files instead? I expected that this would not be such
> a problem, given that businesses maintain their data in DBs and R is
> supposed to be good at shifting data around. Am I doing something wrong?

Well, if you don't show people what you have done, how can anybody tell 
if you are doing something wrong or not?

I have no experience with RJDBC, so cannot say anything about that. 
However, I have always found RMySQL to be speedy enough. As an example:

 > library(RMySQL)
Loading required package: DBI
 > con <- dbConnect("MySQL", host="genome-mysql.cse.ucsc.edu", user = "genome", dbname = "hg18")
 > system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129 where chrom='chr1' and chromStart between 1 and 1e8;")
+ )
    user  system elapsed
    7.95    0.06   38.59
 > dim(a)
[1] 508676      2

So about 40 seconds to get half a million records. Since this query goes 
over the internet, I have to imagine things would be much faster when 
querying a local DB.

But then you never say what constitutes 'slow' for you, so maybe this is 
slow as well?

Best,

Jim


> 
> Ralf
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


