[R] parLapply on sqlQuery (from package RODBC)
Frede Aakmann Tøgersen
frtog at vestas.com
Thu Jul 3 09:51:06 CEST 2014
Hi
Why are you doing duplicate queries to the database (two As and Cs in your names vector)?
Why do 5 simultaneously connection to the database server? Woukld you do 500 connections?
Why not do one query and let the database server do the job for you?
Try this:
> options(useFancyQuotes = FALSE)
>
> query0 <- "select id from table where name in (%s)"
>
> names <- paste(sQuote(LETTERS[1:5]), collapse = ",")
> names
[1] "'A','B','C','D','E'"
>
> query <- sprintf(query0, names)
> query
[1] "select id from table where name in ('A','B','C','D','E')"
>
> dataFromDB <- sqlQuery(dbConn, query)
This should work for MS SQL and MySQL servers.
Yours sincerely / Med venlig hilsen
Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling
Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
frtog at vestas.com
http://www.vestas.com
Company reg. name: Vestas Wind Systems A/S
This e-mail is subject to our e-mail disclaimer statement.
Please refer to www.vestas.com/legal/notice
If you have received this e-mail in error please contact the sender.
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Tevlin, Dylan
> Sent: 2. juli 2014 23:49
> To: r-help at r-project.org
> Subject: [R] parLapply on sqlQuery (from package RODBC)
>
> R Version : 2.14.1 x64
> Running on Windows 7
>
> Connecting to a database on a remote Microsoft SQL Server 2012
>
> The short form of my problem is the following.
>
> I have an unordered vectors of names, say:
>
> names<-c("A", "B", "A", "C","C")
>
> each of which have an id in a table in my db. I need to convert the names to
> their corresponding ids.
>
> I currently have the following code to do it.
> ###
> names<-c("A", "B", "A", "C","C")
> dbConn<-odbcDriverConnect(connection="connection string") #successfully
> connects
>
> nameToID<-function(name, dbConn){
> #dbConn : active db connection formed via odbcDriverConnect
> #name : a char string
>
> sqlQuery(dbConn, paste("select id from table where name='", name,
> "'", sep=""))
> }
> sapply(names, nameToID, dbConn=dbConn)
> ###
>
> Barring better ways to do this, which could involve loading the table into R
> then working with the problem there (which is possible), I understand why
> the following doesn't work, but I cannot seem to find a solution. Attempting
> to use parallelization via the package 'parallel' :
>
> ###
> names<-c("A", "B", "A", "C","C")
> dbConn<-odbcDriverConnect(connection="connection string") #successfully
> connects
>
> nameToID<-function(name, dbConn){
> #dbConn : active db connection formed via odbcDriverConnect
> #name : a char string
>
> sqlQuery(dbConn, paste("select id from table where name='", name,
> "'", sep=""))
> }
>
> mc<-detectCores()
> cl<-makeCluster(mc)
> clusterExport(cl, c("sqlQuery", "dbConn"))
> parSapply(cl, names, nameToID, dbConn=dbConn) #incorrect passing of
> nameToID's second argument
> ###
>
> As in the comment, this is not the correct way to assign the second argument
> to nameToID.
>
> I have also tried the following:
>
> parSapply(cl, names, function(x) nameToID(x, dbConn))
>
> in place of the previous parSapply call, but that also does not work, with the
> error being thrown saying "the first parameter is not an open RODBC
> connection", presumably referring to the first parameter of the sqlQuery()
>
> The following code does work with parallization.
>
> ###
> names<-c("A", "B", "A", "C","C")
> dbConn<-odbcDriverConnect(connection="connection string") #successfully
> connects
> nameToID<-function(name){
> #name : a char string
> dbConn<-odbcDriverConnect(connection="string")
> result<-sqlQuery(dbConn, paste("select id from table where
> name='", name, "'", sep=""))
> odbcClose(dbConn)
> result
> }
>
> mc<-detectCores()
> cl<-makeCluster(mc)
> clusterExport(cl, c("sqlQuery", "odbcDriverConnect", "odbcClose", "dbConn",
> "nameToID")) #throwing everything in
> parSapply(cl, names, nameToID)
> ###
>
> But the constant opening and closing a ton of the gains from parallelization,
> and seems just a bit silly.
>
> So the overall question would be how to pass the second parameter (the
> open db connection) to the function within parSapply, in much the same way
> as it is done in the regular apply? In general, how does one pass a second,
> third, nth parameter to a function within a parallel routine?
>
> Thanks and if you need any more information let me know.
>
> -DT
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list