On Mon, Oct 18, 2010 at 5:41 AM, Bucher Elmar wrote:
> Dear Mailing List,
>
> I wrote the following function "matrix2tuple.sf" to translate a "cartesian
> xy matrix" as a "tuple matrix", to store it in relational database.
> The code works fine for a test set. My problem is, my real matrix is 7000 x
> 10000 big, which ends up in 70'000'0000' tuples.
> Transformation takes days X(...
> Has anyone an idea, how I can optimize the described functions for speed?
>
> Best Wishes, Elmar Bucher
>
>
>
> ##### BEGIN CODE LISTING ####
>
> matrix2tuple.sf <- function(xy.matrix.m = NULL) {
> tuple.m <- NULL
> #x.length.v <- dim(xy.matrix.m)[2]
> #y.length.v <- dim(xy.matrix.m)[1]
> #for (x.v in (1:x.length.v)) {
> #for (y.v in (1:y.length.v)) {
> x.axis.v <- colnames(xy.matrix.m)
> y.axis.v <- rownames(xy.matrix.m)
> for (x.v in x.axis.v) {
> for (y.v in y.axis.v) {
> #cat(x.v, y.v, xy.matrix.m[y.v,x.v],"\n")
> tuple.v <- c(x.v, y.v, xy.matrix.m[y.v,x.v])
> if (is.null(tuple.v)) {
> tuple.m <- tuple.v
> } else {
> tuple.m <- rbind(tuple.m, tuple.v)
> }
> }
> }
> return(tuple.m)
> }
>
>
Hi, Elmer.
The following should be pretty quick.
tuple.m =
data.frame(xaxis=rep(colnames(matrix.m),each=nrow(matrix.m)),yaxis=rep(rownames(matrix.m)),xy=as.vector(matrix.m))
>
> put.db.sf <- function(conn.s4=NULL, x.v=NULL, y.v=NULL, xy.v=NULL) {
> query.v <- paste("INSERT INTO matrixdbtb ('xaxis','yaxis','xy')
> VALUES('",x.v,"','",y.v,"','",xy.v,"');", sep ="")
> #catch.df <- dbGetQuery(conn.s4, query.v)
> cat(query.v, "\n")
> }
>
> ## main ##
> matrix.m <-
> c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16")
> dim(matrix.m) <- c(4,4)
> colnames(matrix.m) <- c("A","B","C","D")
> rownames(matrix.m) <- c("a","b","c","d")
> tuple.m <- matrix2tuple.sf (matrix.m)
> for (i in 1: dim(tuple.m)[1]) {
> put.db.sf(x.v=tuple.m[i,1],y.v=tuple.m[i,2],xy.v=tuple.m[i,3])
> }
>
>
For loading this many rows, you will want to be using all the normal tricks
of your database. This will often include removing indexes from the table,
temporarily disabling foreign key constraints on the table, etc. You should
minimally be using a transaction around, say, every 100000 inserts.
Finally, several databases have a specific "loader" functionality that can
take large tables like your 70M row table and insert them very quickly.
dbWriteTable might include this functionality or might not, depending on
the database. Assuming this is a one-off affair, though, even writing the
table to disk and then using a dedicated loader will probably work just
fine.
If you have questions on the database side of things, perhaps the best place
to write is the R-sig-db list.
Sean
> #### END CODE LISTING ########
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]