[BioC] matrix transformation
Vincent Carey
stvjc at channing.harvard.edu
Mon Oct 18 12:10:16 CEST 2010
This question has no direct relationship to Bioconductor. You should
look at the archives of the R SIG-DB mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-db
and perhaps pose the question there if it has not already been handled.
DBI's dbWriteTable is surely a relevant function to understand.
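A minimal sketch of how dbWriteTable could be used here, assuming the DBI and RSQLite packages are installed. Instead of looping over cells, each block of columns is reshaped to long (x, y, value) form with vectorized rep()/as.vector() calls and appended with dbWriteTable(..., append = TRUE), so the full 70-million-row long table never has to sit in memory at once. The table and column names (matrixdbtb, xaxis, yaxis, xy) follow put.db.sf in the quoted message; the helper name write_matrix_long and the default block size of 500 columns are illustrative assumptions, not tuned values.

```r
library(DBI)

# Hypothetical helper: append a named matrix to an SQLite table in long
# (xaxis, yaxis, xy) form, one block of columns at a time, so that the
# full long table never has to be held in memory.
write_matrix_long <- function(conn, m, table = "matrixdbtb",
                              cols_per_block = 500L) {
  stopifnot(!is.null(rownames(m)), !is.null(colnames(m)))
  # Explicit schema; declare xy as REAL instead if the matrix is numeric.
  dbExecute(conn, paste("CREATE TABLE IF NOT EXISTS", table,
                        "(xaxis TEXT, yaxis TEXT, xy TEXT)"))
  for (start in seq(1L, ncol(m), by = cols_per_block)) {
    idx <- start:min(start + cols_per_block - 1L, ncol(m))
    block <- m[, idx, drop = FALSE]
    # as.vector() reads the block column by column, so the row label
    # varies fastest, matching the rep() calls below.
    long <- data.frame(
      xaxis = rep(colnames(block), each = nrow(block)),
      yaxis = rep(rownames(block), times = ncol(block)),
      xy    = as.vector(block),
      stringsAsFactors = FALSE
    )
    dbWriteTable(conn, table, long, append = TRUE)
  }
  invisible(NULL)
}

# Usage with the 4 x 4 test matrix from the quoted message,
# written in blocks of 2 columns into an in-memory database.
m <- matrix(as.character(1:16), nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), c("A", "B", "C", "D")))
con <- dbConnect(RSQLite::SQLite(), ":memory:")
write_matrix_long(con, m, cols_per_block = 2L)
res <- dbGetQuery(con, "SELECT * FROM matrixdbtb")
dbDisconnect(con)
```

For the real 7000 x 10000 matrix, one would connect to a file instead of ":memory:"; the block size trades memory use against the number of dbWriteTable calls.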
Various Bioconductor facilities use
DBI to create and manage large tables in SQLite; AnnotationDbi and
Genominator are two packages that come to mind.
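On the SQLite side, two DBI idioms help when managing a large table like this one. The sketch below (assuming DBI and RSQLite are installed) replaces the one-pasted-string-per-tuple approach of put.db.sf with a single parameterized INSERT run inside one transaction, then adds an index for lookups by axis label. The index name idx_matrixdbtb_xy and the small example vectors are hypothetical.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE matrixdbtb (xaxis TEXT, yaxis TEXT, xy TEXT)")

# Example tuples (hypothetical): x varies slowest, y fastest.
x  <- rep(c("A", "B"), each = 2)
y  <- rep(c("a", "b"), times = 2)
xy <- as.character(1:4)

# Bound parameters: the INSERT is prepared once and executed for each
# element of the vectors; the transaction avoids one commit per row.
dbWithTransaction(con, {
  dbExecute(con,
            "INSERT INTO matrixdbtb (xaxis, yaxis, xy) VALUES (?, ?, ?)",
            params = list(x, y, xy))
})

# An index on the axis columns speeds up lookups once the table is large.
dbExecute(con, "CREATE INDEX idx_matrixdbtb_xy ON matrixdbtb (xaxis, yaxis)")
hit <- dbGetQuery(con,
                  "SELECT xy FROM matrixdbtb WHERE xaxis = ? AND yaxis = ?",
                  params = list("B", "a"))
dbDisconnect(con)
```

Creating the index after the bulk load, rather than before, is usually cheaper because SQLite then builds it once instead of updating it on every insert.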
Other schemes for handling large data resources alongside R are
described in the CRAN task view for high performance computing.
On Mon, Oct 18, 2010 at 5:41 AM, Bucher Elmar <Ext-Elmar.Bucher at vtt.fi> wrote:
> Dear Mailing List,
>
> I wrote the following function "matrix2tuple.sf" to translate a "cartesian xy matrix" into a "tuple matrix", in order to store it in a relational database.
> The code works fine for a test set. My problem is that my real matrix is 7000 x 10000, which ends up as 70'000'000 tuples.
> The transformation takes days X(...
> Does anyone have an idea how I can optimize the functions described for speed?
>
> Best Wishes, Elmar Bucher
>
>
>
> ##### BEGIN CODE LISTING ####
>
> matrix2tuple.sf <- function(xy.matrix.m = NULL) {
>   tuple.m <- NULL
>   x.axis.v <- colnames(xy.matrix.m)
>   y.axis.v <- rownames(xy.matrix.m)
>   # Walk the matrix column by column (x outer, y inner) and emit one
>   # (x, y, value) row per cell.
>   # NOTE: rbind() copies the whole of tuple.m on every iteration, so the
>   # running time grows quadratically with the number of cells.
>   for (x.v in x.axis.v) {
>     for (y.v in y.axis.v) {
>       tuple.v <- c(x.v, y.v, xy.matrix.m[y.v, x.v])
>       # rbind(NULL, tuple.v) already yields a one-row matrix, so the
>       # first row needs no special case.
>       tuple.m <- rbind(tuple.m, tuple.v)
>     }
>   }
>   return(tuple.m)
> }
>
>
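> put.db.sf <- function(conn.s4 = NULL, x.v = NULL, y.v = NULL, xy.v = NULL) {
>   # Column names are unquoted identifiers; single quotes denote string
>   # literals in SQL.
>   query.v <- paste("INSERT INTO matrixdbtb (xaxis, yaxis, xy) VALUES ('",
>                    x.v, "','", y.v, "','", xy.v, "');", sep = "")
>   # Dry run: print the statement. To execute it instead, use
>   # catch.df <- dbGetQuery(conn.s4, query.v)
>   cat(query.v, "\n")
> }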
>
> ## main ##
> matrix.m <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16")
> dim(matrix.m) <- c(4,4)
> colnames(matrix.m) <- c("A","B","C","D")
> rownames(matrix.m) <- c("a","b","c","d")
> tuple.m <- matrix2tuple.sf(matrix.m)
> for (i in seq_len(nrow(tuple.m))) {
>   put.db.sf(x.v = tuple.m[i, 1], y.v = tuple.m[i, 2], xy.v = tuple.m[i, 3])
> }
>
> #### END CODE LISTING ########
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
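The reshaping step in matrix2tuple.sf can also be done without any loop. The sketch below (base R only; the name matrix2tuple.fast is hypothetical) builds the three columns directly with rep() and as.vector(). It produces the same rows in the same order as the nested loops above (x outer, y inner), but allocates the result once instead of re-copying it with rbind() for every cell.

```r
# Hypothetical vectorized replacement for matrix2tuple.sf: returns one
# (x, y, value) row per cell, in column-major order (x outer, y inner).
matrix2tuple.fast <- function(xy.matrix.m) {
  n.y <- nrow(xy.matrix.m)
  n.x <- ncol(xy.matrix.m)
  cbind(
    x  = rep(colnames(xy.matrix.m), each = n.y),
    y  = rep(rownames(xy.matrix.m), times = n.x),
    xy = as.vector(xy.matrix.m)
  )
}

# The 4 x 4 test matrix from the question
matrix.m <- matrix(as.character(1:16), nrow = 4,
                   dimnames = list(c("a", "b", "c", "d"),
                                   c("A", "B", "C", "D")))
tuple.m <- matrix2tuple.fast(matrix.m)
head(tuple.m, 3)
```

If the real matrix is numeric, cbind() coerces the values to character here, as in the original; keeping the three columns in a data.frame instead preserves the numeric type.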