[R] Cube of Matrices or list of Matrices

Roy Mendelssohn - NOAA Federal roy.mendelssohn at noaa.gov
Tue Jan 20 04:25:31 CET 2015


I believe what Karim is after is often referred to as a “ragged array”.  For disk storage, such structures have been added to netcdf4 for things like subsurface profiles with a different number of depths.

This blog might be of interest:

http://www.r-bloggers.com/efficient-ragged-arrays-in-r-and-rcpp/

A general web search for “R ragged arrays” is also worthwhile.
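Here is a minimal sketch of one common ragged-array layout in R. It is not necessarily the layout used in the linked post. All values are stored end to end in a single vector, and a second vector records how long each group is (for example, each disease's number of samples). A group is then recovered from its offset. The group sizes and data below are made up for illustration.

```r
# Contiguous ragged array: one flat vector holds every value, and a
# second vector holds each group's length. All data here are made up.
lens    <- c(3L, 5L, 2L)                    # hypothetical group sizes
values  <- seq_len(sum(lens))               # hypothetical data, stored end to end
offsets <- cumsum(c(0L, head(lens, -1L)))   # number of values before each group

# Pull out group i without building the others
getGroup <- function(i) values[offsets[i] + seq_len(lens[i])]

getGroup(2)   # 4 5 6 7 8

# Or rebuild the whole thing as an ordinary list of vectors
groups <- split(values, rep(seq_along(lens), lens))
```

With this layout, the only per-group bookkeeping is one integer, and no NA padding is ever stored.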

-Roy

On Jan 19, 2015, at 7:13 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> I use plyr and am learning dplyr and magrittr, but those are just syntactic sugar. What I have been having difficulty with in this thread is the idea that it somehow makes sense to pad vectors with NA values... because I really don't think it does. It seems more like a hammer looking for a nail because that is what it knows how to deal with.
> 
> You have a list of matrices with data in them, and switching from for loops to lapply is not in itself going to fix a memory or speed problem... normally the big improvements are in the way you allocate and use your data. Burns talks about pre-allocating the result to speed things up, but I don't understand the problem well enough to suggest an efficient data structure to pre-allocate.
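The sketch below applies the pre-allocation idiom to the loading loop in Karim's first message. The list is created at its final length with its names, and then filled in, instead of starting from vector("list", 1) and growing it. `loadExpression()` and the three disease names are hypothetical stand-ins for the poster's own loader and 80 diseases.

```r
# Hypothetical stand-ins: 3 diseases instead of 80, and a made-up loader
# returning a samples-by-genes matrix with a disease-specific row count.
diseases <- paste0("Disease", 1:3)
loadExpression <- function(d, nGenes = 4) {
  nSamples <- switch(d, Disease1 = 5, Disease2 = 2, Disease3 = 3)
  matrix(rnorm(nSamples * nGenes), nrow = nSamples,
         dimnames = list(NULL, paste0("gene", seq_len(nGenes))))
}

# Pre-allocate a named list of the final length, then fill it in place
expr <- setNames(vector("list", length(diseases)), diseases)
for (d in diseases) expr[[d]] <- loadExpression(d)

# The same structure without an explicit loop
expr2 <- setNames(lapply(diseases, loadExpression), diseases)
```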
> 
> I suggest that Karim read and adhere to the Posting Guide (particularly the bits about giving a reproducible example and posting in plain text so it doesn't get scrambled) if help with optimizing is desired. The discussion at [1] might clarify what "reproducible" means.
> 
> I will also mention that efficient algorithms for this subject area are frequently available in the Bioconductor project, so I hope you are not re-inventing the wheel and have already reviewed their tools.
> 
> [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                      Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> --------------------------------------------------------------------------- 
> Sent from my phone. Please excuse my brevity.
> 
> On January 19, 2015 6:11:38 PM PST, Ben Tupper <btupper at bigelow.org> wrote:
>> Hi,
>> 
>> On Jan 19, 2015, at 5:17 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>> 
>>> Thanks Ben.
>>> I need to learn more about apply. Do you have a link or a tutorial about apply? The R documentation is very short.
>>> 
>>> How can I obtain:
>>> z <- list(Col1, Col2, Col3, Col4, ...)?
>>> 
>> 
>> This may not be the most efficient way, and there certainly is no error
>> checking, but you can wrap one lapply within another as shown below.
>> The inner lapply iterates over your list of input matrices, extracting
>> the specified column from each list element. The outer lapply iterates
>> over the column numbers you want to extract.
>> 
>> 
>> # needs getColumn() with the 'len' argument, from my earlier message (quoted below)
>> getMatrices <- function(colNums, dataList = x){
>>   # the number of rows required
>>   n <- max(sapply(dataList, nrow))
>>   lapply(colNums, function(x, dat, n) {                 # iterate along requested columns
>>     do.call(cbind, lapply(dat, getColumn, x, len = n))  # iterate along input data list
>>   }, dataList, n)
>> }
>> 
>> getMatrices(c(1,3), dataList = x)  
>> 
>> If we are lucky, one of the plyr package users might show us how to do
>> the same with a one-liner. 
>> 
>> 
>> There are endless resources online; here are some gems.
>> 
>> http://www.r-project.org/doc/bib/R-books.html 
>> http://www.rseek.org/
>> http://www.burns-stat.com/documents/
>> http://www.r-bloggers.com/
>> 
>> Also, I found "Data Manipulation with R" (
>> http://www.r-project.org/doc/bib/R-books_bib.html#R:Spector:2008 )
>> helpful.  
>> 
>> Ben
>> 
>>> Thanks
>>> 
>>>  Ô__
>>> c/ /'_;~~~~kmezhoud
>>> (*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
>>> http://bioinformatics.tn/
>>> 
>>> 
>>> 
>>> On Mon, Jan 19, 2015 at 8:22 PM, Ben Tupper <btupper at bigelow.org> wrote:
>>> Hi again,
>>> 
>>> On Jan 19, 2015, at 1:53 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>>> 
>>>> Yes, many thanks.
>>>> That is what I was asking for, using lapply.
>>>> 
>>>> do.call(cbind, col1)
>>>> 
>>>> converts col1 to a matrix but does not fill the empty values with NA.
>>>> 
>>>> The same is true for
>>>> 
>>>> matrix(unlist(col1), ncol = 5, byrow = FALSE)
>>>> 
>>>> 
>>>> How can I get col1 as a matrix, with the empty values filled with NA?
>>>> 
>>> 
>>> Perhaps the best approach is to determine the maximum number of rows required first, then force each subset to have that length.
>>> 
>>> # make a list of matrices, each with nCol columns and differing
>>> # number of rows
>>> nCol <- 3
>>> nRow <- sample(3:10, 5)
>>> x <- lapply(nRow, function(x, nc) {matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)}, nCol)
>>> x
>>> 
>>> # make a simple function to get a single column from a matrix
>>> getColumn <- function(x, colNum, len = nrow(x)) {
>>>   y <- x[,colNum]
>>>   length(y) <- len   # lengthening a vector pads it with NA
>>>   y
>>> }
>>> 
>>> # what is the maximum number of rows
>>> n <- max(sapply(x, nrow))
>>> 
>>> # use the function to get the column from each matrix
>>> col1 <- lapply(x, getColumn, 1, len = n)
>>> col1
>>> 
>>> do.call(cbind, col1)
>>>      [,1] [,2] [,3] [,4] [,5]
>>> [1,]    3    8    5    7    9
>>> [2,]    4    9    6    8   10
>>> [3,]    5   10    7    9   11
>>> [4,]   NA   11    8   10   12
>>> [5,]   NA   12    9   11   13
>>> [6,]   NA   13   NA   12   14
>>> [7,]   NA   14   NA   13   15
>>> [8,]   NA   15   NA   NA   16
>>> [9,]   NA   NA   NA   NA   17
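Two made-up vectors show why the padded version works and why do.call(cbind, col1) on its own does not give NA. Increasing a vector's length with `length<-` fills the new positions with NA. cbind(), by contrast, recycles a shorter vector up to the longest length and warns when the lengths are not multiples of each other.

```r
# length<- pads a vector with NA when the new length is larger
y <- c(3L, 4L, 5L)
length(y) <- 5
print(y)          # 3 4 5 NA NA

# cbind() recycles the shorter vector instead of padding it;
# R warns because 5 is not a multiple of 3
m <- suppressWarnings(cbind(1:3, 1:5))
print(m[, 1])     # 1 2 3 1 2
```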
>>> 
>>> Ben
>>> 
>>>> Thanks
>>>> Karim
>>>> 
>>>> 
>>>> On Mon, Jan 19, 2015 at 4:36 PM, Ben Tupper <ben.bighair at gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> On Jan 18, 2015, at 4:36 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>>>> 
>>>>> Dear All,
>>>>> I am trying to get the correlation between Diseases (80) in columns and
>>>>> samples in rows (UNEQUAL numbers) using gene expression (at least 1000
>>>>> genes, numeric). For this I can use the CORREP package with the
>>>>> cor.unbalanced function.
>>>>> 
>>>>> But before getting this final matrix I need to load and store the
>>>>> expression of 1000 genes for every Disease (80). Every disease has a
>>>>> different number of samples (between 50 and 500).
>>>>> 
>>>>> Is it possible to get a cube of matrices with equal columns but unequal
>>>>> rows? I think not, and I can't use the array function.
>>>>> 
>>>>> I am trying to get a list of matrices having the same number of columns
>>>>> but a different number of rows, as follows:
>>>>> 
>>>>> Cubist <- vector("list", 1)
>>>>> Cubist$Expression <- vector("list", 1)
>>>>> 
>>>>> 
>>>>> for (i in 1:80){
>>>>> 
>>>>> mat <- getGeneExpression(i)   # my own loader: expression matrix of disease i
>>>>> Cubist$Expression[[Disease[i]]] <- mat
>>>>> 
>>>>> }
>>>>> 
>>>>> At this step I have:
>>>>> length(Cubist$Expression)
>>>>> #80
>>>>> dim(Cubist$Expression$Disease1)
>>>>> #526 1000
>>>>> dim(Cubist$Expression$Disease2)
>>>>> #106  1000
>>>>> 
>>>>> names(Cubist$Expression$Disease1[4])
>>>>> #ABD
>>>>> 
>>>>> names(Cubist$Expression$Disease2[4])
>>>>> #ABD
>>>>> 
>>>>> Now I need to build the final matrices for every gene (1000) that I will
>>>>> use for the CORREP function.
>>>>> 
>>>>> Is there a way to directly extract the first column (first gene) for all
>>>>> Diseases (80) from Cubist$Expression? Or
>>>>> 
>>>> 
>>>> I don't understand most of your question, but the above seems to be straightforward. Here's a toy example:
>>>> 
>>>> # make a list of matrices, each with nCol columns and differing
>>>> # number of rows, nRow
>>>> nCol <- 3
>>>> nRow <- sample(3:10, 5)
>>>> x <- lapply(nRow, function(x, nc) {matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)}, nCol)
>>>> x
>>>> 
>>>> # make a simple function to get a single column from a matrix
>>>> getColumn <- function(x, colNum) {
>>>>   return(x[,colNum])
>>>> }
>>>> 
>>>> # use the function to get the column from each matrix
>>>> col1 <- lapply(x, getColumn, 1)
>>>> col1
>>>> 
>>>> Does that help answer this part of your question? If not, you may need to create a very small example of your data and post it here using the head() and dput() functions.
>>>> 
>>>> Ben
>>>> 
>>>> 
>>>> 
>>>>> I need to build 1000 matrices with 80 columns and unequal rows?
>>>>> 
>>>>> Cublist$Diseases <- vector("list", 1)
>>>>> 
>>>>> for (k in 1:1000){
>>>>> for (i in 1:80){
>>>>> 
>>>>> Cublist$Diseases[[gene[k] ]] <- Cubist$Expression[[Diseases[i] ]][k]
>>>>> }
>>>>> 
>>>>> }
>>>>> 
>>>>> This double loop is time-consuming... Is there a way to do this faster?
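Nested lapply() calls can replace the double loop, in the same spirit as the replies above, and the per-gene result can stay ragged rather than padded with NA. This sketch assumes every disease matrix has the same gene columns in the same order. The names `expr` and `byGene` and the toy data are hypothetical stand-ins for Cubist$Expression.

```r
set.seed(1)
nGenes <- 4
# Hypothetical stand-in for Cubist$Expression: 3 diseases sharing the same
# 4 gene columns, each with a different number of sample rows.
expr <- lapply(c(Disease1 = 5, Disease2 = 2, Disease3 = 3), function(nr)
  matrix(rnorm(nr * nGenes), nrow = nr,
         dimnames = list(NULL, paste0("gene", seq_len(nGenes)))))

genes <- colnames(expr[[1]])

# For each gene, take that column from every disease matrix.
# Each element of byGene is a list of unequal-length vectors (no NA padding).
byGene <- lapply(setNames(genes, genes), function(g)
  lapply(expr, function(m) m[, g]))

vapply(byGene$gene1, length, integer(1))   # Disease1 5, Disease2 2, Disease3 3
```

If a padded matrix per gene is needed instead, each inner list can be passed through the getColumn() padding from earlier in the thread.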
>>>>> 
>>>>> Thanks,
>>>>> karim
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>> 
>>> Ben Tupper
>>> Bigelow Laboratory for Ocean Sciences
>>> 60 Bigelow Drive, P.O. Box 380
>>> East Boothbay, Maine 04544
>>> http://www.bigelow.org

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn at noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


