[R] Cube of Matrices or list of Matrices

Karim Mezhoud kmezhoud at gmail.com
Tue Feb 10 13:37:48 CET 2015


Thanks Ben, Jeff and Roy,
Here is an example of my data:

Disease <- list()
Diseases <- list()
ListMatByGene <- list()
for (i in 1:3) {
    # (5 + i) samples x 5 genes, filled with distinct integers from -30:30
    Disease[[i]] <- matrix(sample(-30:30, 25 + 5 * i), nrow = 5 + i)
    rownames(Disease[[i]]) <- paste0("Sample", 1:(5 + i))
    colnames(Disease[[i]]) <- paste0("Gene", 1:5)

    D <- paste0("Disease", i)
    Diseases[[D]] <- Disease[[i]]
}

getColumn <- function(x, colNum, len = nrow(x)){
    y <- x[,colNum]
    length(y) <- len
    y
}

getMatrices <- function(colNums, dataList = x){
    # the number of rows required
    n <- max(sapply(dataList, nrow))
    lapply(colNums, function(x, dat, n) { # iterate along requested columns
        # iterate along input data list
        do.call(cbind, lapply(dat, getColumn, x, len = n))
    }, dataList, n)
}
G <- paste0("Gene",1:5)
ListMatByGene[G] <- getMatrices(c(1:ncol(Diseases[[1]])),dataList=Diseases)

## get Disease correlation by gene
## (use = "na" is a partial match for "na.or.complete", spelled out here)
DiseaseCorrelation <- lapply(ListMatByGene, function(x)
    cor(x, use = "na.or.complete", method = "spearman"))

##convert the list of Matrices to array
ArrayDiseaseCor <- array(unlist(DiseaseCorrelation), dim =
c(nrow(DiseaseCorrelation[[1]]), ncol(DiseaseCorrelation[[1]]),
length(DiseaseCorrelation)))
dimnames(ArrayDiseaseCor) <- list(names(Diseases), names(Diseases),
colnames(Diseases[[1]]))

FilterDiseaseCor <- apply(ArrayDiseaseCor,MARGIN=c(1,2) ,function(x)
x[abs(x)>0.5])

FilterDiseaseCor

         Disease1   Disease2  Disease3
Disease1 Numeric,5  Numeric,2 -0.9428571
Disease2 Numeric,2  Numeric,5 Numeric,2
Disease3 -0.9428571 Numeric,2 Numeric,5
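
The "Numeric,5" entries show up because the filtering function returns vectors of different lengths, so apply() gives back a list with a dim attribute rather than a numeric matrix. A minimal sketch with a hand-made 2 x 2 x 3 array (toy and filtered are illustrative names, not part of the data above) shows how such a result can be inspected:

```r
# Toy array standing in for ArrayDiseaseCor (values chosen by hand).
toy <- array(c(1, 0.8, 0.8, 1,
               1, 0.2, 0.2, 1,
               1, -0.6, -0.6, 1),
             dim = c(2, 2, 3),
             dimnames = list(c("D1", "D2"), c("D1", "D2"),
                             paste0("Gene", 1:3)))

# Same filter as above: keep only correlations with |cor| > 0.5.
filtered <- apply(toy, MARGIN = c(1, 2), function(x) x[abs(x) > 0.5])

# The cells hold vectors of unequal length, so the result is a list
# that carries a dim attribute (a "list-matrix").
is.list(filtered)
dim(filtered)

# Double brackets with row/column indices pull out one cell's values.
filtered[[1, 2]]

# Number of values that passed the filter in each cell.
lengths(filtered)
```

Each cell is an ordinary numeric vector once extracted with [[i, j]], which is why printing the whole object only shows a type and a length.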


My question is:
How can I get a table such as:

D1          D2          Cor     Gene
Disease1    Disease2    -0.94   Gene2
Disease1    Disease2     0.78   Gene4
Disease3    Disease2     0.5    Gene5
...

and
            Disease1    Disease2    Disease3
Disease1    5           1           0
Disease2    1           5           3
Disease3    0           3           5
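
One possible way to get both tables is to work on the 3-d correlation array directly rather than on the filtered list. The sketch below is only an illustration under assumptions: it uses a small hand-made array in place of ArrayDiseaseCor, it assumes the second table counts, for each disease pair, the genes with |cor| > 0.5, and corArray, corTable and corCount are made-up names.

```r
# Toy stand-in for ArrayDiseaseCor: disease x disease x gene.
diseases <- paste0("Disease", 1:3)
genes <- paste0("Gene", 1:2)
corArray <- array(
  c( 1.0, -0.9,  0.2,
    -0.9,  1.0,  0.6,
     0.2,  0.6,  1.0,
     1.0,  0.1, -0.7,
     0.1,  1.0,  0.3,
    -0.7,  0.3,  1.0),
  dim = c(3, 3, 2),
  dimnames = list(diseases, diseases, genes)
)

# Positions (row, column, gene) where |cor| > 0.5; which() skips NA.
keep <- which(abs(corArray) > 0.5, arr.ind = TRUE)
# Keep each pair once and drop the diagonal.
keep <- keep[keep[, 1] < keep[, 2], , drop = FALSE]

# Long table: one row per disease pair and gene.
corTable <- data.frame(
  D1 = diseases[keep[, 1]],
  D2 = diseases[keep[, 2]],
  Cor = corArray[keep],
  Gene = genes[keep[, 3]],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Count matrix: number of genes per pair with |cor| > 0.5.
corCount <- apply(abs(corArray) > 0.5, c(1, 2), sum, na.rm = TRUE)

corTable
corCount
```

With the real data, corArray would be replaced by ArrayDiseaseCor, and diseases and genes by its dimnames. The diagonal of corCount then equals the number of genes whose self-correlation is defined, which matches the 5s in the table above.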



Thanks
Karim




On Tue, Jan 20, 2015 at 2:11 AM, Ben Tupper <btupper at bigelow.org> wrote:

> Hi,
>
> On Jan 19, 2015, at 5:17 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>
> Thanks Ben.
> I need to learn more about apply. Do you have a link or a tutorial about
> apply? The R documentation is very short.
>
> How can I obtain:
> z <- list (Col1, Col2, Col3, Col4......)?
>
>
> This may not be the most efficient way and there certainly is no error
> checking, but you can wrap one lapply within another as shown below.  The
> innermost iterates over your list of input matrices, extracting one column
> specified per list element.  The outer lapply iterates over the various
> column numbers you want to extract.
>
>
> getMatrices <- function(colNums, dataList = x){
>    # the number of rows required
>    n <- max(sapply(dataList, nrow))
>    lapply(colNums, function(x, dat, n) { # iterate along requested columns
>       # iterate along input data list
>       do.call(cbind, lapply(dat, getColumn, x, len = n))
>    }, dataList, n)
> }
>
> getMatrices(c(1,3), dataList = x)
>
> If we are lucky, one of the plyr package users might show us how to do the
> same with a one-liner.
>
>
> There are endless resources online, here are some gems.
>
> http://www.r-project.org/doc/bib/R-books.html
> http://www.rseek.org/
> http://www.burns-stat.com/documents/
> http://www.r-bloggers.com/
>
> Also, I found "Data Manipulation with R" (
> http://www.r-project.org/doc/bib/R-books_bib.html#R:Spector:2008 )
> helpful.
>
> Ben
>
> Thanks
>
>   Ô__
>  c/ /'_;~~~~kmezhoud
> (*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
> http://bioinformatics.tn/
>
>
>
> On Mon, Jan 19, 2015 at 8:22 PM, Ben Tupper <btupper at bigelow.org> wrote:
>
>> Hi again,
>>
>> On Jan 19, 2015, at 1:53 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>>
>> Yes, many thanks.
>> That is what I wanted, using lapply.
>>
>> do.call(cbind,col1)
>>
>> converts col1 to a matrix but does not fill the empty values with NA.
>>
>> Even for
>>
>> matrix(unlist(col1), ncol=5,byrow = FALSE)
>>
>>
>> How can I get col1 as a matrix, with the empty values filled with NA?
>>
>>
>> Perhaps it is best to determine the maximum number of rows required first,
>> then force each subset to have that length.
>>
>> # make a list of matrices, each with nCol columns and differing
>> # number of rows
>> nCol <- 3
>> nRow <- sample(3:10, 5)
>> x <- lapply(nRow, function(x, nc) {matrix(x:(x + nc*x - 1), ncol = nc,
>> nrow = x)}, nCol)
>> x
>>
>> # make a simple function to get a single column from a matrix
>> getColumn <- function(x, colNum, len = nrow(x)) {
>>    y <- x[,colNum]
>>    length(y) <- len
>>    y
>> }
>>
>> # what is the maximum number of rows
>> n <- max(sapply(x, nrow))
>>
>> # use the function to get the column from each matrix
>> col1 <- lapply(x, getColumn, 1, len = n)
>> col1
>>
>> do.call(cbind, col1)
>>       [,1] [,2] [,3] [,4] [,5]
>>  [1,]    3    8    5    7    9
>>  [2,]    4    9    6    8   10
>>  [3,]    5   10    7    9   11
>>  [4,]   NA   11    8   10   12
>>  [5,]   NA   12    9   11   13
>>  [6,]   NA   13   NA   12   14
>>  [7,]   NA   14   NA   13   15
>>  [8,]   NA   15   NA   NA   16
>>  [9,]   NA   NA   NA   NA   17
>>
>> Ben
>>
>> Thanks
>> Karim
>>
>>
>>
>> On Mon, Jan 19, 2015 at 4:36 PM, Ben Tupper <ben.bighair at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> On Jan 18, 2015, at 4:36 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>>>
>>> > Dear All,
>>> > I am trying to get the correlation between diseases (80) in columns and
>>> > samples in rows (UNEQUAL) using gene expression (at least 1000,
>>> > numeric). For this I can use the CORREP package with the cor.unbalanced
>>> > function.
>>> >
>>> > But before getting this final matrix I need to load and store the
>>> > expression of 1000 genes for every disease (80). Every disease has a
>>> > different number of samples (between 50 and 500).
>>> >
>>> > Is it possible to get a cube of matrices with equal columns but unequal
>>> > rows? I think not, so I can't use the array function.
>>> >
>>> > I am trying to get a list of matrices having the same number of columns
>>> > but different numbers of rows, as:
>>> >
>>> > Cubist <- vector("list", 1)
>>> > Cubist$Expression <- vector("list", 1)
>>> >
>>> >
>>> > for (i in 1:80){
>>> >
>>> > # pseudocode: getGeneExpression() stands for whatever loads one disease
>>> > mat <- getGeneExpression(Disease[i])
>>> > Cubist$Expression[[Disease[i]]] <- mat
>>> >
>>> > }
>>> >
>>> > At this step I have:
>>> > length(Cubist$Expression)
>>> > #80
>>> > dim(Cubist$Expression$Disease1)
>>> > #526 1000
>>> > dim(Cubist$Expression$Disease2)
>>> > #106  1000
>>> >
>>> > names(Cubist$Expression$Disease1[4])
>>> > #ABD
>>> >
>>> > names(Cubist$Expression$Disease2[4])
>>> > #ABD
>>> >
>>> > Now I need to build the final matrices for every gene (1000) that I will
>>> > use for the CORREP function.
>>> >
>>> > Is there a way to directly extract the first column (first gene) for all
>>> > diseases (80) from Cubist$Expression? Or
>>> >
>>>
>>> I don't understand most of your question, but the above seems to be
>>> straightforward.  Here's a toy example:
>>>
>>> # make a list of matrices, each with nCol columns and differing
>>> # number of rows, nRow
>>> nCol <- 3
>>> nRow <- sample(3:10, 5)
>>> x <- lapply(nRow, function(x, nc) {matrix(x:(x + nc*x - 1), ncol = nc,
>>> nrow = x)}, nCol)
>>> x
>>>
>>> # make a simple function to get a single column from a matrix
>>> getColumn <- function(x, colNum) {
>>>    return(x[,colNum])
>>> }
>>>
>>> # use the function to get the column from each matrix
>>> col1 <- lapply(x, getColumn, 1)
>>> col1
>>>
>>> Does that help answer this part of your question?  If not, you may need
>>> to create a very small example of your data and post it here using the
>>> head() and dput() functions.
>>>
>>> Ben
>>>
>>>
>>>
>>> > I need to build 1000 matrices with 80 columns and unequal rows?
>>> >
>>> > Cublist$Diseases <- vector("list", 1)
>>> >
>>> > for (k in 1:1000){
>>> > for (i in 1:80){
>>> >
>>> > Cublist$Diseases[[gene[k] ]] <- Cubist$Expression[[Diseases[i] ]][k]
>>> > }
>>> >
>>> > }
>>> >
>>> > This double loop is time-consuming... Is there a way to do this faster?
>>> >
>>> > Thanks,
>>> > karim
>>> >
>>> > ______________________________________________
>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 60 Bigelow Drive, P.O. Box 380
>> East Boothbay, Maine 04544
>> http://www.bigelow.org
