[BioC] LIMMA: plotMDS

Wu, Di dwu at fas.harvard.edu
Mon May 7 20:11:33 CEST 2012


Hi Kripa,

It is a good question to ask which the top 500 genes are. We only can know that if "gene.selection" is "common", in which situation, the same top 500 genes were used for calculating the distance matrix among all samples. When "gene.selection" is "pairwise", different sets of 500 genes for different pairs of samples are used for distance matrix calculation.

If you look at "plotMDS.default" in limma, you will see the above description is consistent with the code. "plotMDS.default" is used by plotMDS.

The current code does not output the top-500-gene list. However, the code can be edited to output the list when  "gene.selection" is "common". Please see the following.
In the followed example, "mds4[[2]]" is the vector of indexes for the top 500 genes.

Hope this help.

Di


 
 plotMDS.default<-
function (x, top = 500, labels = colnames(x), col = NULL, cex = 1, 
    dim.plot = c(1, 2), ndim = max(dim.plot), gene.selection = "pairwise", 
    xlab = paste("Dimension", dim.plot[1]), ylab = paste("Dimension", 
        dim.plot[2]), ...) 
{
    x <- as.matrix(x)
    ok <- is.finite(x)
    if (!all(ok)) 
        x <- x[apply(ok, 1, all), ]
    if (is.null(labels)) 
        labels <- 1:dim(x)[2]
    nprobes <- nrow(x)
    nsamples <- ncol(x)
    if (ndim < 2) 
        stop("Need at least two dim.plot")
    if (nsamples < ndim) 
        stop("Two few samples")
    gene.selection <- match.arg(gene.selection, c("pairwise", 
        "common"))
    cn <- colnames(x)
    dd <- matrix(0, nrow = nsamples, ncol = nsamples, dimnames = list(cn, 
        cn))
    topindex <- nprobes - top + 1
    if (gene.selection == "pairwise") {
        for (i in 2:(nsamples)) for (j in 1:(i - 1)) dd[i, j] = sqrt(mean(sort.int((x[, 
            i] - x[, j])^2, partial = topindex)[topindex:nprobes]))
    }
    else {
    
    #		Same genes used for all comparisons ,"common"
   
        s <- rowMeans((x - rowMeans(x))^2)
        q <- quantile(s, p = (topindex - 1.5)/(nprobes - 1))

        x <- x[s >= q, ]
	
	#	 an extra line
        ind.top.genes<-which(s >= q)
        
        for (i in 2:(nsamples)) dd[i, 1:(i - 1)] = sqrt(colMeans((x[, 
            i] - x[, 1:(i - 1), drop = FALSE])^2))
    }
    a1 <- cmdscale(as.dist(dd), k = ndim)
    mds <- new("MDS", list(dim.plot = dim.plot, distance.matrix = dd, 
        cmdscale.out = a1, top = top, gene.selection = gene.selection))
    mds$x <- a1[, dim.plot[1]]
    mds$y <- a1[, dim.plot[2]]
     mdsPlot<-plotMDS(mds, labels = labels, col = col, cex = cex, xlab = xlab, 
        ylab = ylab, ...)
list       (mds=mds,  ind.top.genes=ind.top.genes)  
}


# The example from "plotMDS" documentation

sd <- 0.3*sqrt(4/rchisq(1000,df=4))
x <- matrix(rnorm(1000*6,sd=sd),1000,6)
rownames(x) <- paste("Gene",1:1000)
x[1:50,4:6] <- x[1:50,4:6] + 2
# without labels, indexes of samples are plotted.

mds <- plotMDS(x,  col=c(rep("black",3), rep("red",3)) )
mds4<-plotMDS.default(x,  col=c(rep("black",3), rep("red",3)) , gene.selection="common")

> mds4[[1]]
> length(mds4[[2]])
[1] 500






----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] On Behalf Of Tim Triche, Jr. [tim.triche at gmail.com]
Sent: Monday, May 07, 2012 1:48 PM
To: Kripa R
Cc: amackey at virginia.edu; bioconductor at r-project.org
Subject: Re: [BioC] LIMMA: plotMDS

also http://www.bioconductor.org/packages/2.11/bioc/vignettes/pcaMethods/inst/doc/pcaMethods.pdf<http://www.bioconductor.org/packages/2.11/bioc/vignettes/pcaMethods/inst/doc/pcaMethods.pdf>

if you are willing to simply truncate loadings at some arbitrary value (not
always the worst idea of all time mind you)

superPC and the preconditioned lasso turn this strategy into a method for
obtaining sparse classifiers, FWIW.


On Mon, May 7, 2012 at 10:46 AM, Tim Triche, Jr. <tim.triche at gmail.com>wrote:

> try using the 'sparcl' package or something like this:
> http://www.biomedcentral.com/1471-2105/11/296
>
> the point is to threshold vigorously enough that most of the loadings are
> forced to zero
>
>
>
> On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777 at hotmail.com> wrote:
>
>>
>> Maybe this is very basic, but how do you identify those 500 genes for the
>> heatmap?
>>
>> .kripa
>>
>> From: amackey at virginia.edu
>> Date: Mon, 7 May 2012 11:47:16 -0400
>> Subject: Re: [BioC] LIMMA: plotMDS
>> To: kripa777 at hotmail.com
>> CC: bioconductor at r-project.org
>>
>> All genes contribute to all principle components; some just more than
>> others.  I've found it useful (or perhaps just entertaining) to see
>> heatmaps of the first N (say 10) component loadings, constrained to the 500
>> genes whose variances contribute most to those components.
>>
>>
>> -Aaron
>>
>> On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777 at hotmail.com> wrote:
>>
>>
>>
>>
>> Hi I was wondering how I can figure out which genes correspond to each of
>> the principle components ($cmdscale.out). Gene.selection is set to pairwise
>> for the top 500 but I'd like to know exactly which genes are being
>> considered for cmdscale.out[,3], for example.
>>
>>
>>
>>
>>
>> Thanks for the help!
>>
>>
>>
>> .kripa
>>
>>        [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>>
>> Bioconductor mailing list
>>
>> Bioconductor at r-project.org
>>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
> --
> *A model is a lie that helps you see the truth.*
> *
> *
> Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
>


--
*A model is a lie that helps you see the truth.*
*
*
Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

        [[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list