[BioC] EdgeR questions about exactTest and plotMDS function

Mon May 27 21:09:44 CEST 2013

Hi Amos,

> 1) *Differential expression*:
>> et <- exactTest(d)
> Error in `[.data.frame`(mu1, g) : undefined columns selected
> 
> or  (only if the name of my first group is starting with the letter A !)
>> et <- exactTest(d)
> Error in 0:s1[g] : argument NA / NaN
> 
> I found out that my error message is due to the presence of only one sample
> in my first group when using exactTest() function. I used glmLRT() function
> and I obtained many significantly differentially expressed genes.
> Could you confirm that exactTest() don't work with group of only one sample
> and that I used correctly glmLRT() ?  My code is at the end of the message.

I can't reproduce this error on a toy problem (either on edgeR 3.0.8 or on the latest 3.2.3):

library(edgeR)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
d <- DGEList(counts=y, group=rep(1:2,c(1,3)))
d <- estimateCommonDisp(d, verbose=TRUE)
d <- estimateTagwiseDisp(d, trend="none")
et <- exactTest(d)

I'm afraid I can't tell from your code what the issue is, but it is a bit unusual to construct the 'DGEList' object manually that way you do.  Why not use the DGEList() constructor?

> During my tests with the first Case study I get different results when
> doing plotMDS(d) and plotMDS(d$counts).
> The axis length and the sample position were different.
> Is it expected? why such a difference?  I guess it has to do with the
> gene.selection option.

This is expected.  plotMDS(x) is different when x is a DGEList object from when x is a matrix.  See ?plotMDS.DGEList for the former and ?plotMDS for the latter.  You probably want a count-based MDS plot, i.e. plotMDS(x), whether that be in the method='logFC' or method='bcv' flavour.  Read the docs for more details.

Best, Mark

----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://tiny.cc/mrobin

On 27.05.2013, at 17:36, Amos Kirilovsky <amos.kirilovsky at gmail.com> wrote:

> Dear Bioconductor list,
> 
> I'm try to get differentially expressed genes from two groups of samples
> using EdgeR.
> In the first group I have only 1 sample and in the second there are 3
> biological replicates. (I will get more samples per group in the future but
> for now I have to make an analysis pipeline with only that 4 samples)
> The sample came from a metatranscriptomic experience.
> For the comparison I selected only the genes that match to a species of
> interest.
> 
> I got error messages doing differential expression test and MDS  :
> 
> 1) *Differential expression*:
>> et <- exactTest(d)
> Error in `[.data.frame`(mu1, g) : undefined columns selected
> 
> or  (only if the name of my first group is starting with the letter A !)
>> et <- exactTest(d)
> Error in 0:s1[g] : argument NA / NaN
> 
> I found out that my error message is due to the presence of only one sample
> in my first group when using exactTest() function. I used glmLRT() function
> and I obtained many significantly differentially expressed genes.
> Could you confirm that exactTest() don't work with group of only one sample
> and that I used correctly glmLRT() ?  My code is at the end of the message.
> 
> 
> 2) *MDS*
>> plotMDS(d)
> Error in is.finite(x$counts) :
>  default method unavailable for 'list' type (translated from French)
> 
> For the second problem I made some tests with the first Case study  ("SAGE
> profiles of normal and tumour tissue") in edgeR user Guide.
> My code was very similar and this time it worked fine. I couldn't find out
> why. Do you have any idea?
> I copy the output of my DGEList object after my message. I don't know if
> this is helpfull but the function plotBCV()  seems ok with my data.
> 
> During my tests with the first Case study I get different results when
> doing plotMDS(d) and plotMDS(d$counts).
> The axis length and the sample position were different.
> Is it expected? why such a difference?  I guess it has to do with the
> gene.selection option.
> 
> 
> I hope I was clear in my explanations.
> Many thanks in advance.
> 
> Amos Kirilovsky
> -- 
> Dr Amos Kirilovsky
> CEA
> Institut de génomique
> 2 rue Gaston Cremieux
> CP 5706 91057 Evry cedex - France
> Tel: (33) 01 60 87 25 35
> 
> ####################################################################
>> *d*
> An object of class "DGEList"
> $samples
>   files group lib.size norm.factors
> s1    s1     E   271842    0.9414223
> s2    s2     b  1310877    1.0216270
> s3    s3     b   669731    1.0476370
> s4    s4     b  1184754    0.9924584
> 
> $counts
>       s1 s2 s3 s4
> 804851  0 45 19 49
> 587117  0 27 12 30
> 427810  0 16  6 10
> 367020  0  8 16 20
> 980437  0  2  8 14
> 14236 more rows ...
> 
> $common.dispersion
> [1] 0.4296784
> 
> $pseudo.counts
>       s1         s2        s3        s4
> 804851  0 24.4565653 19.756230 30.425129
> 587117  0 14.6478020 12.480244 18.638575
> 427810  0  8.7504091  6.242240  6.135612
> 367020  0  4.1452733 16.616435 12.407833
> 980437  0  0.8736513  8.306954  8.780709
> 14236 more rows ...
> 
> $logCPM
> [1] 4.803775 4.158376 3.185720 3.723672 2.957607
> 14236 more elements ...
> 
> $pseudo.lib.size
> [1] 729208.6
> 
> $prior.n
> [1] 10
> 
> $tagwise.dispersion
> [1] 0.3771734 0.3789361 0.3849697 0.4188742 0.4511627
> 14236 more elements
> 
> 
> ####################################################################
> ####################################################################
> 
> *#script*
> library(edgeR)
> 
> #read file
> filinName = "metaTranscriptome"
> filin = read.table(filinName, sep ="\t", header = TRUE)
> 
> x <- list()
> sets <- c("s1","s2","s3","s4")
> x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE)
> x$samples$group <- factor(c("A","B","B","B"))
> 
> # read count matrix
> x$counts = filin[,2:5] #Attention s'il y a des NA ça va bugger
> rownames(x$counts) <- filin[,1]
> colnames(x$counts) <- sets
> 
> # tot read number by sample
> x$samples$lib.size <- colSums(x$counts)
> 
> # DGEList
> x$samples$norm.factors <- 1
> row.names(x$samples) <- colnames(x$counts)
> d <- new("DGEList",x) #creation de DGElist à partir de x
> 
> # normalisation factor
> d <- calcNormFactors(d) # TMM
> 
> summary(d$counts)
> 
> ################ Not working ###################
> plotMDS(d)
> ################################################
> 
> # estimate common dispersion
> d <- estimateCommonDisp(d, verbose=TRUE)
> 
> # estimate Tagwise Disp
> d <- estimateTagwiseDisp(d, trend="none")
> 
> # plots the tagwise dispersions against log2-CPM
> # plotBCV(d, cex=0.4)
> 
> ################ Not working ###################
> # Differential expression
> # et <- exactTest(d)
> # topTags(et)
> ################################################
> 
> # matrix design
> design <- model.matrix(~0+group, data=d$samples)
> colnames(design) <- levels(d$samples$group)
> design
> 
> # Differential expression
> fit <- glmFit(d,design)
> lrt <- glmLRT(fit, contrast=c(1,-1))
> topTags(lrt)
> 
> ####################################################################
> ####################################################################
> 
> sessionInfo()
> 
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> other attached packages:
> [1] edgeR_3.0.8  limma_3.14.4
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor