[BioC] EdgeR questions about exactTest and plotMDS function
Mark Robinson
mark.robinson at imls.uzh.ch
Mon May 27 21:09:44 CEST 2013
Hi Amos,
> 1) *Differential expression*:
>> et <- exactTest(d)
> Error in `[.data.frame`(mu1, g) : undefined columns selected
>
> or (only if the name of my first group is starting with the letter A !)
>> et <- exactTest(d)
> Error in 0:s1[g] : argument NA / NaN
>
> I found out that my error message is due to the presence of only one sample
> in my first group when using exactTest() function. I used glmLRT() function
> and I obtained many significantly differentially expressed genes.
> Could you confirm that exactTest() don't work with group of only one sample
> and that I used correctly glmLRT() ? My code is at the end of the message.
I can't reproduce this error on a toy problem (either on edgeR 3.0.8 or on the latest 3.2.3):
library(edgeR)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
d <- DGEList(counts=y, group=rep(1:2,c(1,3)))
d <- estimateCommonDisp(d, verbose=TRUE)
d <- estimateTagwiseDisp(d, trend="none")
et <- exactTest(d)
I'm afraid I can't tell from your code what the issue is, but it is a bit unusual to construct the 'DGEList' object manually that way you do. Why not use the DGEList() constructor?
> During my tests with the first Case study I get different results when
> doing plotMDS(d) and plotMDS(d$counts).
> The axis length and the sample position were different.
> Is it expected? why such a difference? I guess it has to do with the
> gene.selection option.
This is expected. plotMDS(x) is different when x is a DGEList object from when x is a matrix. See ?plotMDS.DGEList for the former and ?plotMDS for the latter. You probably want a count-based MDS plot, i.e. plotMDS(x), whether that be in the method='logFC' or method='bcv' flavour. Read the docs for more details.
Best, Mark
----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://tiny.cc/mrobin
On 27.05.2013, at 17:36, Amos Kirilovsky <amos.kirilovsky at gmail.com> wrote:
> Dear Bioconductor list,
>
> I'm try to get differentially expressed genes from two groups of samples
> using EdgeR.
> In the first group I have only 1 sample and in the second there are 3
> biological replicates. (I will get more samples per group in the future but
> for now I have to make an analysis pipeline with only that 4 samples)
> The sample came from a metatranscriptomic experience.
> For the comparison I selected only the genes that match to a species of
> interest.
>
> I got error messages doing differential expression test and MDS :
>
> 1) *Differential expression*:
>> et <- exactTest(d)
> Error in `[.data.frame`(mu1, g) : undefined columns selected
>
> or (only if the name of my first group is starting with the letter A !)
>> et <- exactTest(d)
> Error in 0:s1[g] : argument NA / NaN
>
> I found out that my error message is due to the presence of only one sample
> in my first group when using exactTest() function. I used glmLRT() function
> and I obtained many significantly differentially expressed genes.
> Could you confirm that exactTest() don't work with group of only one sample
> and that I used correctly glmLRT() ? My code is at the end of the message.
>
>
> 2) *MDS*
>> plotMDS(d)
> Error in is.finite(x$counts) :
> default method unavailable for 'list' type (translated from French)
>
> For the second problem I made some tests with the first Case study ("SAGE
> profiles of normal and tumour tissue") in edgeR user Guide.
> My code was very similar and this time it worked fine. I couldn't find out
> why. Do you have any idea?
> I copy the output of my DGEList object after my message. I don't know if
> this is helpfull but the function plotBCV() seems ok with my data.
>
> During my tests with the first Case study I get different results when
> doing plotMDS(d) and plotMDS(d$counts).
> The axis length and the sample position were different.
> Is it expected? why such a difference? I guess it has to do with the
> gene.selection option.
>
>
> I hope I was clear in my explanations.
> Many thanks in advance.
>
> Amos Kirilovsky
> --
> Dr Amos Kirilovsky
> CEA
> Institut de génomique
> 2 rue Gaston Cremieux
> CP 5706 91057 Evry cedex - France
> Tel: (33) 01 60 87 25 35
>
> ####################################################################
>> *d*
> An object of class "DGEList"
> $samples
> files group lib.size norm.factors
> s1 s1 E 271842 0.9414223
> s2 s2 b 1310877 1.0216270
> s3 s3 b 669731 1.0476370
> s4 s4 b 1184754 0.9924584
>
> $counts
> s1 s2 s3 s4
> 804851 0 45 19 49
> 587117 0 27 12 30
> 427810 0 16 6 10
> 367020 0 8 16 20
> 980437 0 2 8 14
> 14236 more rows ...
>
> $common.dispersion
> [1] 0.4296784
>
> $pseudo.counts
> s1 s2 s3 s4
> 804851 0 24.4565653 19.756230 30.425129
> 587117 0 14.6478020 12.480244 18.638575
> 427810 0 8.7504091 6.242240 6.135612
> 367020 0 4.1452733 16.616435 12.407833
> 980437 0 0.8736513 8.306954 8.780709
> 14236 more rows ...
>
> $logCPM
> [1] 4.803775 4.158376 3.185720 3.723672 2.957607
> 14236 more elements ...
>
> $pseudo.lib.size
> [1] 729208.6
>
> $prior.n
> [1] 10
>
> $tagwise.dispersion
> [1] 0.3771734 0.3789361 0.3849697 0.4188742 0.4511627
> 14236 more elements
>
>
> ####################################################################
> ####################################################################
>
> *#script*
> library(edgeR)
>
> #read file
> filinName = "metaTranscriptome"
> filin = read.table(filinName, sep ="\t", header = TRUE)
>
> x <- list()
> sets <- c("s1","s2","s3","s4")
> x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE)
> x$samples$group <- factor(c("A","B","B","B"))
>
> # read count matrix
> x$counts = filin[,2:5] #Attention s'il y a des NA ça va bugger
> rownames(x$counts) <- filin[,1]
> colnames(x$counts) <- sets
>
> # tot read number by sample
> x$samples$lib.size <- colSums(x$counts)
>
> # DGEList
> x$samples$norm.factors <- 1
> row.names(x$samples) <- colnames(x$counts)
> d <- new("DGEList",x) #creation de DGElist à partir de x
>
> # normalisation factor
> d <- calcNormFactors(d) # TMM
>
> summary(d$counts)
>
> ################ Not working ###################
> plotMDS(d)
> ################################################
>
> # estimate common dispersion
> d <- estimateCommonDisp(d, verbose=TRUE)
>
> # estimate Tagwise Disp
> d <- estimateTagwiseDisp(d, trend="none")
>
> # plots the tagwise dispersions against log2-CPM
> # plotBCV(d, cex=0.4)
>
> ################ Not working ###################
> # Differential expression
> # et <- exactTest(d)
> # topTags(et)
> ################################################
>
> # matrix design
> design <- model.matrix(~0+group, data=d$samples)
> colnames(design) <- levels(d$samples$group)
> design
>
> # Differential expression
> fit <- glmFit(d,design)
> lrt <- glmLRT(fit, contrast=c(1,-1))
> topTags(lrt)
>
> ####################################################################
> ####################################################################
>
> sessionInfo()
>
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> other attached packages:
> [1] edgeR_3.0.8 limma_3.14.4
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list