[BioC] EdgeR questions about exactTest and plotMDS function
Gordon K Smyth
smyth at wehi.EDU.AU
Wed May 29 01:59:20 CEST 2013
Dear Amos,
This is a quick appendix to Mark's comments. The plotMDS() error is
occuring because your d$counts is a data.frame instead of a matrix. I
will add a check for this in the plotMDS code.
Had you used
d <- DGEList(counts=filin[,2:5])
(see Mark's comments) or
d$counts <- as.matrix(filin[,2:5])
then the error wouldn't have occured.
Also, please upgrade to the current version.
Best wishes
Gordon
> Date: Mon, 27 May 2013 21:09:44 +0200
> From: Mark Robinson <mark.robinson at imls.uzh.ch>
> To: Amos Kirilovsky <amos.kirilovsky at gmail.com>
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] EdgeR questions about exactTest and plotMDS
> function
>
> Hi Amos,
>
>> 1) *Differential expression*:
>>> et <- exactTest(d)
>> Error in `[.data.frame`(mu1, g) : undefined columns selected
>>
>> or (only if the name of my first group is starting with the letter A !)
>>> et <- exactTest(d)
>> Error in 0:s1[g] : argument NA / NaN
>>
>> I found out that my error message is due to the presence of only one sample
>> in my first group when using exactTest() function. I used glmLRT() function
>> and I obtained many significantly differentially expressed genes.
>> Could you confirm that exactTest() don't work with group of only one sample
>> and that I used correctly glmLRT() ? My code is at the end of the message.
>
> I can't reproduce this error on a toy problem (either on edgeR 3.0.8 or
> on the latest 3.2.3):
>
> library(edgeR)
> y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
> d <- DGEList(counts=y, group=rep(1:2,c(1,3)))
> d <- estimateCommonDisp(d, verbose=TRUE)
> d <- estimateTagwiseDisp(d, trend="none")
> et <- exactTest(d)
>
> I'm afraid I can't tell from your code what the issue is, but it is a
> bit unusual to construct the 'DGEList' object manually that way you do.
> Why not use the DGEList() constructor?
>
>
>> During my tests with the first Case study I get different results when
>> doing plotMDS(d) and plotMDS(d$counts).
>> The axis length and the sample position were different.
>> Is it expected? why such a difference? I guess it has to do with the
>> gene.selection option.
>
> This is expected. plotMDS(x) is different when x is a DGEList object
> from when x is a matrix. See ?plotMDS.DGEList for the former and
> ?plotMDS for the latter. You probably want a count-based MDS plot, i.e.
> plotMDS(x), whether that be in the method='logFC' or method='bcv'
> flavour. Read the docs for more details.
>
> Best, Mark
>
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics, Institute of Molecular Life Sciences
> University of Zurich
> http://tiny.cc/mrobin
>
>
>
> On 27.05.2013, at 17:36, Amos Kirilovsky <amos.kirilovsky at gmail.com> wrote:
>
>> Dear Bioconductor list,
>>
>> I'm try to get differentially expressed genes from two groups of samples
>> using EdgeR.
>> In the first group I have only 1 sample and in the second there are 3
>> biological replicates. (I will get more samples per group in the future but
>> for now I have to make an analysis pipeline with only that 4 samples)
>> The sample came from a metatranscriptomic experience.
>> For the comparison I selected only the genes that match to a species of
>> interest.
>>
>> I got error messages doing differential expression test and MDS :
>>
>> 1) *Differential expression*:
>>> et <- exactTest(d)
>> Error in `[.data.frame`(mu1, g) : undefined columns selected
>>
>> or (only if the name of my first group is starting with the letter A !)
>>> et <- exactTest(d)
>> Error in 0:s1[g] : argument NA / NaN
>>
>> I found out that my error message is due to the presence of only one sample
>> in my first group when using exactTest() function. I used glmLRT() function
>> and I obtained many significantly differentially expressed genes.
>> Could you confirm that exactTest() don't work with group of only one sample
>> and that I used correctly glmLRT() ? My code is at the end of the message.
>>
>>
>> 2) *MDS*
>>> plotMDS(d)
>> Error in is.finite(x$counts) :
>> default method unavailable for 'list' type (translated from French)
>>
>> For the second problem I made some tests with the first Case study ("SAGE
>> profiles of normal and tumour tissue") in edgeR user Guide.
>> My code was very similar and this time it worked fine. I couldn't find out
>> why. Do you have any idea?
>> I copy the output of my DGEList object after my message. I don't know if
>> this is helpfull but the function plotBCV() seems ok with my data.
>>
>> During my tests with the first Case study I get different results when
>> doing plotMDS(d) and plotMDS(d$counts).
>> The axis length and the sample position were different.
>> Is it expected? why such a difference? I guess it has to do with the
>> gene.selection option.
>>
>>
>> I hope I was clear in my explanations.
>> Many thanks in advance.
>>
>> Amos Kirilovsky
>> --
>> Dr Amos Kirilovsky
>> CEA
>> Institut de g?nomique
>> 2 rue Gaston Cremieux
>> CP 5706 91057 Evry cedex - France
>> Tel: (33) 01 60 87 25 35
>>
>> ####################################################################
>>> *d*
>> An object of class "DGEList"
>> $samples
>> files group lib.size norm.factors
>> s1 s1 E 271842 0.9414223
>> s2 s2 b 1310877 1.0216270
>> s3 s3 b 669731 1.0476370
>> s4 s4 b 1184754 0.9924584
>>
>> $counts
>> s1 s2 s3 s4
>> 804851 0 45 19 49
>> 587117 0 27 12 30
>> 427810 0 16 6 10
>> 367020 0 8 16 20
>> 980437 0 2 8 14
>> 14236 more rows ...
>>
>> $common.dispersion
>> [1] 0.4296784
>>
>> $pseudo.counts
>> s1 s2 s3 s4
>> 804851 0 24.4565653 19.756230 30.425129
>> 587117 0 14.6478020 12.480244 18.638575
>> 427810 0 8.7504091 6.242240 6.135612
>> 367020 0 4.1452733 16.616435 12.407833
>> 980437 0 0.8736513 8.306954 8.780709
>> 14236 more rows ...
>>
>> $logCPM
>> [1] 4.803775 4.158376 3.185720 3.723672 2.957607
>> 14236 more elements ...
>>
>> $pseudo.lib.size
>> [1] 729208.6
>>
>> $prior.n
>> [1] 10
>>
>> $tagwise.dispersion
>> [1] 0.3771734 0.3789361 0.3849697 0.4188742 0.4511627
>> 14236 more elements
>>
>>
>> ####################################################################
>> ####################################################################
>>
>> *#script*
>> library(edgeR)
>>
>> #read file
>> filinName = "metaTranscriptome"
>> filin = read.table(filinName, sep ="\t", header = TRUE)
>>
>> x <- list()
>> sets <- c("s1","s2","s3","s4")
>> x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE)
>> x$samples$group <- factor(c("A","B","B","B"))
>>
>> # read count matrix
>> x$counts = filin[,2:5] #Attention s'il y a des NA ?a va bugger
>> rownames(x$counts) <- filin[,1]
>> colnames(x$counts) <- sets
>>
>> # tot read number by sample
>> x$samples$lib.size <- colSums(x$counts)
>>
>> # DGEList
>> x$samples$norm.factors <- 1
>> row.names(x$samples) <- colnames(x$counts)
>> d <- new("DGEList",x) #creation de DGElist ? partir de x
>>
>> # normalisation factor
>> d <- calcNormFactors(d) # TMM
>>
>> summary(d$counts)
>>
>> ################ Not working ###################
>> plotMDS(d)
>> ################################################
>>
>> # estimate common dispersion
>> d <- estimateCommonDisp(d, verbose=TRUE)
>>
>> # estimate Tagwise Disp
>> d <- estimateTagwiseDisp(d, trend="none")
>>
>> # plots the tagwise dispersions against log2-CPM
>> # plotBCV(d, cex=0.4)
>>
>> ################ Not working ###################
>> # Differential expression
>> # et <- exactTest(d)
>> # topTags(et)
>> ################################################
>>
>> # matrix design
>> design <- model.matrix(~0+group, data=d$samples)
>> colnames(design) <- levels(d$samples$group)
>> design
>>
>> # Differential expression
>> fit <- glmFit(d,design)
>> lrt <- glmLRT(fit, contrast=c(1,-1))
>> topTags(lrt)
>>
>> ####################################################################
>> ####################################################################
>>
>> sessionInfo()
>>
>> R version 2.15.3 (2013-03-01)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> other attached packages:
>> [1] edgeR_3.0.8 limma_3.14.4
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list