[BioC] GSEA and edgeR

Iain Gallagher iaingallagher at btopenworld.com
Thu Sep 1 17:21:38 CEST 2011


Hi Sean

Thanks for your reply.

I have already carried out a goseq analysis on my data. 

As I understand it goseq (like GOstats) looks for enrichment in a list of genes defined a priori as regulated and flags categories those genes belong to as enriched or not in that list of regulated genes.

I was more interested in testing 'sets' as a whole without defining a list of regulated genes up front.

I should have mentioned in my original message that I'd used goseq already since the two techniques are related. Aplologies for that.

Best

Iain



--- On Thu, 1/9/11, Sean Davis <sdavis2 at mail.nih.gov> wrote:

> From: Sean Davis <sdavis2 at mail.nih.gov>
> Subject: Re: [BioC] GSEA and edgeR
> To: "Iain Gallagher" <iaingallagher at btopenworld.com>
> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>, smyth at wehi.edu.au
> Date: Thursday, 1 September, 2011, 16:11
> Hi, Iain.
> 
> Before proceeding, you might want to take a look at:
> 
> http://bioconductor.org/packages/release/bioc/html/goseq.html
> 
> This paper and others like it describe the problem and
> another solution:
> 
> http://www.ncbi.nlm.nih.gov/pubmed/21252076
> 
> Sean
> 
> 
> On Thu, Sep 1, 2011 at 9:13 AM, Iain Gallagher
> <iaingallagher at btopenworld.com>
> wrote:
> > Dear List / Gordon
> >
> > I would like to carry out a GSEA analysis on some
> RNA-Seq data. I have analysed the data using edgeR and have
> a list of regulated genes. The data is paired - 6 biological
> samples, cells from each are infected or uninfected.
> >
> > Looking around there is little advice on the use of
> GSEA in RNA-Seq data. Using edgeR (and having a relatively
> small sample size) I was hoping to make use of the romer
> algorithm which is implemented in limma. However I am having
> a conceptual problem setting up my data for this.
> >
> > As the study uses paired sample data I have followed
> the guidance for this type of analysis in the marvellous
> edgeR manual. Searching for advice on how to conduct GSEA I
> came across this post (http://www.mail-archive.com/bioc-sig-sequencing@r-project.org/msg02020.html)
> by Gordon Smyth (author of edgeR / limma) wherein he
> describes a suitable strategy for edgeR and the rotational
> GSEA algorithms.
> >
> > These algorithms require the specification of a
> contrast (in my case between infected and uninfected
> animals).
> >
> > e.g. test <- romer(geneIndex, countData,
> design=design, contrast=contr[,1], nrot=9999)
> >
> > My design matrix looks like this:
> >
> > library(limma)
> >
> > samples <- rep(c('s1', 's2', 's3', 's4', 's5',
> 's6'),2)
> > infection <- c(rep(1,6), rep(2,6))
> >
> > s <- as.factor(samples)
> > i <- as.factor(infection)
> >
> > design <- model.matrix(~s+i)
> > colnames(design)[7] <- 'infection'
> >
> >> design
> >   (Intercept) ss2 ss3 ss4 ss5 ss6 infection
> > 1            1   0   0   0   0   0      
>   0
> > 2            1   1   0   0   0   0      
>   0
> > 3            1   0   1   0   0   0      
>   0
> > 4            1   0   0   1   0   0      
>   0
> > 5            1   0   0   0   1   0      
>   0
> > 6            1   0   0   0   0   1      
>   0
> > 7            1   0   0   0   0   0      
>   1
> > 8            1   1   0   0   0   0      
>   1
> > 9            1   0   1   0   0   0      
>   1
> > 10           1   0   0   1   0   0      
>   1
> > 11           1   0   0   0   1   0      
>   1
> > 12           1   0   0   0   0   1      
>   1
> > attr(,"assign")
> > [1] 0 1 1 1 1 1 2
> > attr(,"contrasts")
> > attr(,"contrasts")$s
> > [1] "contr.treatment"
> >
> > attr(,"contrasts")$i
> > [1] "contr.treatment"
> >
> > and in edgeR I carry out the following fit of a GLM to
> get the genes regulated between infected and uninfected
> animals i.e. fit a GLM for infection and no infection then
> edgeR carries out likeliehood tests to find the genes where
> the two models differ (I think!).
> >
> > #dispersion estimate
> > d <- estimateGLMCommonDisp(d, design)
> >
> > #fit the NB GLM for each gene
> > fitFiltered <- glmFit(d, design, dispersion =
> d$common.dispersion)
> >
> > #carry out the likliehood ratio test
> > lrtFiltered <- glmLRT(d, fitFiltered, coef = 7)
> >
> > #how many genes are DE?
> > summary(decideTestsDGE(lrtFiltered))#DE genes
> >
> > So (finally) my question is how do I set up a contrast
> (whilst keeping the pairing) for romer?
> >
> > Thanks
> >
> > Best
> >
> > Iain
> >
> >> sessionInfo()
> > R version 2.13.1 (2011-07-08)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
> >  [3] LC_TIME=en_GB.utf8      
>  LC_COLLATE=en_GB.utf8
> >  [5] LC_MONETARY=C            
> LC_MESSAGES=en_GB.utf8
> >  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
> >  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils    
> datasets  methods   base
> >
> > other attached packages:
> >  [1] annotate_1.30.0        GO.db_2.5.0      
>      goseq_1.4.0
> >  [4] geneLenDataBase_0.99.7 BiasedUrn_1.04      
>   org.Bt.eg.db_2.5.0
> >  [7] RSQLite_0.9-4          DBI_0.2-5      
>        AnnotationDbi_1.14.1
> > [10] Biobase_2.12.2         edgeR_2.2.5      
>      limma_3.8.3
> >
> > loaded via a namespace (and not attached):
> >  [1] biomaRt_2.8.1         Biostrings_2.20.3  
>   BSgenome_1.20.0
> >  [4] GenomicFeatures_1.4.4 GenomicRanges_1.4.8  
> grid_2.13.1
> >  [7] IRanges_1.10.6        lattice_0.19-33    
>   Matrix_0.9996875-3
> > [10] mgcv_1.7-6            nlme_3.1-102      
>    RCurl_1.6-9
> > [13] rtracklayer_1.12.4    tools_2.13.1        
>  XML_3.4-2
> > [16] xtable_1.5-6
> >>
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>



More information about the Bioconductor mailing list