[BioC] GSEA and edgeR
Iain Gallagher
iaingallagher at btopenworld.com
Thu Sep 1 17:21:38 CEST 2011
Hi Sean
Thanks for your reply.
I have already carried out a goseq analysis on my data.
As I understand it goseq (like GOstats) looks for enrichment in a list of genes defined a priori as regulated and flags categories those genes belong to as enriched or not in that list of regulated genes.
I was more interested in testing 'sets' as a whole without defining a list of regulated genes up front.
I should have mentioned in my original message that I'd used goseq already since the two techniques are related. Aplologies for that.
Best
Iain
--- On Thu, 1/9/11, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> From: Sean Davis <sdavis2 at mail.nih.gov>
> Subject: Re: [BioC] GSEA and edgeR
> To: "Iain Gallagher" <iaingallagher at btopenworld.com>
> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>, smyth at wehi.edu.au
> Date: Thursday, 1 September, 2011, 16:11
> Hi, Iain.
>
> Before proceeding, you might want to take a look at:
>
> http://bioconductor.org/packages/release/bioc/html/goseq.html
>
> This paper and others like it describe the problem and
> another solution:
>
> http://www.ncbi.nlm.nih.gov/pubmed/21252076
>
> Sean
>
>
> On Thu, Sep 1, 2011 at 9:13 AM, Iain Gallagher
> <iaingallagher at btopenworld.com>
> wrote:
> > Dear List / Gordon
> >
> > I would like to carry out a GSEA analysis on some
> RNA-Seq data. I have analysed the data using edgeR and have
> a list of regulated genes. The data is paired - 6 biological
> samples, cells from each are infected or uninfected.
> >
> > Looking around there is little advice on the use of
> GSEA in RNA-Seq data. Using edgeR (and having a relatively
> small sample size) I was hoping to make use of the romer
> algorithm which is implemented in limma. However I am having
> a conceptual problem setting up my data for this.
> >
> > As the study uses paired sample data I have followed
> the guidance for this type of analysis in the marvellous
> edgeR manual. Searching for advice on how to conduct GSEA I
> came across this post (http://www.mail-archive.com/bioc-sig-sequencing@r-project.org/msg02020.html)
> by Gordon Smyth (author of edgeR / limma) wherein he
> describes a suitable strategy for edgeR and the rotational
> GSEA algorithms.
> >
> > These algorithms require the specification of a
> contrast (in my case between infected and uninfected
> animals).
> >
> > e.g. test <- romer(geneIndex, countData,
> design=design, contrast=contr[,1], nrot=9999)
> >
> > My design matrix looks like this:
> >
> > library(limma)
> >
> > samples <- rep(c('s1', 's2', 's3', 's4', 's5',
> 's6'),2)
> > infection <- c(rep(1,6), rep(2,6))
> >
> > s <- as.factor(samples)
> > i <- as.factor(infection)
> >
> > design <- model.matrix(~s+i)
> > colnames(design)[7] <- 'infection'
> >
> >> design
> > (Intercept) ss2 ss3 ss4 ss5 ss6 infection
> > 1 1 0 0 0 0 0
> 0
> > 2 1 1 0 0 0 0
> 0
> > 3 1 0 1 0 0 0
> 0
> > 4 1 0 0 1 0 0
> 0
> > 5 1 0 0 0 1 0
> 0
> > 6 1 0 0 0 0 1
> 0
> > 7 1 0 0 0 0 0
> 1
> > 8 1 1 0 0 0 0
> 1
> > 9 1 0 1 0 0 0
> 1
> > 10 1 0 0 1 0 0
> 1
> > 11 1 0 0 0 1 0
> 1
> > 12 1 0 0 0 0 1
> 1
> > attr(,"assign")
> > [1] 0 1 1 1 1 1 2
> > attr(,"contrasts")
> > attr(,"contrasts")$s
> > [1] "contr.treatment"
> >
> > attr(,"contrasts")$i
> > [1] "contr.treatment"
> >
> > and in edgeR I carry out the following fit of a GLM to
> get the genes regulated between infected and uninfected
> animals i.e. fit a GLM for infection and no infection then
> edgeR carries out likeliehood tests to find the genes where
> the two models differ (I think!).
> >
> > #dispersion estimate
> > d <- estimateGLMCommonDisp(d, design)
> >
> > #fit the NB GLM for each gene
> > fitFiltered <- glmFit(d, design, dispersion =
> d$common.dispersion)
> >
> > #carry out the likliehood ratio test
> > lrtFiltered <- glmLRT(d, fitFiltered, coef = 7)
> >
> > #how many genes are DE?
> > summary(decideTestsDGE(lrtFiltered))#DE genes
> >
> > So (finally) my question is how do I set up a contrast
> (whilst keeping the pairing) for romer?
> >
> > Thanks
> >
> > Best
> >
> > Iain
> >
> >> sessionInfo()
> > R version 2.13.1 (2011-07-08)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> > [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
> > [3] LC_TIME=en_GB.utf8
> LC_COLLATE=en_GB.utf8
> > [5] LC_MONETARY=C
> LC_MESSAGES=en_GB.utf8
> > [7] LC_PAPER=en_GB.utf8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils
> datasets methods base
> >
> > other attached packages:
> > [1] annotate_1.30.0 GO.db_2.5.0
> goseq_1.4.0
> > [4] geneLenDataBase_0.99.7 BiasedUrn_1.04
> org.Bt.eg.db_2.5.0
> > [7] RSQLite_0.9-4 DBI_0.2-5
> AnnotationDbi_1.14.1
> > [10] Biobase_2.12.2 edgeR_2.2.5
> limma_3.8.3
> >
> > loaded via a namespace (and not attached):
> > [1] biomaRt_2.8.1 Biostrings_2.20.3
> BSgenome_1.20.0
> > [4] GenomicFeatures_1.4.4 GenomicRanges_1.4.8
> grid_2.13.1
> > [7] IRanges_1.10.6 lattice_0.19-33
> Matrix_0.9996875-3
> > [10] mgcv_1.7-6 nlme_3.1-102
> RCurl_1.6-9
> > [13] rtracklayer_1.12.4 tools_2.13.1
> XML_3.4-2
> > [16] xtable_1.5-6
> >>
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
More information about the Bioconductor
mailing list