[BioC] edgeR GLM using factor that varies for each gene
Gordon K Smyth
smyth at wehi.EDU.AU
Fri May 9 06:12:48 CEST 2014
Dear Daniel,
I don't see any need for a gene-specific factor.
Simply give all the count rows (for all genes and all splicing events) to
edgeR. The design matrix is:
genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
genotype <- relevel(genotype,ref="wt")
design <- model.matrix(~genotype)
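As a quick check (a minimal base-R sketch, assuming the six libraries are ordered as in the genotype vector above), printing the column names of the design matrix shows which coefficient each column represents:

```r
# Rebuild the design matrix with wt as the reference level
genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
genotype <- relevel(genotype, ref = "wt")
design <- model.matrix(~genotype)

# Column 1 is the wt baseline; columns 2 and 3 are the
# log-fold-changes of mutant1 and mutant2 relative to wt
colnames(design)
# "(Intercept)" "genotypemutant1" "genotypemutant2"
```

Because wt is the reference level, coefficients 2 and 3 are each a mutant-versus-wt comparison.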
To find differentially abundant events between the mutants and wt, run
glmLRT() with coef=2 to test mutant1 vs wt, coef=3 to test mutant2 vs wt,
or contrast=c(0,0.5,0.5) to test the average of the two mutant lines
against wt.
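Put together, the analysis might look like the sketch below. It assumes an edgeR version in which estimateDisp() is available, and it uses simulated counts in place of the real event-by-library table: the object `counts` and its row and column names are hypothetical stand-ins, not real data.

```r
library(edgeR)

# Simulated stand-in for the real count table (assumption):
# rows are splicing events, columns are the six libraries
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 20, size = 5), ncol = 6)
rownames(counts) <- paste0("event", seq_len(nrow(counts)))
colnames(counts) <- c("mutant1.rep1", "mutant1.rep2",
                      "mutant2.rep1", "mutant2.rep2",
                      "wt.rep1", "wt.rep2")

# Design matrix with wt as the reference level
genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
genotype <- relevel(genotype, ref = "wt")
design <- model.matrix(~genotype)

# Normalize, estimate dispersions and fit the GLM to all rows at once
y <- DGEList(counts = counts, group = genotype)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# The three tests: each mutant vs wt, and the mutant average vs wt
lrt.mutant1 <- glmLRT(fit, coef = 2)
lrt.mutant2 <- glmLRT(fit, coef = 3)
lrt.average <- glmLRT(fit, contrast = c(0, 0.5, 0.5))

# Top-ranked events for the averaged comparison
topTags(lrt.average)
```

All events go through one fit, so there is no need to split the data by splice type.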
Best wishes
Gordon
> Date: Thu, 8 May 2014 00:33:05 -0700 (PDT)
> From: "Daniel Lang [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, daniel.lang at biologie.uni-freiburg.de
> Subject: [BioC] edgeR GLM using factor that varies for each gene
>
> Hi,
>
> after going over the user guide and searching this mailing list I'm not
> quite clear on how to best address my specific situation:
>
> I'd like to test differential "expression" of specific splicing events
> between a mutant and the wild type in a replicated design. To do so,
> I've specifically counted reads that are specific to a certain splicing
> event for each gene.
>
> e.g.
> event          AS.type       mutant.line1.rep1  mutant.line1.rep2  mutant.line2.rep1  mutant.line2.rep2  wt.rep1  wt.rep2
> S102-F_10.883  alt_donor                     4                  7                  4                  7        0        1
> S102-F_12.884  alt_donor                     0                  1                  0                  1        0        2
> S102-F_10.887  alt_donor                     0                  0                  0                  0       30       33
> S102-F_10.886  alt_acceptor                  0                  0                  0                  0       22       21
> S102-F_11.890  alt_donor                     0                  0                  0                  0        0        0
> S102-F_11.889  alt_acceptor                  0                  0                  0                  0        0        0
> S102-F_10.891  alt_acceptor                  0                  0                  0                  0        0        0
> S103-R_3.901   alt_acceptor                  4                  5                  4                  5       10       11
> S103-R_2.904   skipped_exon                  2                  4                  2                  4       33       28
> S103-R_2.902   alt_acceptor                  4                  5                  4                  5        0        0
> S103-R_1.906   alt_acceptor                  0                  1                  0                  1        1        0
>
> It's not clear from this example, but overall the abundances and noise
> levels differ between the types of alternative splicing. I'd like to
> correct for this, but also assess it using a GLM. Thus, ideally I'd
> like to find differentially abundant splicing events between the mutant
> and the wild type irrespective of line and biological replicate.
>
> As far as I understand the UserGuide and the ReferenceManual, the design
> always refers to factors describing the libraries/experiments the
> counts are derived from.
>
> If I were using a "normal" GLM, what I want to do would look like
> glm(count ~ AS.type + genotype + line + biological.replicate).
>
> Can I accomplish this with edgeR without splitting up the events into
> different data sets per splice type?
>
> Any advice on this would be greatly appreciated.
>
> Best,
> Daniel
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=de_DE.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=de_DE.utf8 LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base