[BioC] edgeR GLM using factor that varies for each gene

Fri May 9 06:12:48 CEST 2014

Dear Daniel,

I don't see any need for a gene-specific factor.

Simply give all the count rows (for all genes and all splicing events) to 
edgeR.  The design matrix is:

   genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
   genotype <- relevel(genotype,ref="wt")
   design <- model.matrix(~genotype)

If you want to find differentially abundant events between the mutants and 
wt, you can run glmLRT() with coef=2 to test mutant1 vs wt, coef=3 to test 
mutant2 vs wt, and contrast=c(0,0.5,0.5) to test the average of the two 
mutant lines vs wt.
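
For illustration, the steps above could be put together roughly as follows. This is only a sketch: the count matrix is simulated so that the code runs on its own, and the normalization and dispersion steps (calcNormFactors(), estimateDisp()) are the usual edgeR GLM pipeline rather than anything specific to this data set. In practice `counts` would hold the real event-by-library counts.

```r
library(edgeR)

# Simulated stand-in for the real table of splicing-event counts
# (assumption: 1000 events, 6 libraries, no true differences).
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 20, size = 5), ncol = 6)
rownames(counts) <- paste0("event", seq_len(nrow(counts)))
colnames(counts) <- c("mutant1.rep1", "mutant1.rep2",
                      "mutant2.rep1", "mutant2.rep2",
                      "wt.rep1", "wt.rep2")

# Design matrix with wt as the reference level.
genotype <- factor(c("mutant1", "mutant1", "mutant2", "mutant2", "wt", "wt"))
genotype <- relevel(genotype, ref = "wt")
design <- model.matrix(~genotype)
# Columns: (Intercept), genotypemutant1, genotypemutant2

# Standard edgeR GLM pipeline.
y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# The three tests described above.
lrt.m1  <- glmLRT(fit, coef = 2)                    # mutant1 vs wt
lrt.m2  <- glmLRT(fit, coef = 3)                    # mutant2 vs wt
lrt.avg <- glmLRT(fit, contrast = c(0, 0.5, 0.5))   # mean of mutants vs wt

topTags(lrt.avg)
```

All events of all splicing types go into the one DGEList, so nothing needs to be split up by AS.type.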

Best wishes
Gordon

> Date: Thu,  8 May 2014 00:33:05 -0700 (PDT)
> From: "Daniel Lang [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, daniel.lang at biologie.uni-freiburg.de
> Subject: [BioC] edgeR GLM using factor that varies for each gene
>
> Hi,
>
> after going over the user guide and searching this mailing list I'm not 
> quite clear on how to best address my specific situation:
>
> I'd like to test differential "expression" of specific splicing events 
> between a mutant and the wild type in a replicated design. To do so, 
> I've specifically counted reads that are specific to a certain splicing 
> event for each gene.
>
> e.g.
> event	AS.type	mutant.line1.rep1	mutant.line1.rep2	mutant.line2.rep1	mutant.line2.rep2	wt.rep1	wt.rep2
> S102-F_10.883	alt_donor	4	7	4	7	0	1
> S102-F_12.884	alt_donor	0	1	0	1	0	2
> S102-F_10.887	alt_donor	0	0	0	0	30	33
> S102-F_10.886	alt_acceptor	0	0	0	0	22	21
> S102-F_11.890	alt_donor	0	0	0	0	0	0
> S102-F_11.889	alt_acceptor	0	0	0	0	0	0
> S102-F_10.891	alt_acceptor	0	0	0	0	0	0
> S103-R_3.901	alt_acceptor	4	5	4	5	10	11
> S103-R_2.904	skipped_exon	2	4	2	4	33	28
> S103-R_2.902	alt_acceptor	4	5	4	5	0	0
> S103-R_1.906	alt_acceptor	0	1	0	1	1	0
>
> It's not clear from this example, but overall there is a difference 
> between abundances and noise levels of specific types of alternative 
> splicing I'd like to correct for, but also assess using GLM. Thus, 
> ideally I'd like to find differentially abundant splicing events between 
> the mutant and the wild type irrespective of line and biological 
> replicate.
>
> As far as I understood the UserGuide and the ReferenceManual design 
> always refers to factors for describing the libraries/experiments the 
> counts are derived from.
>
> If I were using a "normal" GLM, what I want to do would look like 
> glm(count ~ AS.type + genotype + line + biological.replicate).
>
> Can I accomplish this with edgeR without splitting up the events into 
> different data sets per splice type?
>
> Any advice on this would be greatly appreciated.
>
> Best,
> Daniel
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
> [3] LC_TIME=de_DE.utf8        LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=de_DE.utf8    LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=C                LC_NAME=C
> [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
