[BioC] edgeR GLM using factor that varies for each gene
Daniel Lang
daniel.lang at biologie.uni-freiburg.de
Fri May 9 11:37:55 CEST 2014
Dear Gordon,
thank you so much for your prompt and helpful answer.
You're right, I was overcomplicating things :-)
Best,
Daniel
On 09.05.2014 06:12, Gordon K Smyth wrote:
> Dear Daniel,
>
> I don't see any need for a gene-specific factor.
>
> Simply give all the count rows (for all genes and all splicing events)
> to edgeR. The design matrix is:
>
> genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
> genotype <- relevel(genotype,ref="wt")
> design <- model.matrix(~genotype)
>
> If you want to find differentially abundant events between the mutants
> and wt, you can run glmLRT() with coef=2 to examine mutant1, coef=3 to
> examine mutant2, and contrast=c(0,0.5,0.5) to average the two mutant lines.
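A minimal base-R sketch of how the coefficients and the contrast above are interpreted. It rebuilds the design matrix from the reply. It then takes made-up log-scale group means for a single event and recovers the three coefficients: coefficient 2 is mutant1 vs wt, coefficient 3 is mutant2 vs wt, and `c(0,0.5,0.5)` is the average of the two mutant lines vs wt. The numbers and the names `group_means`, `X_group`, `beta` and `avg_mutant_vs_wt` are hypothetical. edgeR itself fits a negative binomial GLM to each count row; this only illustrates the linear algebra of the parametrization.

```r
# Sample layout from the question: two mutant lines and the wild type, two replicates each
genotype <- factor(c("mutant1", "mutant1", "mutant2", "mutant2", "wt", "wt"))
genotype <- relevel(genotype, ref = "wt")  # wild type becomes the baseline level
design <- model.matrix(~genotype)
print(colnames(design))  # "(Intercept)" "genotypemutant1" "genotypemutant2"

# Hypothetical log-scale means of one event in each group (illustrative numbers only)
group_means <- c(mutant1 = 2, mutant2 = 4, wt = 1)

# One design row per group (samples 1, 3 and 5 are mutant1, mutant2 and wt);
# solving X_group %*% beta = group_means gives the coefficients that reproduce these means
X_group <- design[c(1, 3, 5), ]
beta <- solve(X_group, group_means)
print(beta)  # 1 1 3: wt baseline, mutant1 - wt, mutant2 - wt

# Contrast from the reply: the average of the two mutant coefficients
contrast <- c(0, 0.5, 0.5)
avg_mutant_vs_wt <- sum(contrast * beta)
print(avg_mutant_vs_wt)  # 2, i.e. mean(c(2, 4)) - 1
```

In edgeR the same `design` would be passed to `glmFit()`, and `coef=2`, `coef=3` or `contrast=c(0,0.5,0.5)` to `glmLRT()` on the resulting fit.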
>
> Best wishes
> Gordon
>
>
>> Date: Thu, 8 May 2014 00:33:05 -0700 (PDT)
>> From: "Daniel Lang [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, daniel.lang at biologie.uni-freiburg.de
>> Subject: [BioC] edgeR GLM using factor that varies for each gene
>>
>> Hi,
>>
>> After going over the user guide and searching this mailing list, I'm
>> not quite clear on how best to address my specific situation:
>>
>> I'd like to test differential "expression" of specific splicing events
>> between a mutant and the wild type in a replicated design. To do so,
>> I've counted the reads that are specific to each splicing event of
>> each gene.
>>
>> e.g.
>> event          AS.type       mutant.line1.rep1  mutant.line1.rep2  mutant.line2.rep1  mutant.line2.rep2  wt.rep1  wt.rep2
>> S102-F_10.883  alt_donor                     4                  7                  4                  7        0        1
>> S102-F_12.884  alt_donor                     0                  1                  0                  1        0        2
>> S102-F_10.887  alt_donor                     0                  0                  0                  0       30       33
>> S102-F_10.886  alt_acceptor                  0                  0                  0                  0       22       21
>> S102-F_11.890  alt_donor                     0                  0                  0                  0        0        0
>> S102-F_11.889  alt_acceptor                  0                  0                  0                  0        0        0
>> S102-F_10.891  alt_acceptor                  0                  0                  0                  0        0        0
>> S103-R_3.901   alt_acceptor                  4                  5                  4                  5       10       11
>> S103-R_2.904   skipped_exon                  2                  4                  2                  4       33       28
>> S103-R_2.902   alt_acceptor                  4                  5                  4                  5        0        0
>> S103-R_1.906   alt_acceptor                  0                  1                  0                  1        1        0
>>
>> It's not obvious from this example, but overall the abundance and noise
>> level differ between types of alternative splicing. I'd like to correct
>> for this, but also assess it using a GLM. Ideally, I'd like to find
>> splicing events that are differentially abundant between the mutant and
>> the wild type, irrespective of line and biological replicate.
>>
>> As far as I understand the User's Guide and the Reference Manual, the
>> design always refers to factors describing the libraries/experiments
>> the counts are derived from.
>>
>> If I were using a "normal" GLM, what I want to do would look like
>> glm(count ~ AS.type + genotype + line + biological.replicate).
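This base-R sketch shows why the three-level `genotype` factor in Gordon's reply already covers the `genotype + line` part of this formula. It assumes the sample layout of the count table above; the names `mutant_vs_wt`, `line_id`, `X_main` and `X_reply` are hypothetical. With one line per mutant genotype, line is nested in genotype. A main-effects design in both factors therefore has a redundant column and spans the same space as `model.matrix(~genotype)`. The event-level terms do not need to be in the design either, because edgeR fits each count row separately and an event's baseline abundance goes into that row's own intercept.

```r
# Sample layout from the question, in the column order of the count table
mutant_vs_wt <- factor(c("mutant", "mutant", "mutant", "mutant", "wt", "wt"))
line_id <- factor(c("line1", "line1", "line2", "line2", "wt", "wt"))

# Main-effects design in the spirit of "genotype + line"
X_main <- model.matrix(~ mutant_vs_wt + line_id)
print(ncol(X_main))     # 4 columns ...
print(qr(X_main)$rank)  # ... but rank 3: one column is redundant

# Three-level genotype factor as in the reply
genotype <- relevel(factor(c("mutant1", "mutant1", "mutant2", "mutant2", "wt", "wt")),
                    ref = "wt")
X_reply <- model.matrix(~genotype)

# Putting both designs side by side adds no new directions: same column space
print(qr(cbind(X_main, X_reply))$rank)  # 3
```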
>>
>> Can I accomplish this with edgeR without splitting up the events into
>> different data sets per splice type?
>>
>> Any advice on this would be greatly appreciated.
>>
>> Best,
>> Daniel
>>
>> -- output of sessionInfo():
>>
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>> [3] LC_TIME=de_DE.utf8 LC_COLLATE=en_US.utf8
>> [5] LC_MONETARY=de_DE.utf8 LC_MESSAGES=en_US.utf8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>