[BioC] edgeR choose the best model of GLM
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Feb 22 00:38:55 CET 2013
Dear Eszter,
The problem that you've stated (which of these models best describes my
data) applies to univariate data, for fitting models to a single vector of
counts. For RNA-seq experiments, you are fitting glms to many data
vectors. You will get different answers for different genes, so you have
to think a bit differently.
With edgeR, you would start by fitting the full model
design <- model.matrix(~E+A+E:A)
then, after some intermediate commands, use
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef=4)
topTags(lrt)
to test for which genes the interaction is significant. If the test is
significant, then the E*A model is necessary and best. If not, one of the
simpler models is satisfactory. You could subset to genes for which the
interaction is not significant, then try
design <- model.matrix(~E+A)
and proceed from there.
I interpolate some other answers below.
> Date: Wed, 20 Feb 2013 15:15:20 +0100
> From: Ari Eszter <arieszter at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR choose the best model of GLM
>
> Dear edgeR Users,
>
> I have RNS-Seq count data by genes, 4 treatment groups (each with 2
> biological replicates) and two factors (E: C or H, and A: 13 or 25). And
> I applied the GLM model of edgeR with these designs: E, A, E+A and E*A
> I would like to decide which model and which variant describes my data
> better.
>
> 1. Am I right that the $coefficients of a "DGELRT" object are log
> likelihood values?
No. They are coefficients from the linear model fitted, as explained in
the documentation: help("glmFit").
> 2. Does anyone have a suggestion how can I apply AIC (Akaike Information
> Criterion) test or Likelihood ratio (LR) test on edgeR GLM results to
> choose the best model?
Answered above. We have experimented with AIC, but have not found it
particularly successful.
> 3. Is there a possibility to plot the multiple GLM regression for a
> gene? (I would like plot something like this:
> http://www.jerrydallal.com/LHSP/pix/regpix4.gif )
I have no idea what this plot represents. In any case, I guess this is
for one data vector. For RNA-seq data, you would have to view 20,000 of
these plots, not a very productive way to go.
Best wishes
Gordon
> Bests,
> Eszter
>
> --
> Eszter Ari
> Institut für Populationsgenetik
> Vetmeduni Vienna
> Veterinärplatz 1
> 1210 Wien, Austria
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}
More information about the Bioconductor
mailing list