[BioC] visualise model fit in edgeR
Iain Gallagher
iaingallagher at btopenworld.com
Mon Oct 31 11:26:13 CET 2011
Dear Gordon
Thanks for your reply. There's nothing like someone else's question to
make one focus on what exactly one wants. This was certainly the case
here!
I have given this some thought from my statistically naive
point of view and I have attached a mock-up picture of the kind of thing
I envisaged (although I appreciate the real life situation is more
complicated).
The experimental design is as follows:
Cells were collected from 6 animals and infected with one of 4 strains of
bacteria or left uninfected. RNA was sampled at 2, 6, 24 & 48 hours
post infection. There are thus 120 data points across the whole
experiment.
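As a quick check of that count, the layout can be sketched in R. The animal, strain and time labels below are placeholders I made up for illustration; only the numbers of levels come from the description above.

```r
# Hypothetical labels: only the number of levels of each factor is taken
# from the description (6 animals, 4 strains + uninfected, 4 time points).
layout <- expand.grid(
  cow       = paste0("cow", 1:6),
  treatment = c("control", paste0("strain", 1:4)),
  time      = c("2h", "6h", "24h", "48h")
)

nrow(layout)                           # total number of samples
table(layout$treatment, layout$time)   # samples per treatment/time cell
```

Each treatment by time cell holds one sample per animal, which is what makes the per-timepoint comparison a paired one.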
I have used edgeR to analyse the infected v
control data at each timepoint using the GLM approach - effectively a
paired samples analysis for each timepoint as per the edgeR manual
(section 11). Perhaps there's something more sophisticated I could do
here though. If you had any advice that would be great!
design <- model.matrix(~ cow + infection)
#dispersion estimate
d <- estimateGLMCommonDisp(d, design)
#fit the NB GLM for each gene
fitFiltered <- glmFit(d, design, dispersion = d$common.dispersion)
#carry out the likelihood ratio test
lrtFiltered <- glmLRT(d, fitFiltered, coef = 7)
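To show where coef = 7 comes from, here is a small sketch of the design matrix for one strain versus control at a single timepoint. The factor levels are made-up placeholders, and I am assuming 6 cows with one control and one infected sample each.

```r
# Assumed layout for one timepoint and one strain: 6 cows x 2 conditions.
cow       <- factor(rep(paste0("cow", 1:6), times = 2))
infection <- factor(rep(c("control", "infected"), each = 6))

design <- model.matrix(~ cow + infection)

colnames(design)  # intercept, 5 cow contrasts, then the infection term
ncol(design)      # the infection effect is the last (7th) column
```

Under this layout the null model drops column 7 and keeps the cow terms, so the test asks whether infection explains anything beyond the animal-to-animal differences.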
For my audience I simply wanted to illustrate the fitting of the two models
and how likelihood ratio tests are used rather than a t-test approach.
In the attached pdf each black line represents the H1 model (with
infection) and each red line represents the null model (cows only) for
one gene only. The points are the 'raw data' (but not real data); C =
control, I = infected. I realise this illustration is showing
essentially a linear fit but I'm trying to aim for simplicity for the
audience (a conceptual rather than entirely accurate approach). I would
be happy to get my hands dirty coding something more lifelike as I think
that would aid my understanding as well.
I was going to describe this in terms of the 'fit' of each line to the data i.e. for
the regulated gene the black line is the more 'likely' model whereas in
the non-regulated gene there is little to separate the models.
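A rough sketch of how such a picture might be coded for one gene is below. It is not edgeR itself: it uses simulated counts (not real data), an assumed dispersion of 0.1, and glm() with the negative.binomial() family from MASS at fixed dispersion, and it ignores library size offsets. The likelihood ratio statistic is taken as the drop in deviance between the two fits.

```r
library(MASS)  # provides the negative.binomial() family for glm()

set.seed(1)
cow       <- factor(rep(paste0("cow", 1:6), times = 2))
infection <- factor(rep(c("C", "I"), each = 6))

# Simulated counts for one made-up "regulated" gene: each cow has its own
# baseline and infection triples the mean. All values are assumptions.
dispersion <- 0.1
baseline   <- rep(c(40, 60, 80, 100, 120, 150), times = 2)
mu         <- baseline * ifelse(infection == "I", 3, 1)
counts     <- rnbinom(12, mu = mu, size = 1 / dispersion)

fam  <- negative.binomial(theta = 1 / dispersion)
fit0 <- glm(counts ~ cow, family = fam)              # null: cows only
fit1 <- glm(counts ~ cow + infection, family = fam)  # alternative: cows + infection

# Likelihood ratio statistic, referred to a chi-square on 1 df
lr <- deviance(fit0) - deviance(fit1)
p  <- pchisq(lr, df = 1, lower.tail = FALSE)

# Raw counts as C/I labels; fitted means joined by one line per cow
x <- as.integer(infection)
plot(x, counts, type = "n", xaxt = "n", xlim = c(0.5, 2.5),
     xlab = "", ylab = "count")
axis(1, at = 1:2, labels = c("control", "infected"))
text(x, counts, labels = as.character(infection))
for (k in levels(cow)) {
  i <- cow == k
  lines(x[i], fitted(fit1)[i], col = "black")  # H1 model
  lines(x[i], fitted(fit0)[i], col = "red")    # null model
}
```

The red lines come out flat because the null model gives both samples from a cow the same fitted mean. Setting the fold change to 1 in the simulation gives the non-regulated case, where the black and red lines nearly coincide.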
Hope this is somewhat useful.
Best
Iain
________________________________
From: Gordon K Smyth <smyth at wehi.EDU.AU>
To: Iain Gallagher <iaingallagher at btopenworld.com>
Cc: Yunshun Chen <yuchen at wehi.edu.au>; Bioconductor mailing list <bioconductor at r-project.org>
Sent: Friday, 28 October 2011, 6:36
Subject: visualise model fit in edgeR
Hi Iain,
You're asking a hard question, as drawing nice tutorial pictures for any statistical method can be lots of work, and the context here is harder than most. I think I'd find it hard to think of a good picture like you describe, even if I was just doing an ordinary multiple regression using lm() with univariate normal data. What covariate or factor are you testing for? Can you describe the picture you would draw if this was just an ordinary multiple regression problem?
Best wishes
Gordon
------------- original message ----------------
[BioC] visualise model fit in edgeR
Iain Gallagher iaingallagher at btopenworld.com
Tue Oct 25 16:06:11 CEST 2011
Dear List
I have been using the glmFit method in edgeR to analyse some RNA-Seq data. I will soon be presenting this data to a more statistically naive audience (and I'm no expert myself) and I was hoping to be able to prepare a figure demonstrating how this particular edgeR analysis approach works.
Basically what I'd like to do would be to plot count data for one (or perhaps a few) of my genes and then draw a couple of lines showing the fit of the null and alternative models used in the glmLRT method of edgeR to assess gene regulation between conditions.
I was hoping that this would allow me to illustrate the concept of testing for the likelihood of model fit and hence gene regulation between conditions.
If anyone could help I'd be grateful.
Best
Iain
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mockModel.pdf
Type: application/pdf
Size: 64641 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20111031/5c2543b2/attachment.pdf>