[BioC] visualise model fit in edgeR

Mon Oct 31 11:26:13 CET 2011

Dear Gordon

Thanks for your reply. There's nothing like someone else's question to 
make one focus on what exactly one wants. This was certainly the case 
here!

I have given this some thought from my statistically naive point of
view and have attached a mock-up picture of the kind of thing I
envisaged (although I appreciate the real-life situation is more
complicated).

The experimental design is as follows:

Cells were collected from 6 animals and infected with one of 4 strains
of bacteria or left uninfected. RNA was sampled at 2, 6, 24 & 48 hours
post infection. There are thus 120 data points across the whole
experiment.

I have used edgeR to analyse the infected v control data at each
timepoint using the GLM approach - effectively a paired-samples
analysis for each timepoint, as per the edgeR manual (section 11).
Perhaps there's something more sophisticated I could do here, though.
If you had any advice that would be great!

design <- model.matrix(~ cow + infection)
# estimate the common dispersion
d <- estimateGLMCommonDisp(d, design)
# fit the NB GLM for each gene
fitFiltered <- glmFit(d, design, dispersion = d$common.dispersion)
# carry out the likelihood ratio test
lrtFiltered <- glmLRT(d, fitFiltered, coef = 7)
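As a quick check of why coef = 7 picks out the infection effect, here is a minimal base-R sketch. It assumes (my assumption, not stated above) that `cow` is a factor with 6 levels and `infection` is a two-level factor (control, infected) within a single timepoint; the level names are made up for illustration.

```r
# Hypothetical sample annotation for one timepoint: 6 cows, each sampled
# once as control and once as infected (paired design).
cow <- factor(rep(paste0("cow", 1:6), each = 2))
infection <- factor(rep(c("control", "infected"), times = 6),
                    levels = c("control", "infected"))

# Same formula as in the analysis above.
design <- model.matrix(~ cow + infection)

# Columns: intercept (cow1 baseline), 5 cow contrasts, then infection.
print(colnames(design))
print(ncol(design))
```

Under those assumptions the infection coefficient is the last column of the design matrix, so testing that column compares the model with infection against the cows-only model.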

For my audience I simply wanted to illustrate the fitting of the two
models and how likelihood ratio tests are used rather than a t-test
approach. In the attached pdf each black line represents the H1 model
(with infection) and each red line represents the null model (cows
only) for one gene only. The points are the 'raw data' (but not real
data); C = control, I = infected. I realise this illustration is
showing essentially a linear fit but I'm trying to aim for simplicity
for the audience (a conceptual rather than entirely accurate approach).
I would be happy to get my hands dirty coding something more lifelike
as I think that would aid my understanding as well.

I was going to describe this in terms of the 'fit' of each line to the
data, i.e. for the regulated gene the black line is the more 'likely'
model, whereas for the non-regulated gene there is little to separate
the models.
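A rough sketch of how such a picture could be coded for a single gene, using simulated counts rather than real data. This is not edgeR's own fitting code: it uses glm() with the negative.binomial() family from MASS and a fixed dispersion as a stand-in for the per-gene NB GLM. The dispersion (0.1), the per-cow baseline means and the 3-fold infection effect are all invented for illustration.

```r
library(MASS)  # negative.binomial() family; MASS ships with R

set.seed(1)
cow <- factor(rep(paste0("cow", 1:6), each = 2))
infection <- factor(rep(c("C", "I"), times = 6), levels = c("C", "I"))

dispersion <- 0.1                                   # assumed, held fixed
baseline <- rep(c(50, 80, 120, 60, 100, 90), each = 2)  # made-up cow means
mu <- baseline * ifelse(infection == "I", 3, 1)     # made-up 3-fold effect
counts <- rnbinom(length(mu), mu = mu, size = 1 / dispersion)

# Null model (cows only) and alternative model (cows + infection).
fam <- negative.binomial(theta = 1 / dispersion)
fit0 <- glm(counts ~ cow, family = fam)
fit1 <- glm(counts ~ cow + infection, family = fam)

# Likelihood ratio statistic: the drop in deviance when infection is added,
# compared against a chi-square distribution with 1 degree of freedom.
lr <- deviance(fit0) - deviance(fit1)
p <- pchisq(lr, df = 1, lower.tail = FALSE)
print(c(LR = lr, p.value = p))

# Raw counts as C/I letters; one pair of lines per cow:
# black = fitted values under H1, red = fitted values under the null.
x <- as.integer(infection)
plot(x, counts, pch = as.character(infection), xaxt = "n",
     xlab = "", ylab = "count", xlim = c(0.5, 2.5))
axis(1, at = 1:2, labels = c("control", "infected"))
for (k in levels(cow)) {
  i <- cow == k
  lines(x[i], fitted(fit1)[i], col = "black")
  lines(x[i], fitted(fit0)[i], col = "red")
}
```

With only a cow term, the null model's fitted value is the same for both samples from a cow, so the red lines are flat; the black lines are free to slope between control and infected. Setting the simulated fold change to 1 gives the non-regulated case, where the two sets of lines should nearly coincide.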

Hope this is somewhat useful.

Best

Iain

________________________________
From: Gordon K Smyth <smyth at wehi.EDU.AU>
To: Iain Gallagher <iaingallagher at btopenworld.com>
Cc: Yunshun Chen <yuchen at wehi.edu.au>; Bioconductor mailing list <bioconductor at r-project.org>
Sent: Friday, 28 October 2011, 6:36
Subject: visualise model fit in edgeR

Hi Iain,

You're asking a hard question, as drawing nice tutorial pictures for any statistical method can be lots of work, and the context here is harder than most.  I think I'd find it hard to think of a good picture like you describe, even if I was just doing an ordinary multiple regression using lm() with univariate normal data.  What covariate or factor are you testing for?  Can you describe the picture you would draw if this was just an ordinary multiple regression problem?

Best wishes
Gordon

------------- original message ----------------

[BioC] visualise model fit in edgeR
Iain Gallagher iaingallagher at btopenworld.com
Tue Oct 25 16:06:11 CEST 2011

Dear List

I have been using the glmFit method in edgeR to analyse some RNA-Seq data. I will soon be presenting this data to a more statistically naive audience (and I'm no expert myself) and I was hoping to be able to prepare a figure demonstrating how this particular edgeR analysis approach works.

Basically what I'd like to do would be to plot count data for one (or perhaps a few) of my genes and then draw a couple of lines showing the fit of the null and alternative models used in the glmLRT method of edgeR to assess gene regulation between conditions.

I was hoping that this would allow me to illustrate the concept of testing for the likelihood of model fit and hence gene regulation between conditions.

If anyone could help I'd be grateful.

Best

Iain

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mockModel.pdf
Type: application/pdf
Size: 64641 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20111031/5c2543b2/attachment.pdf>