[Bioc-sig-seq] RNASeq, differential expression between group, and large variance within groups

Gordon K Smyth smyth at wehi.EDU.AU
Wed Mar 2 05:18:10 CET 2011


Hi Laurent,

Thanks for the nice summary.  Two more points:

1. edgeR will stop reporting such extreme-variance tags as differentially 
expressed if the user reduces the prior weight, prior.n, given to the 
common dispersion (expressed as a number of notional prior tags).  Seeing 
such tags in the topTags table may prompt the user to do this.

2. It would be very helpful to know whether these high-variance tags arise 
from (i) technical errors specific to one count, (ii) technical issues 
affecting a whole tag or (iii) genuine biological variation.  If (i), then 
we could design software to detect outlier counts.  If (ii), we could design 
software to detect outlier tags.  If (iii), then an empirical Bayes 
approach to moderating the dispersions, such as the one used by edgeR, may 
be the best that can be done.
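To illustrate points 1 and (iii), here is a minimal sketch of weighted-likelihood moderation in the spirit described above: each tag's dispersion maximises its own negative binomial log-likelihood plus prior_n times the average log-likelihood over all tags, so prior_n counts as a number of notional tags. This is not edgeR's implementation. It assumes equal library sizes, plugs in the sample mean instead of conditioning it out, and uses a crude grid search. The names PHI_GRID, nb_loglik and moderated_dispersions and all counts are hypothetical.

```python
import math

# Hypothetical dispersion grid, log-spaced from 1e-4 to 10.
PHI_GRID = [10 ** (k / 20) for k in range(-80, 21)]


def nb_loglik(counts, phi):
    """Negative binomial log-likelihood of one tag's counts at dispersion phi.

    Assumes equal library sizes, so all counts share one mean, which is
    plugged in as the sample mean (a simplification of edgeR's approach).
    """
    mu = sum(counts) / len(counts)
    if mu == 0:
        return 0.0  # an all-zero tag has probability 1 whatever phi is
    r = 1.0 / phi  # NB size parameter; variance = mu + phi * mu**2
    total = 0.0
    for y in counts:
        total += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                  + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return total


def moderated_dispersions(tags, prior_n):
    """Grid-search each tag's phi maximising l_g(phi) + prior_n * mean_h l_h(phi)."""
    ll = [[nb_loglik(t, phi) for phi in PHI_GRID] for t in tags]
    # Average log-likelihood over tags: the information in one "notional tag".
    common = [sum(row[j] for row in ll) / len(tags) for j in range(len(PHI_GRID))]
    estimates = []
    for row in ll:
        weighted = [row[j] + prior_n * common[j] for j in range(len(PHI_GRID))]
        best = max(range(len(PHI_GRID)), key=weighted.__getitem__)
        estimates.append(PHI_GRID[best])
    return estimates


# Made-up counts for three libraries in one group.
ordinary = [[m, m + 2, m - 2] for m in range(10, 200, 10)]
outlier = [0, 0, 90]  # large in one library, zero in the others
tags = ordinary + [outlier]

for prior_n in (10, 0.5):
    phi_out = moderated_dispersions(tags, prior_n)[-1]
    print(f"prior_n={prior_n}: dispersion for the outlier tag ~ {phi_out:.3g}")
```

With the heavier prior the outlier tag's dispersion is pulled toward the value favoured by the other tags, which makes it easier to call differentially expressed; with a small prior_n it keeps a large dispersion of its own.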

I don't know for sure how to distinguish these causes, but here are some 
thoughts.  In your original post, you showed a tag with a large count for 
library A3 but zeros for all other libraries.  Is library A3 
systematically different from libraries A1 and A2 for other tags as well 
as this one?  If this tag is part of co-regulated pathways that are highly 
expressed in A3 relative to the others, then likely it is real biological 
variation.  If A3 differs from A1 and A2 only in a handful of tags with no 
biological connection, then perhaps it is a technical issue.
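One crude way to ask whether A3 differs systematically from A1 and A2 is to count how many tags show a large log2 ratio between A3 and the average of the other two libraries. This is a hypothetical sketch: the function names, the counts-per-million scaling by total counts, the 0.5 pseudo-count, the threshold of 3 and the counts are all illustrative assumptions. Checking whether flagged tags share a co-regulated pathway would additionally need gene-set annotation, which the sketch does not attempt.

```python
import math


def library_sizes(counts):
    """Total count per library, summed over all tags."""
    sizes = {}
    for per_lib in counts.values():
        for lib, c in per_lib.items():
            sizes[lib] = sizes.get(lib, 0) + c
    return sizes


def lib_specific_tags(counts, lib, others, min_lfc=3.0, pseudo=0.5):
    """Tags where `lib` differs from the average of `others` by more than
    min_lfc on the log2 counts-per-million scale."""
    sizes = library_sizes(counts)

    def log_cpm(tag, name):
        # pseudo-count keeps zero counts finite on the log scale
        return math.log2((counts[tag][name] + pseudo) / sizes[name] * 1e6)

    flagged = []
    for tag in counts:
        ref = sum(log_cpm(tag, o) for o in others) / len(others)
        lfc = log_cpm(tag, lib) - ref
        if abs(lfc) > min_lfc:
            flagged.append((tag, lfc))
    return flagged


# Made-up counts: 20 unremarkable tags plus one A3-only tag.
counts = {f"g{i}": {"A1": 50 + 5 * i, "A2": 52 + 5 * i, "A3": 48 + 5 * i}
          for i in range(20)}
counts["tagX"] = {"A1": 0, "A2": 0, "A3": 250}

flagged = lib_specific_tags(counts, "A3", ["A1", "A2"])
print(f"{len(flagged)} of {len(counts)} tags differ strongly in A3:")
for tag, lfc in flagged:
    print(f"  {tag}: log2 ratio {lfc:.1f}")
```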

Regards
Gordon

> Date: Tue, 01 Mar 2011 10:25:31 +0100
> From: Laurent Gautier <lgautier at gmail.com>
> To: bioc-sig-sequencing at r-project.org
> Cc: anders at embl.de
> Subject: Re: [Bioc-sig-seq] RNASeq, differential expression between
> 	group, and large variance within groups
>
> Thanks to Mads, Simon, and Steve.
>
> In summary:
>
> - extreme variance within a group (zeros or large values) is not a good
> sign, and experimental issues should be suspected
> - pooling (summing) tags over reference transcripts can rescue some of
> the signal
> - DESeq, and to some extent edgeR, will report genes/tags with such
> pathological counts as differentially expressed when they should
> not. The issue is acknowledged and care should be taken (here we use
> various visualizations to complement the p-values).
>
> Laurent
