[BioC] using limma with no replicates
Gordon Smyth
smyth at wehi.edu.au
Sun Apr 2 01:07:55 CEST 2006
Dear Pedro,
The strategy you are proposing is to ignore experimental factors
which you think will have relatively small effects, so as to generate
some degrees of freedom for error. This is an ok strategy, long used
in statistics, as long as you understand clearly what you are testing
for. If you do this, limma will try to find genes which have
differential expression which stands out relative to the effects you
have ignored.
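The degrees-of-freedom arithmetic can be made concrete with a small base-R sketch. It is not part of the original exchange. It reuses the grouping from the code quoted below: 8 arrays, with arrays 3 and 4 sharing one level. It counts the residual degrees of freedom as the number of arrays minus the rank of the design matrix. For a gene with no missing values, this should match the residual df that limma's `lmFit` reports. The variable names are illustrative.

```r
# Sketch (base R only): residual df for the grouping in the quoted message.
# Arrays 3 and 4 share the level "g1" and are treated as pseudo-replicates.
groups <- factor(c(1, 2, 3, 3, 5, 6, 7, 8))
design <- model.matrix(~ 0 + groups)
colnames(design) <- c("WT", "upa", "g1", "f5", "f6", "f7", "f8")

n_arrays <- nrow(design)        # 8 arrays
n_coef   <- qr(design)$rank     # 7 estimable group means
resid_df <- n_arrays - n_coef   # df left over for estimating the variance
resid_df                        # 1

# Without any grouping, each array is its own group and no df remain.
design0 <- model.matrix(~ 0 + factor(1:8))
nrow(design0) - qr(design0)$rank   # 0
```

A single residual df per gene is very little. This is where limma's empirical Bayes step, which borrows variance information across genes, matters most.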
Power is not the issue here. This approach is actually conservative,
in that the residual variability will be larger than if you had true
replicate arrays, hence you will find fewer DE genes than you might otherwise.
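The conservativeness argument can be illustrated with a toy simulation. This is an assumption-laden sketch, not limma's own computation. `sigma` and `delta` are made-up illustrative values, and the variance is the plain per-gene sample variance of a pair of arrays, without limma's empirical Bayes moderation. When two arrays that actually differ by a real effect `delta` are treated as replicates, the expected variance estimate rises from sigma^2 to sigma^2 + delta^2/2. Larger variance estimates give smaller t-statistics, and therefore fewer DE genes.

```r
# Toy simulation (base R, not limma): pairs of arrays treated as replicates.
set.seed(1)
n_genes <- 10000
sigma   <- 0.5   # assumed true noise SD (illustrative)
delta   <- 1     # assumed size of the ignored effect (illustrative)

x1      <- rnorm(n_genes, mean = 0,     sd = sigma)
x_true  <- rnorm(n_genes, mean = 0,     sd = sigma)  # a genuine replicate of x1
x_pseud <- rnorm(n_genes, mean = delta, sd = sigma)  # "replicate" with an ignored effect

# Sample variance of a pair (a, b) with 1 df is (a - b)^2 / 2
s2_true   <- (x_true  - x1)^2 / 2
s2_pseudo <- (x_pseud - x1)^2 / 2

mean(s2_true)     # close to sigma^2 = 0.25
mean(s2_pseudo)   # close to sigma^2 + delta^2 / 2 = 0.75
```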
Best wishes
Gordon
>Date: Fri, 31 Mar 2006 12:48:20 +0200
>From: Pedro López Romero <plopez at cnic.es>
>Subject: [BioC] using limma with no replicates
>To: <bioconductor at stat.math.ethz.ch>
>
>Dear list,
>
>I have been given some data to analyze. Unfortunately, there is only 1
>replicate per experimental condition, so I do not expect to draw meaningful
>information from them. Still, I would like to use limma, since I expect it
>to be more powerful than mere inspection of the log2 fold changes.
>
>Although I do not have "true biological replicates", I think I can group
>some arrays in the design matrix as if they were replicates, based on the
>correlations I expect from the experimental conditions and from how the data
>were generated. For example, I can group 2 arrays that belong to the same
>strain even though they were treated slightly differently, or 2 arrays that
>belong to the same strain and treatment but come from mice of different
>ages. These grouped arrays are not going to be part of the contrast. My
>intention (and I do not know if it is right) is to group some correlated
>arrays so that some degrees of freedom are available for estimating the
>variance, and then to make contrasts between 2 other, non-replicated arrays.
>I think this would be somewhat more powerful than inspecting the log2 fold
>changes, since the information is better handled through the empirical Bayes
>method that limma implements, but I would feel better if someone backed me
>up, because I am not quite sure this is a good idea.
>
>
>Here is part of my code:
>
>library(limma)
># 8 arrays; arrays 3 and 4 share the level "g1" and act as pseudo-replicates
>design <- model.matrix(~ 0 + factor(c(1,2,3,3,5,6,7,8)))
>colnames(design) <- c("WT","upa","g1","f5","f6","f7","f8")
>
>Here g1 groups two arrays from the same strain (different from the other
>strains) and the same mouse age, but with slightly different pharmacological
>treatments. I will compare f5 vs f6 (these are the same strain, different
>from g1, and the same age, but with different treatments).
>
>CM <- makeContrasts(f5 - f6, levels = design)
>
>
>Doing this, the M values at the top of the list are quite large in magnitude
>(|M| > 6), but the differences are not significant. I think this is due to
>the absence of replication in a very noisy system.
>
>       ID          M                  A                 t                  P.Value  B
>23620  mCG147262   -9.0828928978708   7.04453315872284  -20.6287756557693           -0.823196144084987
>19275  mCG1047122  -6.22956426050092  .91829704792039   -15.5769614644597  1        -0.940793980765775
>
>However, if I use genefilter to filter out some genes first, some genes do
>appear significantly DE. Can this be explained simply by saying that
>FDR-type techniques become more sensitive as fewer comparisons are made?
>
>      ID         M                  A                t                  P.Value              B
>263   mCG142389  -7.97481171094547  .73475871266083  -5.3168578969303   0.00832939443377308  6.57330274986848
>6756  BC027122   -7.40473059624002  .77564203692944  -4.93678117706839  0.0313305586976585   4.89829085664067
>
>
>I would very much appreciate any comments or suggestions.
>Thank you.
>
>plr.-
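On the filtering question: Benjamini-Hochberg (BH) adjusted p-values depend on the total number of tests m. Each sorted p-value p_(i) is scaled by m/i, and then a running minimum is taken from the largest p-value downwards. The base-R sketch below uses `p.adjust` from the stats package with made-up p-values. It shows the same raw p-value adjusted among 10 tests and among 10000 tests. It covers only the multiple-testing part. Filtering can also change other quantities in a limma fit (for example, the empirical Bayes prior in `eBayes` is estimated from the genes that remain), and this sketch does not model that.

```r
# BH adjustment of one small p-value against different numbers of tests.
p_small <- 0.001

few  <- c(p_small, rep(0.5, 9))      # 10 tests in total (illustrative values)
many <- c(p_small, rep(0.5, 9999))   # 10000 tests in total (illustrative values)

p.adjust(few,  method = "BH")[1]   # 0.01: 0.001 * 10 / 1
p.adjust(many, method = "BH")[1]   # 0.5: 0.001 * 10000 = 10, reduced by the
                                   # running minimum over the larger p-values
```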