[BioC] Unexpected results using limma with numerical factor
Gordon Smyth
smyth at wehi.edu.au
Thu Aug 26 09:50:31 CEST 2004
At 12:30 PM 25/08/2004, paul.boutros at utoronto.ca wrote:
>Hi again,
>
>For an experiment I'm analyzing, I do not have a series of
>factors. Rather, I
>have a pair of test-score (numerical) for each replicate animal. This is
>Affymetrix data, and for my initial pass I tried with only a single
>test-score.
>The commands I used are below:
>
> > library(gcrma);
>Welcome to Bioconductor
> Vignettes contain introductory material. To view,
> simply type: openVignette()
> For details on reading vignettes, see
> the openVignette help page.
> > library(limma);
> > cel.files <- c(
>+ 'RAE230_2_060104_LH_IM07T.CEL',
>+ 'RAE230_2_060104_LH_IM08T.CEL',
>+ 'RAE230_2_060104_LH_IM09T.CEL',
>+ 'RAE230_2_060104_LH_IM10T.CEL',
>+ 'RAE230_2_060204_LH_IM07T.CEL',
>+ 'RAE230_2_060204_LH_IM08T.CEL',
>+ 'RAE230_2_060204_LH_IM09T.CEL',
>+ 'RAE230_2_060204_LH_IM10T.CEL'
>+ );
> > eset <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> > eset <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> > eset;
>AffyBatch object
>size of arrays=834x834 features (43476 kb)
>cdf=Rat230_2 (31099 affyids)
>number of samples=8
>number of genes=31099
>annotation=rat2302
> > pData(eset);
> TestScore1
>RAE230_2_060104_LH_IM07T.CEL 0.58
>RAE230_2_060104_LH_IM08T.CEL -2.36
>RAE230_2_060104_LH_IM09T.CEL -12.24
>RAE230_2_060104_LH_IM10T.CEL -14.84
>RAE230_2_060204_LH_IM07T.CEL 0.15
>RAE230_2_060204_LH_IM08T.CEL -3.23
>RAE230_2_060204_LH_IM09T.CEL -11.66
>RAE230_2_060204_LH_IM10T.CEL -12.91
> > eset <- rma(eset);
>Background correcting
>Normalizing
>Calculating Expression
> > design <- model.matrix(~-1 + TestScore1, pData(eset));
Almost certainly you should use
design <- model.matrix(~TestScore1, pData(eset))
i.e., there is no justification for removing the intercept from your model.
> > design;
> TestScore1
>RAE230_2_060104_LH_IM07T.CEL 0.58
>RAE230_2_060104_LH_IM08T.CEL -2.36
>RAE230_2_060104_LH_IM09T.CEL -12.24
>RAE230_2_060104_LH_IM10T.CEL -14.84
>RAE230_2_060204_LH_IM07T.CEL 0.15
>RAE230_2_060204_LH_IM08T.CEL -3.23
>RAE230_2_060204_LH_IM09T.CEL -11.66
>RAE230_2_060204_LH_IM10T.CEL -12.91
>attr(,"assign")
>[1] 1
> > fit1 <- lmFit(eset, design);
> > fit3 <- eBayes(fit1);
>
>
>All proceeds well without any error-messages, so I believed I had
>successfully
>fit my model. When I extract the data, however, I get some unexpected
>results:
>
> > topTable(fit3, coef=1, number=20, adjust="fdr");
> ID M A t P.Value B
>104 1367555_at -1.212175 14.77173 -6.808742 3.912627e-07 14.75482
>105 1367556_s_at -1.203275 14.67175 -6.762709 3.912627e-07 14.48816
>3411 1370862_at -1.185363 14.42457 -6.674599 3.912627e-07 13.98156
>2777 1370228_at -1.184768 14.46704 -6.666414 3.912627e-07 13.93475
>549 1368000_at -1.175866 14.33987 -6.622929 3.912627e-07 13.68683
>2697 1370148_at -1.174858 14.32747 -6.617795 3.912627e-07 13.65764
>837 1368288_at -1.170412 14.25957 -6.596453 3.912627e-07 13.53649
>710 1368161_a_at -1.169621 14.27702 -6.589664 3.912627e-07 13.49801
>420 1367871_at -1.166552 14.09481 -6.588461 3.912627e-07 13.49120
>2558 1370009_at -1.162241 14.18252 -6.552347 3.912627e-07 13.28707
>147 1367598_at -1.161072 14.19550 -6.543620 3.912627e-07 13.23787
>2576 1370027_a_at -1.158955 14.13682 -6.536068 3.912627e-07 13.19533
>1136 1368587_at -1.154447 14.04036 -6.517046 3.912627e-07 13.08836
>196 1367647_at -1.154999 14.07951 -6.516673 3.912627e-07 13.08627
>31081 AFFX-r2-P1-cre-5_at -1.153568 14.05140 -6.510380 3.912627e-07 13.05094
>19150 1387082_at -1.153501 14.05431 -6.509674 3.912627e-07 13.04697
>2898 1370349_a_at -1.153164 14.07047 -6.505935 3.912627e-07 13.02599
>1235 1368686_at -1.152660 14.04966 -6.504796 3.912627e-07 13.01960
>8200 1375651_at -1.152557 14.04428 -6.504674 3.912627e-07 13.01892
>19359 1387291_at -1.151666 14.03699 -6.499744 3.912627e-07 12.99127
>
>The near-identical M, A, and p-values indicate a problem, and none of the
>genes
>on this list are very plausible biologically for our test-system. Based
>on that
>I'm pretty sure I've gone astray somewhere.
>
>Is it possible to use numeric scores in fitting a linear model with
>limma? If
>so, am I asking the question in the right manner? If not, are there any
>BioConductor tools appropriate for this kind of question?
Yes, you can use numeric scores with limma.
Gordon
>Any help very much appreciated,
>Paul
More information about the Bioconductor
mailing list