[BioC] Unexpected results using limma with numerical factor

Gordon Smyth smyth at wehi.edu.au
Thu Aug 26 09:50:31 CEST 2004


At 12:30 PM 25/08/2004, paul.boutros at utoronto.ca wrote:
>Hi again,
>
>For an experiment I'm analyzing, I do not have a series of 
>factors.  Rather, I
>have a pair of test-score (numerical) for each replicate animal.  This is
>Affymetrix data, and for my initial pass I tried with only a single 
>test-score.
>The commands I used are below:
>
> > library(gcrma);
>Welcome to Bioconductor
>          Vignettes contain introductory material.  To view,
>          simply type: openVignette()
>          For details on reading vignettes, see
>          the openVignette help page.
> > library(limma);
> > cel.files <- c(
>+ 'RAE230_2_060104_LH_IM07T.CEL',
>+ 'RAE230_2_060104_LH_IM08T.CEL',
>+ 'RAE230_2_060104_LH_IM09T.CEL',
>+ 'RAE230_2_060104_LH_IM10T.CEL',
>+ 'RAE230_2_060204_LH_IM07T.CEL',
>+ 'RAE230_2_060204_LH_IM08T.CEL',
>+ 'RAE230_2_060204_LH_IM09T.CEL',
>+ 'RAE230_2_060204_LH_IM10T.CEL'
>+ );
> > eset <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> > eset <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> > eset;
>AffyBatch object
>size of arrays=834x834 features (43476 kb)
>cdf=Rat230_2 (31099 affyids)
>number of samples=8
>number of genes=31099
>annotation=rat2302
> > pData(eset);
>                                TestScore1
>RAE230_2_060104_LH_IM07T.CEL         0.58
>RAE230_2_060104_LH_IM08T.CEL        -2.36
>RAE230_2_060104_LH_IM09T.CEL       -12.24
>RAE230_2_060104_LH_IM10T.CEL       -14.84
>RAE230_2_060204_LH_IM07T.CEL         0.15
>RAE230_2_060204_LH_IM08T.CEL        -3.23
>RAE230_2_060204_LH_IM09T.CEL       -11.66
>RAE230_2_060204_LH_IM10T.CEL       -12.91
> > eset <- rma(eset);
>Background correcting
>Normalizing
>Calculating Expression
> > design <- model.matrix(~-1 + TestScore1, pData(eset));

Almost certainly you should use

design <- model.matrix(~TestScore1, pData(eset))

i.e., there is no justification for removing the intercept from your model.

> > design;
>                                TestScore1
>RAE230_2_060104_LH_IM07T.CEL         0.58
>RAE230_2_060104_LH_IM08T.CEL        -2.36
>RAE230_2_060104_LH_IM09T.CEL       -12.24
>RAE230_2_060104_LH_IM10T.CEL       -14.84
>RAE230_2_060204_LH_IM07T.CEL         0.15
>RAE230_2_060204_LH_IM08T.CEL        -3.23
>RAE230_2_060204_LH_IM09T.CEL       -11.66
>RAE230_2_060204_LH_IM10T.CEL       -12.91
>attr(,"assign")
>[1] 1
> > fit1 <- lmFit(eset, design);
> > fit3 <- eBayes(fit1);
>
>
>All proceeds well without any error-messages, so I believed I had 
>successfully
>fit my model.  When I extract the data, however, I get some unexpected 
>results:
>
> > topTable(fit3, coef=1, number=20, adjust="fdr");
>                        ID         M        A         t      P.Value        B
>104            1367555_at -1.212175 14.77173 -6.808742 3.912627e-07 14.75482
>105          1367556_s_at -1.203275 14.67175 -6.762709 3.912627e-07 14.48816
>3411           1370862_at -1.185363 14.42457 -6.674599 3.912627e-07 13.98156
>2777           1370228_at -1.184768 14.46704 -6.666414 3.912627e-07 13.93475
>549            1368000_at -1.175866 14.33987 -6.622929 3.912627e-07 13.68683
>2697           1370148_at -1.174858 14.32747 -6.617795 3.912627e-07 13.65764
>837            1368288_at -1.170412 14.25957 -6.596453 3.912627e-07 13.53649
>710          1368161_a_at -1.169621 14.27702 -6.589664 3.912627e-07 13.49801
>420            1367871_at -1.166552 14.09481 -6.588461 3.912627e-07 13.49120
>2558           1370009_at -1.162241 14.18252 -6.552347 3.912627e-07 13.28707
>147            1367598_at -1.161072 14.19550 -6.543620 3.912627e-07 13.23787
>2576         1370027_a_at -1.158955 14.13682 -6.536068 3.912627e-07 13.19533
>1136           1368587_at -1.154447 14.04036 -6.517046 3.912627e-07 13.08836
>196            1367647_at -1.154999 14.07951 -6.516673 3.912627e-07 13.08627
>31081 AFFX-r2-P1-cre-5_at -1.153568 14.05140 -6.510380 3.912627e-07 13.05094
>19150          1387082_at -1.153501 14.05431 -6.509674 3.912627e-07 13.04697
>2898         1370349_a_at -1.153164 14.07047 -6.505935 3.912627e-07 13.02599
>1235           1368686_at -1.152660 14.04966 -6.504796 3.912627e-07 13.01960
>8200           1375651_at -1.152557 14.04428 -6.504674 3.912627e-07 13.01892
>19359          1387291_at -1.151666 14.03699 -6.499744 3.912627e-07 12.99127
>
>The near-identical M, A, and p-values indicate a problem, and none of the 
>genes
>on this list are very plausible biologically for our test-system.  Based 
>on that
>I'm pretty sure I've gone astray somewhere.
>
>Is it possible to use numeric scores in fitting a linear model with 
>limma?  If
>so, am I asking the question in the right manner?  If not, are there any
>BioConductor tools appropriate for this kind of question?

Yes, you can use numeric scores with limma.

Gordon

>Any help very much appreciated,
>Paul



More information about the Bioconductor mailing list