[BioC] Problem with analysis of paired data using Illumina arrays and the Lumi and limma packages

Thu Oct 2 19:30:06 CEST 2008

Hi Mathijs,

Groeneweg M (PATH) wrote:
> Dear all,
> 
> We are currently analyzing Illumina data using the lumi and affy packages.
> The results we are getting are suspicious: we get adjusted p-values in the 10^-40 to 10^-98 range.
> Obviously, we are doing something wrong. The experiment consists of 20 patients of whom we have positive and negative samples, in other words, we have paired observations in each patient. We did both a paired and an unpaired analysis, but this did not matter much. We also tried SAM analysis using siggenes, but again with the same results. As a test, we also used the non-normalized data, but this did not change the results. Has anybody an idea what is going wrong?
> 
> We used the following code (abbreviated where appropriate):
> 
> library(lumi);
> library(limma);
> library(Matrix);
> library(GOstats);
> library(siggenes);
> fileName <- 'Group Probe Profile TAB paired.csv';
> wouter.lumi <- lumiR(fileName, lib='lumiHumanAll.db');
> wouter.lumi;
> summary(wouter.lumi, 'QC');
> 
> wouter.lumi.N.Q <- lumiExpresso(wouter.lumi, QC.evaluation=TRUE);
> summary(wouter.lumi.N.Q);
> summary(wouter.lumi.N.Q, 'QC');
> ## Differential expression over all data sets and gene indentification
> ## wouter.lumi.N <- lumiN(wouter.lumi.N.Q);
> dataMatrix <- exprs(wouter.lumi.N.Q);
> presentCount <- detectionCall(wouter.lumi.N.Q);
> selDataMatrix <- dataMatrix[presentCount > 0,];
> selProbe <- rownames(selDataMatrix);
> ## Paired Comparission
> ## specify the groups and conditions
> PlaqueType <- c('A', 'B', 'A', 'B', 'A', 'B', ... 'A', 'B', 'A', 'B', 'A') ;
> PatientType <- c("p01", "p01", "p02", "p02", "p03", "p03", ... "p21", "p21", "p22", "p22", "p16", "p18"); 
> ## top 10 genes
> if(require(limma)) 
> {
> factor.PlaqueType <- factor(PlaqueType, levels=c("A","B"));
> factor.PatientType <- factor(PatientType);
> design <- model.matrix(~ 0+factor.PatientType+factor.PlaqueType);
> fit <- lmFit(selDataMatrix, design);
> ## contrast.matrix <-makeContrasts(A-B, levels=design);
> ## fit2<-contrasts.fit(fit,contrast.matrix)
> fit <-eBayes(fit);
> if (require(lumiHumanAll.db) & require(annotate))
>   {
>   if (nrow(fit$genes) == 0) fit$genes <- data.frame(ID=rownames(selDataMatrix))
>     geneSymbol <- getSYMBOL(fit$genes$ID, 'lumiHumanAll.db')
>     fit$genes <- data.frame(fit$genes, geneSymbol=geneSymbol)
>   }
> ## Print the top 10 genes
> print(topTable(fit, number=10))
> 
> ## get significant gene list with FDR adjusted p values less than 0.01
> p.adj <- p.adjust(fit$p.value[,2])
> sigGene.adj <- selProbe[ p.adj < 0.01]
> ## write.table(sigGene.adj, file="significant_genes.csv", sep=";"); 
> ## without FDR adjustment
> sigGene <- selProbe[ fit$p.value[,2] < 0.01]

You have commented out the part where you fit the contrast, instead 
running eBayes() on your original model fit. Since you aren't computing 
a contrast, you are simply testing to see if the coefficients are 
different from zero (in this case you are testing to see if 'B' is 
different from zero). For the vast majority of your data this will be 
unequivocally true, so you will get really small p-values.

You also seem to want to do unnecessarily tricky things. Rather than 
subsetting the data objects by hand, why don't you simply use the 
wrappers that exist in limma to do these things for you? For instance, 
you can get the top genes based on p-value and/or fold change using 
topTable().

Best,

Jim

> }
>                       ID geneSymbol factor.PatientTypep01 factor.PatientTypep02
> 1820  ZdpxBOa0B_.YqnniiE   C22orf33              7.324658              7.314660
> 7974  oeBgiYqwKqi5.p7lTc      MLLT4              7.338124              7.305449
> 1678  HYFYQOpk4Y6IkqA6Hk   C1orf183              7.310553              7.350842
> 27    Nonl3upr2ojFAogAsU      ABCA5              7.301086              7.296429
> 12509 BkA6hCdSI1BR4pUQio     TAS2R3              7.311262              7.324070
> 11087 oTZyp7gBIx7iV4FdSg       RXRG              7.324615              7.318605
> 2657  oBKSggJ.exxFIq_qe4      CDY1B              7.394242              7.332206
> 10176 fSIikgsewouigigijo    PSMC3IP              7.372082              7.354313
> 12106 Wqi4jiw9KgiCoIiCLc    SPINLW1              7.339931              7.297919
> 2429  6lWX99edJ1RohUS3dc      CCR10              7.393743              7.383870
>       factor.PatientTypep03 factor.PatientTypep04 factor.PatientTypep05
> 1820               7.335805              7.307950              7.314493
> 7974               7.344440              7.327994              7.343561
> 1678               7.314656              7.342585              7.320585
> 27                 7.275228              7.344058              7.287226
> 12509              7.328430              7.317047              7.325980
> 11087              7.362075              7.319144              7.307189
> 2657               7.392088              7.380562              7.326872
> 10176              7.418103              7.346580              7.331416
> 12106              7.355262              7.334333              7.312377
> 2429               7.402435              7.382996              7.398166
> 
> etc...
> 
>       factor.PlaqueTypeB  AveExpr        F      P.Value    adj.P.Val
> 1820        5.812695e-03 7.334350 79437.93 1.524058e-46 6.165472e-43
> 7974        9.565944e-03 7.351467 76510.25 2.253042e-46 6.165472e-43
> 1678        7.535491e-03 7.351725 76065.06 2.394156e-46 6.165472e-43
> 27          5.886626e-05 7.312479 73115.55 3.613654e-46 6.165472e-43
> 12509      -2.588097e-05 7.331387 72769.81 3.796430e-46 6.165472e-43
> 11087      -5.466541e-04 7.341570 71478.61 4.574186e-46 6.165472e-43
> 2657        6.219771e-03 7.359755 69690.49 5.954555e-46 6.165472e-43
> 10176       1.360613e-03 7.367441 69121.32 6.485202e-46 6.165472e-43
> 12106      -3.237106e-03 7.327304 68823.54 6.783319e-46 6.165472e-43
> 2429       -1.786489e-02 7.380408 68733.39 6.876512e-46 6.165472e-43
> 
> Thanks you very much in advance,
> 
> Mathijs
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662