[BioC] Problem with analysis of paired data using Illumina arrays and the Lumi and limma packages
James W. MacDonald
jmacdon at med.umich.edu
Thu Oct 2 19:30:06 CEST 2008
Hi Mathijs,
Groeneweg M (PATH) wrote:
> Dear all,
>
> We are currently analyzing Illumina data using the lumi and affy packages.
> The results we are getting are suspicious: we get adjusted p-values in the 10^-40 to 10^-98 range.
> Obviously, we are doing something wrong. The experiment consists of 20 patients of whom we have positive and negative samples, in other words, we have paired observations in each patient. We did both a paired and an unpaired analysis, but this did not matter much. We also tried SAM analysis using siggenes, but again with the same results. As a test, we also used the non-normalized data, but this did not change the results. Has anybody an idea what is going wrong?
>
> We used the following code (abbreviated where appropriate):
>
> library(lumi);
> library(limma);
> library(Matrix);
> library(GOstats);
> library(siggenes);
> fileName <- 'Group Probe Profile TAB paired.csv';
> wouter.lumi <- lumiR(fileName, lib='lumiHumanAll.db');
> wouter.lumi;
> summary(wouter.lumi, 'QC');
>
> wouter.lumi.N.Q <- lumiExpresso(wouter.lumi, QC.evaluation=TRUE);
> summary(wouter.lumi.N.Q);
> summary(wouter.lumi.N.Q, 'QC');
> ## Differential expression over all data sets and gene indentification
> ## wouter.lumi.N <- lumiN(wouter.lumi.N.Q);
> dataMatrix <- exprs(wouter.lumi.N.Q);
> presentCount <- detectionCall(wouter.lumi.N.Q);
> selDataMatrix <- dataMatrix[presentCount > 0,];
> selProbe <- rownames(selDataMatrix);
> ## Paired Comparission
> ## specify the groups and conditions
> PlaqueType <- c('A', 'B', 'A', 'B', 'A', 'B', ... 'A', 'B', 'A', 'B', 'A') ;
> PatientType <- c("p01", "p01", "p02", "p02", "p03", "p03", ... "p21", "p21", "p22", "p22", "p16", "p18");
> ## top 10 genes
> if(require(limma))
> {
> factor.PlaqueType <- factor(PlaqueType, levels=c("A","B"));
> factor.PatientType <- factor(PatientType);
> design <- model.matrix(~ 0+factor.PatientType+factor.PlaqueType);
> fit <- lmFit(selDataMatrix, design);
> ## contrast.matrix <-makeContrasts(A-B, levels=design);
> ## fit2<-contrasts.fit(fit,contrast.matrix)
> fit <-eBayes(fit);
> if (require(lumiHumanAll.db) & require(annotate))
> {
> if (nrow(fit$genes) == 0) fit$genes <- data.frame(ID=rownames(selDataMatrix))
> geneSymbol <- getSYMBOL(fit$genes$ID, 'lumiHumanAll.db')
> fit$genes <- data.frame(fit$genes, geneSymbol=geneSymbol)
> }
> ## Print the top 10 genes
> print(topTable(fit, number=10))
>
> ## get significant gene list with FDR adjusted p values less than 0.01
> p.adj <- p.adjust(fit$p.value[,2])
> sigGene.adj <- selProbe[ p.adj < 0.01]
> ## write.table(sigGene.adj, file="significant_genes.csv", sep=";");
> ## without FDR adjustment
> sigGene <- selProbe[ fit$p.value[,2] < 0.01]
You have commented out the part where you fit the contrast, instead
running eBayes() on your original model fit. Since you aren't computing
a contrast, you are simply testing to see if the coefficients are
different from zero (in this case you are testing to see if 'B' is
different from zero). For the vast majority of your data this will be
unequivocally true, so you will get really small p-values.
You also seem to want to do unnecessarily tricky things. Rather than
subsetting the data objects by hand, why don't you simply use the
wrappers that exist in limma to do these things for you? For instance,
you can get the top genes based on p-value and/or fold change using
topTable().
Best,
Jim
> }
> ID geneSymbol factor.PatientTypep01 factor.PatientTypep02
> 1820 ZdpxBOa0B_.YqnniiE C22orf33 7.324658 7.314660
> 7974 oeBgiYqwKqi5.p7lTc MLLT4 7.338124 7.305449
> 1678 HYFYQOpk4Y6IkqA6Hk C1orf183 7.310553 7.350842
> 27 Nonl3upr2ojFAogAsU ABCA5 7.301086 7.296429
> 12509 BkA6hCdSI1BR4pUQio TAS2R3 7.311262 7.324070
> 11087 oTZyp7gBIx7iV4FdSg RXRG 7.324615 7.318605
> 2657 oBKSggJ.exxFIq_qe4 CDY1B 7.394242 7.332206
> 10176 fSIikgsewouigigijo PSMC3IP 7.372082 7.354313
> 12106 Wqi4jiw9KgiCoIiCLc SPINLW1 7.339931 7.297919
> 2429 6lWX99edJ1RohUS3dc CCR10 7.393743 7.383870
> factor.PatientTypep03 factor.PatientTypep04 factor.PatientTypep05
> 1820 7.335805 7.307950 7.314493
> 7974 7.344440 7.327994 7.343561
> 1678 7.314656 7.342585 7.320585
> 27 7.275228 7.344058 7.287226
> 12509 7.328430 7.317047 7.325980
> 11087 7.362075 7.319144 7.307189
> 2657 7.392088 7.380562 7.326872
> 10176 7.418103 7.346580 7.331416
> 12106 7.355262 7.334333 7.312377
> 2429 7.402435 7.382996 7.398166
>
> etc...
>
> factor.PlaqueTypeB AveExpr F P.Value adj.P.Val
> 1820 5.812695e-03 7.334350 79437.93 1.524058e-46 6.165472e-43
> 7974 9.565944e-03 7.351467 76510.25 2.253042e-46 6.165472e-43
> 1678 7.535491e-03 7.351725 76065.06 2.394156e-46 6.165472e-43
> 27 5.886626e-05 7.312479 73115.55 3.613654e-46 6.165472e-43
> 12509 -2.588097e-05 7.331387 72769.81 3.796430e-46 6.165472e-43
> 11087 -5.466541e-04 7.341570 71478.61 4.574186e-46 6.165472e-43
> 2657 6.219771e-03 7.359755 69690.49 5.954555e-46 6.165472e-43
> 10176 1.360613e-03 7.367441 69121.32 6.485202e-46 6.165472e-43
> 12106 -3.237106e-03 7.327304 68823.54 6.783319e-46 6.165472e-43
> 2429 -1.786489e-02 7.380408 68733.39 6.876512e-46 6.165472e-43
>
> Thanks you very much in advance,
>
> Mathijs
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
More information about the Bioconductor
mailing list