[BioC] Comparing Two Affymetrix Arrays Question
Abboud, Ramzi
Ramzi_Abboud at URMC.Rochester.edu
Fri Jun 24 15:44:27 CEST 2011
Jenny et al.,
Thanks again for the help so far.
I did some reading about FDR and it makes perfect sense. If you are considering 45,000 genes it is much more likely you will see a difference between the two data sets, and this needs to be corrected for.
Following your advice, I reverted my code back to fitting one model on all 3 groups in order to get more power. I then pull out the pairwise comparisons that I want and write them out to files. The comparisons again are Group A vs Group B, Group A vs Wild Type, and Group B vs Wild Type. Below is my code, does it look correct? There are three main things I'd like to confirm are correct. 1 - The statistics calls. 2 - The contrasts matrix. 3 - Pulling out the pairwise comparisons at the end.
The adjusted p-value is still the same (.999_) in this 3 group model. So is the final conclusion that Group A and Group B are too similar to show significant difference with this sample size / study power? And if I consider the unadjusted p-value, is the main concern a high false positive rate?
Thank you,
Ramzi
My Code:
## Load Packages
library(affy) # Affymetrix pre-processing
library(limma) # two-color pre-processing; differential expression
library(Biobase)
## Read targets file.
pd <- read.AnnotatedDataFrame("Targets.txt",header=TRUE,row.names=1,as.is=TRUE)
## Read .CEL data.
rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
## Normalize the data
eset <- rma(rawAffyData)
## The target file information can be recovered from the eset.
targets <- pData(eset)
## Define a design matrix.
designMatrix <- model.matrix(~ -1 + factor(targets$Target,levels=unique(targets$Target)))
colnames(designMatrix) <- unique(targets$Target)
numParameters <- ncol(designMatrix)
parameterNames <- colnames(designMatrix)
## Define a contrasts matrix.
## Three comparisons:
## 1. Col vs Exp
## 2. Col vs WT
## 3. Exp vs WT
contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"),
paste(parameterNames[1],parameterNames[3],sep="_vs_"),
paste(parameterNames[2],parameterNames[3],sep="_vs_"))
contrastsMatrix <- matrix(c(1,-1,0,1,0,-1,0,1,-1),nrow=ncol(designMatrix))
rownames(contrastsMatrix) <- parameterNames
colnames(contrastsMatrix) <- contrastNames
# This is my contrasts matrix. Collapsed represents Group A, Expanding represents Group B, and WildType is. . . Wild Type.
contrastsMatrix
Collapsed_vs_Expanding Collapsed_vs_WildType Expanding_vs_WildType
Collapsed 1 1 0
Expanding -1 0 1
WildType 0 -1 -1
## Fit to linear model.
fit <- lmFit(eset,design=designMatrix)
## Fit a linear model for the contrasts.
fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix)
## Empirical Bayes statistics
fit2 <- eBayes(fit2)
numGenes <- nrow(eset)
completeTable_Col_vs_Exp <- topTable(fit2,coef="Collapsed_vs_Expanding",number=numGenes)
write.table(completeTable_Col_vs_Exp,file="Col_vs_Exp_genes.xls",sep="\t",quote=FALSE,col.names=NA)
completeTable_Col_vs_WT <- topTable(fit2,coef="Collapsed_vs_WildType",number=numGenes)
write.table(completeTable_Col_vs_WT,file="Col_vs_WT_genes.xls",sep="\t",quote=FALSE,col.names=NA)
completeTable_Exp_vs_WT <- topTable(fit2,coef="Expanding_vs_WildType",number=numGenes)
write.table(completeTable_Exp_vs_WT,file="Exp_vs_WT_genes.xls",sep="\t",quote=FALSE,col.names=NA)
________________________________________
From: Jenny Drnevich [drnevich at illinois.edu]
Sent: Thursday, June 23, 2011 4:38 PM
To: Abboud, Ramzi; bioconductor at r-project.org
Subject: Re: [BioC] Comparing Two Affymetrix Arrays Question
Hi Ramzi,
There's nothing "wrong" with your results - there is simply not
enough statistical evidence for differential expression between Group
A and Group B. Is there a reason you are not following the
limmaUserGuide() section 8.6, which goes through how to fit one model
on 3 groups and pull out the pairwise comparisons you want? You'll
get more power to detect differences between Groups A and B by
fitting one model with all 3 groups instead of fitting separate models.
Best,
Jenny
At 03:30 PM 6/23/2011, Abboud, Ramzi wrote:
>Hello All,
>
>I am somewhat new to bioconductor, and am trying to accomplish what
>I believe is a simple task. I have 15 AffyMetrix gene signatures
>from 15 subjects, each in the form of a .cel file. The subjects are
>grouped into 3 groups of 5 - let's say Group A, Group B, and Wild
>Type. I would like to compare the average gene expression of
>subjects in Group A to those in Group B, and separately compare the
>average gene expression in Group A to Wild Type. In the results I
>would like the genes most significantly different in expression
>levels and the p-values for each gene comparison.
>
>I have code which I believe does this, but the results do not seem
>totally correct, and I would like some help.
>
>I have two very similar R scripts (to keep things separate and
>simple). One compares Group A to Group B. The other compares Group
>A to Wild Type. The results from comparing Group A to Wild Type
>look correct. However, the results from comparing Group A to Group
>B give an adjusted p value of 0.999992827573282 for every single
>gene. Here are the top two lines from the output file (columns are
>ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) :
>
>42970 1458675_at -0.402322454 4.770279273 -4.522575441
> 0.000900789 0.999992828 -3.268563539
>23121 1438815_at 0.437319401 7.866319701 4.013307606
> 0.002098968 0.999992828 -3.417357338
>
>Obviously something is not right. All the other numbers from the
>Group A vs Group B comparison look reasonable, but this adjusted p
>value is making me doubt the whole thing.
>
>Does someone see a glaring and obvious mistake in my code (which is
>included below)? Is there a better or simpler way to do comparison?
>
>Please let me know if I can provide any additional information. I
>would be happy to provide the excel
>
>Thank you,
>Ramzi
>
>The following code compares Group A with Group B. It is my R
>Script, with notes.
>
>## Load Packages
>library(affy) # Affymetrix pre-processing
>library(limma) # two-color pre-processing; differential expression
>library(Biobase)
>
>## Read targets file.
>pd <-
>read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,as.is=TRUE)
>
>## Read .CEL data.
>rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
>
>## Normalize the data
>eset <- rma(rawAffyData)
>
>## The target file information can be recovered from the eset.
>targets <- pData(eset)
>
>## Define a design matrix.
>#designMatrix <- createDesignMatrix(eset)
>designMatrix <- model.matrix(~ -1 +
>factor(targets$Target,levels=unique(targets$Target)))
>colnames(designMatrix) <- unique(targets$Target)
>numParameters <- ncol(designMatrix)
>parameterNames <- colnames(designMatrix)
>
>## Define a contrasts matrix.
>#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix
>contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"))
>contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix))
>rownames(contrastsMatrix) <- parameterNames
>colnames(contrastsMatrix) <- contrastNames
>
>## Fit to linear model.
>fit <- lmFit(eset,design=designMatrix)
>
>## Empirical Bayes statistics
>fitNoContrastsMatrix <- eBayes(fit)
>
>## Fit a linear model for the contrasts.
>fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix)
>
>## Empirical Bayes statistics
>fit2 <- eBayes(fit2)
>
>numGenes <- nrow(eset)
>completeTable_A_vs_B <- topTable(fit2,number=numGenes)
>write.table(completeTable_A_vs_B
>,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA)
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list