[BioC] Comparing Two Affymetrix Arrays Question
Abboud, Ramzi
Ramzi_Abboud at URMC.Rochester.edu
Thu Jun 23 22:30:08 CEST 2011
Hello All,
I am somewhat new to bioconductor, and am trying to accomplish what I believe is a simple task. I have 15 AffyMetrix gene signatures from 15 subjects, each in the form of a .cel file. The subjects are grouped into 3 groups of 5 - let's say Group A, Group B, and Wild Type. I would like to compare the average gene expression of subjects in Group A to those in Group B, and separately compare the average gene expression in Group A to Wild Type. In the results I would like the genes most significantly different in expression levels and the p-values for each gene comparison.
I have code which I believe does this, but the results do not seem totally correct, and I would like some help.
I have two very similar R scripts (to keep things separate and simple). One compares Group A to Group B. The other compares Group A to Wild Type. The results from comparing Group A to Wild Type look correct. However, the results from comparing Group A to Group B give an adjusted p value of 0.999992827573282 for every single gene. Here are the top two lines from the output file (columns are ID, logFC, AveExpr, t, P.Value, adj.P.Val, B) :
42970 1458675_at -0.402322454 4.770279273 -4.522575441 0.000900789 0.999992828 -3.268563539
23121 1438815_at 0.437319401 7.866319701 4.013307606 0.002098968 0.999992828 -3.417357338
Obviously something is not right. All the other numbers from the Group A vs Group B comparison look reasonable, but this adjusted p value is making me doubt the whole thing.
Does someone see a glaring and obvious mistake in my code (which is included below)? Is there a better or simpler way to do comparison?
Please let me know if I can provide any additional information. I would be happy to provide the excel
Thank you,
Ramzi
The following code compares Group A with Group B. It is my R Script, with notes.
## Load Packages
library(affy) # Affymetrix pre-processing
library(limma) # two-color pre-processing; differential expression
library(Biobase)
## Read targets file.
pd <- read.AnnotatedDataFrame("TargetsAvsB.txt",header=TRUE,row.names=1,as.is=TRUE)
## Read .CEL data.
rawAffyData <- ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
## Normalize the data
eset <- rma(rawAffyData)
## The target file information can be recovered from the eset.
targets <- pData(eset)
## Define a design matrix.
#designMatrix <- createDesignMatrix(eset)
designMatrix <- model.matrix(~ -1 + factor(targets$Target,levels=unique(targets$Target)))
colnames(designMatrix) <- unique(targets$Target)
numParameters <- ncol(designMatrix)
parameterNames <- colnames(designMatrix)
## Define a contrasts matrix.
#contrastsMatrix <- createContrastMatrix(eset, design=designMatrix
contrastNames <- c(paste(parameterNames[1],parameterNames[2],sep="_vs_"))
contrastsMatrix <- matrix(c(1,-1),nrow=ncol(designMatrix))
rownames(contrastsMatrix) <- parameterNames
colnames(contrastsMatrix) <- contrastNames
## Fit to linear model.
fit <- lmFit(eset,design=designMatrix)
## Empirical Bayes statistics
fitNoContrastsMatrix <- eBayes(fit)
## Fit a linear model for the contrasts.
fit2 <- contrasts.fit(fit,contrasts=contrastsMatrix)
## Empirical Bayes statistics
fit2 <- eBayes(fit2)
numGenes <- nrow(eset)
completeTable_A_vs_B <- topTable(fit2,number=numGenes)
write.table(completeTable_A_vs_B ,file="A_vs_B_genes.xls",sep="\t",quote=FALSE,col.names=NA)
More information about the Bioconductor
mailing list