[BioC] Using limma to identify differentially expressed genes
David Westergaard
david at harsk.dk
Tue Apr 10 17:14:18 CEST 2012
Hello,
I've been trying to use limma to identify genes from the following
data: http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-21340 -
It's a simple control vs. disease experiment
# SDRF downloaded from above page
SDRF <- read.table(file="E-GEOD-21340.sdrf.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
# Looking to compare Family-history negativer versus Diabetis
controls <- SDRF[grep("Control, Family History
Neg",SDRF$Comment..Sample_source_name.),]
disease <- SDRF[grep("^DM",SDRF$Characteristics.DiseaseState.),]
Batch <- rbind(controls,disease)
# Read in CEL files
mixture.batch <- ReadAffy(filenames=Batch$Array.Data.File)
# Preprocess data
mixture.processed <- expresso(mixture.batch, bgcorrect.method = "rma",
normalize.method = "quantiles", pmcorrect.method = "pmonly",
summary.method = "medianpolish")
# Get data in matrix
signals <- exprs(mixture.prepared)
cl <- ifelse(colnames(signals) %in% disease$Array.Data.File,1,0)
# Do design matrix and fit
design <- model.matrix(~factor(cl))
fit <- lmFit(signals,design)
fit <- eBayes(fit)
topTable(fit2,coef=2)
Which yields the following:
ID logFC AveExpr t P.Value adj.P.Val B
7513 208004_at -0.323 5.43 -4.65 0.00191 0.999 -3.10
11225 211829_s_at 0.340 5.07 4.36 0.00278 0.999 -3.17
5950 206424_at -0.907 6.65 -4.15 0.00363 0.999 -3.23
1354 201826_s_at -0.447 8.37 -4.13 0.00374 0.999 -3.24
19782 220418_at 0.392 5.43 4.02 0.00431 0.999 -3.27
8889 209396_s_at 1.899 7.47 4.01 0.00437 0.999 -3.28
5005 205478_at -0.931 9.22 -3.94 0.00481 0.999 -3.30
9469 209983_s_at 0.412 5.72 3.92 0.00492 0.999 -3.31
2936 203409_at 0.506 6.93 3.87 0.00531 0.999 -3.32
5054 205527_s_at 0.331 6.80 3.84 0.00549 0.999 -3.33
I'm abit puzzled over the adjusted P-values. Can it really be true
that ALL of the adjusted P-values are 0.999, or did I make a rookie
mistake somewhere?
Best Regards,
David Westergaard
More information about the Bioconductor
mailing list