[BioC] Error in lm.fit(design, t(M)) from GEO dataset
Jeremiah H. Savage
jeremiahsavage at gmail.com
Fri May 28 03:39:18 CEST 2010
Hello,
I am using limma 3.4.0, and received an error with a GEO series (
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12113 ) that
doesn't occur with other series data:
Error in lm.fit(design, t(M)) :
NA/NaN/Inf in foreign function call (arg 4)
If I do a summary on the eset, I get some -Inf values. Does this mean
there is some invalid data in the series that lmFit doesn't handle?
> summary(exprs(eset12113))
GSM305488 GSM305489 GSM305490 GSM305491
Min. :-2.322 Min. :-3.322 Min. :-3.322 Min. : -Inf
1st Qu.: 3.263 1st Qu.: 2.982 1st Qu.: 4.585 1st Qu.: 3.868
Median : 5.202 Median : 4.916 Median : 6.200 Median : 5.638
Mean : 5.140 Mean : 4.934 Mean : 6.054 Mean : -Inf
3rd Qu.: 6.785 3rd Qu.: 6.637 3rd Qu.: 7.485 3rd Qu.: 7.323
Max. :15.620 Max. :15.180 Max. :14.749 Max. :14.290
The code I'm using is:
> source("sampleEset.R")
> fit <- getLMFit()
sampleEset.R :
-------------------8<-----------------------------
getLMFit <- function()
{
eset12113 <- getEset("12113")
design12113 <- model.matrix(~ 0+factor(c(1,1,2,3,3,4,5,5,6,7,7,8,9,9,10)))
colnames(design12113) <-
c("untreated1","treated1","untreated2","treated2","untreated3","treated3","untreated4","treated4","untreated5","treated5")
contrast.matrix <- makeContrasts(untreated1-treated1,levels=design12113)
fit <- lmFit(eset12113,design12113)
}
getEset <- function(value)
{
DATADIR = "/home/jeremiah/R/i486-pc-linux-gnu-library/2.11/GEOquery/extdata"
data_file = paste("GSE",value,".soft",sep="")
data_dir_file= paste(DATADIR,data_file,sep="/")
library(GEOquery)
library(Biobase)
library(limma)
if (file.exists(data_dir_file))
{
GSE_parsed <- getGEO(filename=data_dir_file)
}
else
{
GSE_value <- paste("GSE",value,sep="")
GSE_file <- getGEOfile(GSE_value,destdir=DATADIR)
GSE_parsed <- getGEO(filename=data_dir_file)
}
GSM_platform <- lapply(GSMList(GSE_parsed), function(x) {
Meta(x)$platform
})
GSM_platform <- GSM_platform[[1]]
probe_set <- Table(GPLList(GSE_parsed)[[1]])$ID
data.matrix <- do.call("cbind", lapply(GSMList(GSE_parsed), function(x) {
tab <- Table(x)
mymatch <- match(probe_set, tab$ID_REF)
return(tab$VALUE[mymatch])
}))
data.matrix <- apply(data.matrix, 2, function(x) {
as.numeric(as.character(x))
})
dataLog2.matrix <- log2(data.matrix)
rownames(dataLog2.matrix) <- probe_set
colnames(dataLog2.matrix) <- names(GSMList(GSE_parsed))
p_data <- data.frame(samples = names(GSMList(GSE_parsed)))
rownames(p_data) <- names(GSMList(GSE_parsed))
pheno <- as(p_data, "AnnotatedDataFrame")
eset <- new("ExpressionSet", exprs = dataLog2.matrix, phenoData = pheno)
return(eset)
}
-------------------8<-----------------------------
Thanks,
Jeremiah
More information about the Bioconductor
mailing list