[BioC] Error in lm.fit(design, t(M)) from GEO dataset
James F. Reid
james.reid at ifom-ieo-campus.it
Fri May 28 11:25:23 CEST 2010
Hi Jeremiah,
my guess is that you are taking logs of negative values at some stage in
your pre-processing.
James.
On 05/28/2010 03:39 AM, Jeremiah H. Savage wrote:
> Hello,
>
> I am using limma 3.4.0, and received an error with a GEO series (
> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12113 ) that
> doesn't occur with other series data:
>
> Error in lm.fit(design, t(M)) :
> NA/NaN/Inf in foreign function call (arg 4)
>
> If I do a summary on the eset, I get some -Inf values. Does this mean
> there is some invalid data in the series that lmFit doesn't handle?
>> summary(exprs(eset12113))
> GSM305488 GSM305489 GSM305490 GSM305491
> Min. :-2.322 Min. :-3.322 Min. :-3.322 Min. : -Inf
> 1st Qu.: 3.263 1st Qu.: 2.982 1st Qu.: 4.585 1st Qu.: 3.868
> Median : 5.202 Median : 4.916 Median : 6.200 Median : 5.638
> Mean : 5.140 Mean : 4.934 Mean : 6.054 Mean : -Inf
> 3rd Qu.: 6.785 3rd Qu.: 6.637 3rd Qu.: 7.485 3rd Qu.: 7.323
> Max. :15.620 Max. :15.180 Max. :14.749 Max. :14.290
>
>
> The code I'm using is:
>> source("sampleEset.R")
>> fit<- getLMFit()
>
>
> sampleEset.R :
> -------------------8<-----------------------------
>
> getLMFit<- function()
> {
> eset12113<- getEset("12113")
>
> design12113<- model.matrix(~ 0+factor(c(1,1,2,3,3,4,5,5,6,7,7,8,9,9,10)))
> colnames(design12113)<-
> c("untreated1","treated1","untreated2","treated2","untreated3","treated3","untreated4","treated4","untreated5","treated5")
> contrast.matrix<- makeContrasts(untreated1-treated1,levels=design12113)
> fit<- lmFit(eset12113,design12113)
> }
>
> getEset<- function(value)
> {
> DATADIR = "/home/jeremiah/R/i486-pc-linux-gnu-library/2.11/GEOquery/extdata"
> data_file = paste("GSE",value,".soft",sep="")
> data_dir_file= paste(DATADIR,data_file,sep="/")
>
> library(GEOquery)
> library(Biobase)
> library(limma)
>
> if (file.exists(data_dir_file))
> {
> GSE_parsed<- getGEO(filename=data_dir_file)
> }
> else
> {
> GSE_value<- paste("GSE",value,sep="")
> GSE_file<- getGEOfile(GSE_value,destdir=DATADIR)
> GSE_parsed<- getGEO(filename=data_dir_file)
> }
> GSM_platform<- lapply(GSMList(GSE_parsed), function(x) {
> Meta(x)$platform
> })
> GSM_platform<- GSM_platform[[1]]
>
> probe_set<- Table(GPLList(GSE_parsed)[[1]])$ID
>
> data.matrix<- do.call("cbind", lapply(GSMList(GSE_parsed), function(x) {
> tab<- Table(x)
> mymatch<- match(probe_set, tab$ID_REF)
> return(tab$VALUE[mymatch])
> }))
>
> data.matrix<- apply(data.matrix, 2, function(x) {
> as.numeric(as.character(x))
> })
>
> dataLog2.matrix<- log2(data.matrix)
>
> rownames(dataLog2.matrix)<- probe_set
> colnames(dataLog2.matrix)<- names(GSMList(GSE_parsed))
> p_data<- data.frame(samples = names(GSMList(GSE_parsed)))
> rownames(p_data)<- names(GSMList(GSE_parsed))
> pheno<- as(p_data, "AnnotatedDataFrame")
> eset<- new("ExpressionSet", exprs = dataLog2.matrix, phenoData = pheno)
> return(eset)
> }
>
> -------------------8<-----------------------------
>
> Thanks,
> Jeremiah
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list