[BioC] [GSVA] Segfault with GSVA
Robert Castelo
robert.castelo at upf.edu
Wed Jan 16 16:25:38 CET 2013
hi Luca, and everybody else,
the original post was not published at the list but i'm posting it below
with the solution to the problem.
the problem that is causing the segfault is, next to a bad handling of
the error, the fact that some genes have a constant expression value
across all samples, leading to no variability:
dim(matrix)
[1] 999 10
sdsgenes <- apply(matrix, 1, sd)
summary(sdsgenes)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.00000 0.01776 0.10120 0.10220 2.83900
sum(sdsgenes == 0)
[1] 320
as you can see, 320 out of the 999 have standard deviation zero, if you
do the calculations filtering those genes out, things will work:
x <- gsva(expr=matrix[sdsgenes > 0, ], gset.idx.list=idx.list)
Estimating GSVA scores for 152 gene sets.
Computing observed enrichment scores
Estimating ECDFs in microarray data with Gaussian kernels
Using parallel with 12 cores
|===============================================================================================|
100%
dim(x$es.obs)
[1] 152 10
i have added a QC that removes upfront all genes with zero standard
deviation, reporting this action when the argument verbose=TRUE.
this has been fixed in both GSVA release version 1.6.2 and development
version 1.7.5 which should be available at the BioC site within the next
48 hrs.
thanks for the bug reporting!!
robert.
On 01/16/2013 11:49 AM, Luca Beltrame wrote:
> Hello,
>
> I'm a bioinformatician working in a no profit research institute, and recently
> I've been using GSVA in a number of pipelines. However, with some data sets
> (in particular Affymetrix arrays) I encounter a segfault in the C code
> outlined below.
>
> I attached to this mail a section of a matrix that reproduces the problem:
> mouse data, normalized with GCRMA and annotated using M.Dai's custom CDFs.
>
> Thanks in advance.
>
> Test case below:
>
>> library(GSVA)
>> library(GSVAdata)
>> data(c2BroadSets)
>> matrix = as.matrix(read.delim("matrix.txt", row.names=1)) # Mouse data
>> idx.list = geneIds(c2BroadSets) # convert to regular list
>> gsva(expr=matrix, gset.idx.list=idx.list)
> Estimating GSVA scores for 269 gene sets.
> Computing observed enrichment scores
> Estimating ECDFs in microarray data with Gaussian kernels
>
> *** caught segfault ***
> address 0x7fa467060140, cause 'memory not mapped'
>
> Traceback:
> 1: .C("matrix_density_R", as.double(t(expr[, sample.idxs])),
> as.double(t(expr)), R = double(n.test.samples * n.genes),
> n.density.samples, n.test.samples, n.genes, as.integer(rnaseq))
> 2: compute.gene.density(expr, sample.idxs, rnaseq, kernel)
> 3: compute.geneset.es(expr, gset.idx.list, 1:n.samples, rnaseq = rnaseq,
> abs.ranking = abs.ranking, parallel.sz = parallel.sz, parallel.type =
> parallel.type, mx.diff = mx.diff, tau = tau, kernel = kernel, verbose =
> verbose)
> 4: GSVA:::.gsva(expr, mapped.gset.idx.list, method, rnaseq, abs.ranking,
> no.bootstraps, bootstrap.percent, parallel.sz, parallel.type, mx.diff,
> tau, kernel, verbose)
> 5: .local(expr, gset.idx.list, annotation, ...)
> 6: gsva(expr = matr, gset.idx.list = geneIds(c2BroadSets))
> 7: gsva(expr = matr, gset.idx.list = geneIds(c2BroadSets))
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-suse-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=it_IT.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=it_IT.UTF-8 LC_COLLATE=it_IT.UTF-8
> [5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=it_IT.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GSVA_1.6.1 GSEABase_1.20.1 graph_1.36.1
> [4] annotate_1.36.0 AnnotationDbi_1.20.3 Biobase_2.18.0
> [7] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] DBI_0.2-5 IRanges_1.16.4 parallel_2.15.2 RSQLite_0.11.2
> [5] stats4_2.15.2 tools_2.15.2 XML_3.95-0.1 xtable_1.7-1
>
>
--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
More information about the Bioconductor
mailing list