[BioC] Bug in GSVA with methods other than default?

Jonathan Manning Jonathan.Manning at ed.ac.uk
Mon Feb 24 11:36:49 CET 2014


Hi Robert,

I confirm the fix worked (though I don't think the new version is in the 
repo yet so I installed from source).

Could there maybe be some more documentation in the gsva() help page on 
bootstrapping and how it doesn't apply to the other methods?

Thanks,

Jon


On 21/02/2014 15:08, Robert Castelo wrote:
> hi Jonathan cc Sonja,
>
> you're right, this is a bug occuring when the input is an 
> 'ExpressionSet' object and the argument 'method' as a value different 
> than 'gsva'.
>
> i've just submitted a fix to both, the devel and the release versions 
> of GSVA, they should become available via biocLite() in the following 
> 24/48 hrs as versions 1.11.4 and 1.10.3, respectively.
>
> in the meantime, you can always check out directly from the 
> corresponding svn repository (current release or devel) the updated 
> source and build it yourself; see
>
> http://master.bioconductor.org/developers/how-to/source-control
>
>
>
> sorry for the inconvenience,
>
> robert.
>
> On 02/21/2014 03:42 PM, Jonathan Manning wrote:
>> Hi Sonja,
>>
>> No, that's not it.
>>
>> The probe IDs map perfectly fine through the annotation package and
>> GSEABase::mapIdentifiers() etc, and everything works fine if I hack
>> around the bug I described previously. Again, what you have is some code
>> (e.g. /'Biobase::exprs(eScoEset) <- eSco$es.obs) /), which assumes that
>> a list is returned by /GSVA:::.gsva()/. In the case of "
>> /method='plage'/ " et al, it is not. If you tweak things with that in
>> mind, there's no problem.
>>
>> Here is the code for gsva() for the ExpressionSet signature (retrieved
>> with /getMethod(gsva, signature=c(expr="ExpressionSet",
>> gset.idx.list="GeneSetCollection", annotation="missing"))/ ), with my
>> hacks included in bold:
>>
>>
>> function (expr, gset.idx.list, annotation = NULL, ...)
>> {
>> .local <- function (expr, gset.idx.list, annotation = NULL,
>> method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE,
>> abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0,
>> bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK",
>> mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25,
>> NA), kernel = TRUE, verbose = TRUE)
>> {
>> sdGenes <- Biobase::esApply(expr, 1, sd)
>> if (any(sdGenes == 0)) {
>> if (verbose)
>> cat("Filtering out", sum(sdGenes == 0), "genes with constant expression
>> values throuhgout the samples\n")
>> expr <- expr[sdGenes > 0, ]
>> }
>> if (nrow(expr) < 2)
>> stop("Less than two genes in the input ExpressionSet object\n")
>> if (verbose)
>> cat("Mapping identifiers between gene sets and feature names\n")
>> mapped.gset.idx.list <- GSEABase::mapIdentifiers(gset.idx.list,
>> GSEABase::AnnoOrEntrezIdentifier(Biobase::annotation(expr)))
>> tmp <- lapply(geneIds(mapped.gset.idx.list), function(x,
>> y) na.omit(match(x, y)), featureNames(expr))
>> names(tmp) <- names(mapped.gset.idx.list)
>> mapped.gset.idx.list <- filterGeneSets(tmp, min.sz = max(1, min.sz),
>> max.sz = max.sz)
>> eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list,
>> method, rnaseq, abs.ranking, no.bootstraps, bootstrap.percent,
>> parallel.sz, parallel.type, mx.diff, tau, kernel,
>> verbose)
>> eScoEset <- expr
>> #Biobase::exprs(eScoEset) <- eSco$es.obs
>> * if (method=='gsva'){
>> Biobase::exprs(eScoEset) <- eSco$es.obs
>> }else{
>> Biobase::exprs(eScoEset) <- eSco
>> }*
>>
>> Biobase::annotation(eScoEset) <- ""
>> #return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap,
>> # p.vals.sign = eSco$p.vals.sign))
>> *return (eScoEset)*
>> }
>> .local(expr, gset.idx.list, annotation, ...)
>> }
>>
>> Since you say the bootstrapping isn't much use anyway, I've just set it
>> to return the matrix in all cases.
>>
>> Jon
>>
>>
>>
>>
>> On 21/02/2014 14:05, Sonja Haenzelmann wrote:
>>> I think the problem lies in your expression set.
>>> The gene names have to be in ENTREZ gene Id format, as the c2BroadSets
>>> are in this format. gsva maps the genes in your expression set with
>>> the genes in the gene set list. If they do not have the same
>>> identifiers it cannot map the genes, and hence not calculate any
>>> statistics.
>>>
>>>
>>> library(GSVAdata)
>>> data(c2BroadSets)
>>> library(GSVA)
>>>
>>>
>>> example<- as.matrix(read.table("example.txt"))
>>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
>>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2',
>>> 5))))
>>> rownames(pdata_example)<- letters[1:10]
>>> pData(eset_example)<- pdata_example
>>>
>>>
>>>> featureNames(eset_example)
>>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229"
>>> "ILMN_1806310"
>>> [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025"
>>> "ILMN_1814316"
>>>
>>> you will have to map your ILMN ids to entrez gene ids, then gsva
>>> should be working.
>>>
>>>
>>> For example in the toy example that shows up, when you do ?gsva, you
>>> will find that the gene names are the same in the gene set list as
>>> well as in the gene expression matrix.
>>>
>>> p<- 10 ## number of genes
>>> n<- 30 ## number of samples
>>> nGrp1<- 15 ## number of samples in group 1
>>> nGrp2<- n - nGrp1 ## number of samples in group 2
>>>
>>> ## consider three disjoint gene sets
>>> geneSets<- list(set1=paste("g", 1:3, sep=""),
>>> set2=paste("g", 4:6, sep=""),
>>> set3=paste("g", 7:10, sep=""))
>>>
>>>
>>>
>>>> geneSets
>>> $set1
>>> [1] "g1" "g2" "g3"
>>>
>>> $set2
>>> [1] "g4" "g5" "g6"
>>>
>>> $set3
>>> [1] "g7" "g8" "g9" "g10"
>>>
>>>
>>> ## sample data from a normal distribution with mean 0 and st.dev. 1
>>> y<- matrix(rnorm(n*p), nrow=p, ncol=n,
>>> dimnames=list(paste("g", 1:p, sep="") , paste("s",
>>> 1:n, sep="")))
>>>
>>> rownames(y)
>>> [1] "g1" "g2" "g3" "g4" "g5" "g6" "g7" "g8" "g9" "g10"
>>>
>>> Best
>>> Sonja
>>>
>>>
>>>
>>>
>>> On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning
>>> <Jonathan.Manning at ed.ac.uk> wrote:
>>>> example<- as.matrix(read.table("example.txt"))
>>>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
>>>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2',
>>>> 5))))
>>>> rownames(pdata_example)<- letters[1:10]
>>>> pData(eset_example)<- pdata_example
>>>> gsva_es<- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage')
>>
>>
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



More information about the Bioconductor mailing list