[BioC] Bug in GSVA with methods other than default?

Fri Feb 21 15:42:18 CET 2014

Hi Sonja,

No, that's not it.

The probe IDs map perfectly fine through the annotation package and 
GSEABase::mapIdentifiers() etc, and everything works fine if I hack 
around the bug I described previously. Again, what you have is some code 
(e.g. /'Biobase::exprs(eScoEset) <- eSco$es.obs) /), which assumes that 
a list is returned by /GSVA:::.gsva()/. In the case of " 
/method='plage'/ " et al, it is not. If you tweak things with that in 
mind, there's no problem.

Here is the code for gsva() for the ExpressionSet signature (retrieved 
with /getMethod(gsva, signature=c(expr="ExpressionSet", 
gset.idx.list="GeneSetCollection", annotation="missing"))/ ), with my 
hacks included in bold:

function (expr, gset.idx.list, annotation = NULL, ...)
{
   .local <- function (expr, gset.idx.list, annotation = NULL,
                       method = c("gsva", "ssgsea", "zscore", "plage"), 
rnaseq = FALSE,
                       abs.ranking = FALSE, min.sz = 1, max.sz = Inf, 
no.bootstraps = 0,
                       bootstrap.percent = 0.632, parallel.sz = 0, 
parallel.type = "SOCK",
                       mx.diff = TRUE, tau = switch(method, gsva = 1, 
ssgsea = 0.25,
                                                    NA), kernel = TRUE, 
verbose = TRUE)
   {
     sdGenes <- Biobase::esApply(expr, 1, sd)
     if (any(sdGenes == 0)) {
       if (verbose)
         cat("Filtering out", sum(sdGenes == 0), "genes with constant 
expression values throuhgout the samples\n")
       expr <- expr[sdGenes > 0, ]
     }
     if (nrow(expr) < 2)
       stop("Less than two genes in the input ExpressionSet object\n")
     if (verbose)
       cat("Mapping identifiers between gene sets and feature names\n")
     mapped.gset.idx.list <- GSEABase::mapIdentifiers(gset.idx.list,

GSEABase::AnnoOrEntrezIdentifier(Biobase::annotation(expr)))
     tmp <- lapply(geneIds(mapped.gset.idx.list), function(x,
                                                           y) 
na.omit(match(x, y)), featureNames(expr))
     names(tmp) <- names(mapped.gset.idx.list)
     mapped.gset.idx.list <- filterGeneSets(tmp, min.sz = max(1, 
min.sz), max.sz = max.sz)
     eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list,
                          method, rnaseq, abs.ranking, no.bootstraps, 
bootstrap.percent,
                          parallel.sz, parallel.type, mx.diff, tau, kernel,
                          verbose)
     eScoEset <- expr
     #Biobase::exprs(eScoEset) <- eSco$es.obs
*    if (method=='gsva'){
       Biobase::exprs(eScoEset) <- eSco$es.obs
     }else{
       Biobase::exprs(eScoEset) <- eSco
     }*

     Biobase::annotation(eScoEset) <- ""
     #return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap,
      #           p.vals.sign = eSco$p.vals.sign))
*return (eScoEset)*
   }
   .local(expr, gset.idx.list, annotation, ...)
}

Since you say the bootstrapping isn't much use anyway, I've just set it 
to return the matrix in all cases.

Jon

On 21/02/2014 14:05, Sonja Haenzelmann wrote:
> I think the problem lies in your expression set.
> The gene names have to be in ENTREZ gene Id format, as the c2BroadSets
> are in this format. gsva maps the genes in your expression set with
> the genes in the gene set list. If they do not have the same
> identifiers it cannot map the genes, and hence not calculate any
> statistics.
>
>
> library(GSVAdata)
> data(c2BroadSets)
> library(GSVA)
>
>
> example<- as.matrix(read.table("example.txt"))
> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', 5))))
> rownames(pdata_example)<- letters[1:10]
> pData(eset_example)<- pdata_example
>
>
>> featureNames(eset_example)
>   [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" "ILMN_1806310"
>   [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025" "ILMN_1814316"
>
> you will have to map your ILMN ids to entrez gene ids, then gsva
> should be working.
>
>
> For example in the toy example that shows up, when you do ?gsva, you
> will find that the gene names are the same in the gene set list as
> well as in the gene expression matrix.
>
> p<- 10 ## number of genes
>       n<- 30 ## number of samples
>       nGrp1<- 15 ## number of samples in group 1
>       nGrp2<- n - nGrp1 ## number of samples in group 2
>
>       ## consider three disjoint gene sets
>       geneSets<- list(set1=paste("g", 1:3, sep=""),
>                        set2=paste("g", 4:6, sep=""),
>                        set3=paste("g", 7:10, sep=""))
>
>
>
>> geneSets
> $set1
> [1] "g1" "g2" "g3"
>
> $set2
> [1] "g4" "g5" "g6"
>
> $set3
> [1] "g7"  "g8"  "g9"  "g10"
>
>
>       ## sample data from a normal distribution with mean 0 and st.dev. 1
>       y<- matrix(rnorm(n*p), nrow=p, ncol=n,
>                   dimnames=list(paste("g", 1:p, sep="") , paste("s",
> 1:n, sep="")))
>
> rownames(y)
>   [1] "g1"  "g2"  "g3"  "g4"  "g5"  "g6"  "g7"  "g8"  "g9"  "g10"
>
> Best
> Sonja
>
>
>
>
> On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning
> <Jonathan.Manning at ed.ac.uk>  wrote:
>> example<- as.matrix(read.table("example.txt"))
>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', 5))))
>> rownames(pdata_example)<- letters[1:10]
>> pData(eset_example)<- pdata_example
>> gsva_es<- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage')
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140221/503d4434/attachment.pl>