[BioC] Bug in GSVA with methods other than default?
Jonathan Manning
Jonathan.Manning at ed.ac.uk
Fri Feb 21 15:42:18 CET 2014
Hi Sonja,
No, that's not it.
The probe IDs map perfectly fine through the annotation package and
GSEABase::mapIdentifiers() etc, and everything works fine if I hack
around the bug I described previously. Again, what you have is some code
(e.g. /'Biobase::exprs(eScoEset) <- eSco$es.obs) /), which assumes that
a list is returned by /GSVA:::.gsva()/. In the case of "
/method='plage'/ " et al, it is not. If you tweak things with that in
mind, there's no problem.
Here is the code for gsva() for the ExpressionSet signature (retrieved
with /getMethod(gsva, signature=c(expr="ExpressionSet",
gset.idx.list="GeneSetCollection", annotation="missing"))/ ), with my
hacks included in bold:
function (expr, gset.idx.list, annotation = NULL, ...)
{
.local <- function (expr, gset.idx.list, annotation = NULL,
method = c("gsva", "ssgsea", "zscore", "plage"),
rnaseq = FALSE,
abs.ranking = FALSE, min.sz = 1, max.sz = Inf,
no.bootstraps = 0,
bootstrap.percent = 0.632, parallel.sz = 0,
parallel.type = "SOCK",
mx.diff = TRUE, tau = switch(method, gsva = 1,
ssgsea = 0.25,
NA), kernel = TRUE,
verbose = TRUE)
{
sdGenes <- Biobase::esApply(expr, 1, sd)
if (any(sdGenes == 0)) {
if (verbose)
cat("Filtering out", sum(sdGenes == 0), "genes with constant
expression values throuhgout the samples\n")
expr <- expr[sdGenes > 0, ]
}
if (nrow(expr) < 2)
stop("Less than two genes in the input ExpressionSet object\n")
if (verbose)
cat("Mapping identifiers between gene sets and feature names\n")
mapped.gset.idx.list <- GSEABase::mapIdentifiers(gset.idx.list,
GSEABase::AnnoOrEntrezIdentifier(Biobase::annotation(expr)))
tmp <- lapply(geneIds(mapped.gset.idx.list), function(x,
y)
na.omit(match(x, y)), featureNames(expr))
names(tmp) <- names(mapped.gset.idx.list)
mapped.gset.idx.list <- filterGeneSets(tmp, min.sz = max(1,
min.sz), max.sz = max.sz)
eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list,
method, rnaseq, abs.ranking, no.bootstraps,
bootstrap.percent,
parallel.sz, parallel.type, mx.diff, tau, kernel,
verbose)
eScoEset <- expr
#Biobase::exprs(eScoEset) <- eSco$es.obs
* if (method=='gsva'){
Biobase::exprs(eScoEset) <- eSco$es.obs
}else{
Biobase::exprs(eScoEset) <- eSco
}*
Biobase::annotation(eScoEset) <- ""
#return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap,
# p.vals.sign = eSco$p.vals.sign))
*return (eScoEset)*
}
.local(expr, gset.idx.list, annotation, ...)
}
Since you say the bootstrapping isn't much use anyway, I've just set it
to return the matrix in all cases.
Jon
On 21/02/2014 14:05, Sonja Haenzelmann wrote:
> I think the problem lies in your expression set.
> The gene names have to be in ENTREZ gene Id format, as the c2BroadSets
> are in this format. gsva maps the genes in your expression set with
> the genes in the gene set list. If they do not have the same
> identifiers it cannot map the genes, and hence not calculate any
> statistics.
>
>
> library(GSVAdata)
> data(c2BroadSets)
> library(GSVA)
>
>
> example<- as.matrix(read.table("example.txt"))
> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', 5))))
> rownames(pdata_example)<- letters[1:10]
> pData(eset_example)<- pdata_example
>
>
>> featureNames(eset_example)
> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" "ILMN_1806310"
> [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025" "ILMN_1814316"
>
> you will have to map your ILMN ids to entrez gene ids, then gsva
> should be working.
>
>
> For example in the toy example that shows up, when you do ?gsva, you
> will find that the gene names are the same in the gene set list as
> well as in the gene expression matrix.
>
> p<- 10 ## number of genes
> n<- 30 ## number of samples
> nGrp1<- 15 ## number of samples in group 1
> nGrp2<- n - nGrp1 ## number of samples in group 2
>
> ## consider three disjoint gene sets
> geneSets<- list(set1=paste("g", 1:3, sep=""),
> set2=paste("g", 4:6, sep=""),
> set3=paste("g", 7:10, sep=""))
>
>
>
>> geneSets
> $set1
> [1] "g1" "g2" "g3"
>
> $set2
> [1] "g4" "g5" "g6"
>
> $set3
> [1] "g7" "g8" "g9" "g10"
>
>
> ## sample data from a normal distribution with mean 0 and st.dev. 1
> y<- matrix(rnorm(n*p), nrow=p, ncol=n,
> dimnames=list(paste("g", 1:p, sep="") , paste("s",
> 1:n, sep="")))
>
> rownames(y)
> [1] "g1" "g2" "g3" "g4" "g5" "g6" "g7" "g8" "g9" "g10"
>
> Best
> Sonja
>
>
>
>
> On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning
> <Jonathan.Manning at ed.ac.uk> wrote:
>> example<- as.matrix(read.table("example.txt"))
>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2', 5))))
>> rownames(pdata_example)<- letters[1:10]
>> pData(eset_example)<- pdata_example
>> gsva_es<- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage')
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140221/503d4434/attachment.pl>
More information about the Bioconductor
mailing list