[BioC] nsFilter error in genefilter
James W. MacDonald
jmacdon at uw.edu
Wed Apr 17 18:23:49 CEST 2013
Hi Zhenya,
On 4/17/2013 12:02 PM, Zhenya [guest] wrote:
> Hi All,
>
> I am trying to run the code for GSVA (library with the same name). The code is below, but the main error is around annotation:
>> source("http://bioconductor.org/biocLite.R")
> Bioconductor version 2.12 (BiocInstaller 1.10.0), ?biocLite for help
>> biocLite("hthgu133pluspm.db")
There is no such package. You could easily create one yourself using the
AnnotationForge package (see the vignette). Or you could note that the
hthgu133pluspm array has the same content as the hgu133plus2 array,
except for a few extra control probesets, and the fact that Affymetrix
insisted on adding an extra _PM to the probeset IDs.
> sum(ls(hgu133plus2cdf) %in% gsub("_PM","", ls(hthgu133pluspmcdf)))
[1] 54675
> length(ls(hgu133plus2cdf))
[1] 54675
> length(ls(hthgu133pluspmcdf))
[1] 54715
> ls(hthgu133pluspmcdf)[!gsub("_PM","", ls(hthgu133pluspmcdf)) %in%
    ls(hgu133plus2cdf)]
[1] "AFFX-NonspecificGC10_at" "AFFX-NonspecificGC11_at"
[3] "AFFX-NonspecificGC12_at" "AFFX-NonspecificGC13_at"
[5] "AFFX-NonspecificGC14_at" "AFFX-NonspecificGC15_at"
[7] "AFFX-NonspecificGC16_at" "AFFX-NonspecificGC17_at"
[9] "AFFX-NonspecificGC18_at" "AFFX-NonspecificGC19_at"
[11] "AFFX-NonspecificGC20_at" "AFFX-NonspecificGC21_at"
[13] "AFFX-NonspecificGC22_at" "AFFX-NonspecificGC23_at"
[15] "AFFX-NonspecificGC24_at" "AFFX-NonspecificGC25_at"
[17] "AFFX-NonspecificGC3_at" "AFFX-NonspecificGC4_at"
[19] "AFFX-NonspecificGC5_at" "AFFX-NonspecificGC6_at"
[21] "AFFX-NonspecificGC7_at" "AFFX-NonspecificGC8_at"
[23] "AFFX-NonspecificGC9_at" "AFFX-r2-TagA_at"
[25] "AFFX-r2-TagB_at" "AFFX-r2-TagC_at"
[27] "AFFX-r2-TagD_at" "AFFX-r2-TagE_at"
[29] "AFFX-r2-TagF_at" "AFFX-r2-TagG_at"
[31] "AFFX-r2-TagH_at" "AFFX-r2-TagIN-3_at"
[33] "AFFX-r2-TagIN-5_at" "AFFX-r2-TagIN-M_at"
[35] "AFFX-r2-TagJ-3_at" "AFFX-r2-TagJ-5_at"
[37] "AFFX-r2-TagO-3_at" "AFFX-r2-TagO-5_at"
[39] "AFFX-r2-TagQ-3_at" "AFFX-r2-TagQ-5_at"
So you could either go to the trouble of building and installing a .db
package for this array, or you could do something like
featureNames(EsetData) <- gsub("_PM","", featureNames(EsetData))
annotation(EsetData) <- "hgu133plus2.db"
and carry on as before, provided the hgu133plus2.db package is installed.
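
Before renaming the features on the real object, it may be worth checking on a small sample that stripping _PM gives unique IDs that line up with the other array. What follows is only a sketch in base R: pm_ids and ref_ids are made-up stand-ins for featureNames(EsetData) and ls(hgu133plus2cdf), and strip_pm is a hypothetical helper name, not part of any package.

```r
# Hypothetical sample of HT PM-style probeset IDs (not read from a real CDF).
pm_ids <- c("1007_PM_s_at", "1053_PM_at", "117_PM_at", "AFFX-r2-TagA_at")

# Hypothetical reference IDs standing in for ls(hgu133plus2cdf).
ref_ids <- c("1007_s_at", "1053_at", "117_at", "121_at")

# Remove the literal "_PM" tag; fixed = TRUE treats the pattern as plain text.
strip_pm <- function(ids) gsub("_PM", "", ids, fixed = TRUE)

new_ids <- strip_pm(pm_ids)

# The renamed IDs must stay unique, or they cannot serve as feature names.
stopifnot(anyDuplicated(new_ids) == 0)

# IDs with no counterpart on the reference array
# (in this toy sample, the extra control probeset).
unmatched <- setdiff(new_ids, ref_ids)
print(unmatched)
```

On the real data the same helper would be applied as featureNames(EsetData) <- strip_pm(featureNames(EsetData)), and the unmatched IDs should be only the extra control probesets listed above.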
Best,
Jim
> BioC_mirror: http://bioconductor.org
> Using Bioconductor version 2.12 (BiocInstaller 1.10.0), R version 3.0.0.
> Installing package(s) 'hthgu133pluspm.db'
> Warning message:
> package ‘hthgu133pluspm.db’ is not available (for R version 3.0.0)
>
> Code:
>
> # CREATE GeneSetCollection
> library(GSEABase)
> x<- scan("GeneSets.gmt", what="", sep="\n")
> GeneSets.gmt<- strsplit(x, "[[:space:]]+")
> names(GeneSets.gmt)<- sapply(GeneSets.gmt, `[[`, 1)
> GeneSets.gmt<- lapply(GeneSets.gmt, `[`, -1)
> n<- names(GeneSets.gmt)
> uniqueList<- lapply(GeneSets.gmt, unique)
> makeSet<- function(geneIds, n) {GeneSet(geneIds, geneIdType=SymbolIdentifier(), setName=n)}
> gsList<- gsc<- mapply(makeSet, uniqueList[], n)
> gsc<- GeneSetCollection(gsList)
>
> # DATASET
> # CREATE ExpressionSet
> exprs<- as.matrix(read.table("ExprData.txt", header=TRUE, sep="\t", row.names=1, as.is=TRUE))
> pData<- read.table("DesignFile.txt",row.names=1, header=T,sep="\t")
> phenoData<- new("AnnotatedDataFrame",data=pData)
> annotation<- "hthgu133pluspm.db"
> EsetData<- ExpressionSet(assayData=exprs,phenoData=phenoData,annotation="hthgu133pluspm")
> head(ExprData)
>
> #Gene Filtering
> library(genefilter)
> library("hthgu133pluspm")
> filtered_eset<- nsFilter(EsetData, require.entrez=TRUE, remove.dupEntrez=TRUE, var.func=IQR, var.filter=FALSE, var.cutoff=0.25, filterByQuantile=TRUE, feature.exclude="^AFFX")
> # get stats for numbers of probesets removed
> filtered_eset
> EsetData_f<- filtered_eset$eset
>
> # GSVA
> library(GSVA)
> gsva_es<- gsva(EsetData_f,gsc,abs.ranking=FALSE,min.sz=1,max.sz=1000,mx.diff=TRUE)$es.obs
>
> I downloaded hthgu133pluspm from http://nmg-r.bioinformatics.nl/NuGO_R.html
> and R still complains. The packages available on Bioconductor,
> hthgu133pluspmprobe
> and
> hthgu133pluspmcdf,
> are not correct and give an error for nsFilter and gsva:
> Error in (function (classes, fdef, mtable) :
> unable to find an inherited method for function ‘cols’ for signature ‘"environment"’
>
> Mapping identifiers between gene sets and feature names
> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., verbose = verbose)) :
> error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in (function (classes, fdef, mtable) :
> unable to find an inherited method for function ‘cols’ for signature ‘"environment"’
>
>
> Thank you,
> Zhenya
>
> -- output of sessionInfo():
>
> R version 3.0.0 (2013-04-03)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GSVA_1.8.0 BiocInstaller_1.10.0 hthgu133pluspmprobe_2.12.0 hthgu133pluspmcdf_2.12.0 genefilter_1.42.0 GSEABase_1.22.0
> [7] graph_1.38.0 annotate_1.38.0 AnnotationDbi_1.22.1 Biobase_2.20.0 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] DBI_0.2-5 IRanges_1.18.0 RSQLite_0.11.2 splines_3.0.0 stats4_3.0.0 survival_2.37-4 tools_3.0.0 XML_3.96-1.1 xtable_1.7-1
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099