[BioC] GSVA questions

Fri Sep 6 09:46:58 CEST 2013

Dear Joe,

The function gsva() needs to match the identifiers from the 
ExpressionSet object with those from the gene sets. This is done using 
the Bioconductor infrastructure available for this purpose, which relies 
on gene-centric annotation packages typically anchored at Entrez Gene 
identifiers. Note that the object 'c2BroadSets' has its gene set 
definitions in terms of Entrez identifiers, and this facilitates the 
matching operation with ExpressionSet objects.

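Your 'C5allBroadSets' object, in contrast, contains gene symbols and has 
geneIdType NullIdentifier, which would explain why no identifier could be 
matched and the gene set list ended up empty. In principle, this should 
not be a problem if you download from the Broad the .gmt files that 
contain the gene set definitions in terms of Entrez Gene identifiers.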
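
As a minimal sketch of how such a file could be read, the example below 
uses getGmt() from GSEABase and declares the identifier type on import. 
The two-set GMT file it writes is made up for illustration (the set names 
and the temporary file are assumptions, not a real Broad download). The 
point is only that getGmt() defaults to NullIdentifier() unless 
geneIdType is given.

```r
library(GSEABase)

# Write a tiny, hypothetical GMT file so the example is self-contained.
# GMT columns are tab-separated: set name, description, then gene identifiers
# (here Entrez Gene identifiers).
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c(
  paste("TOY_SET_A", "hypothetical set", "5167", "57191", "1017", sep = "\t"),
  paste("TOY_SET_B", "hypothetical set", "7157", "1017", sep = "\t")
), gmt_file)

# Declare the identifier type explicitly; without the geneIdType argument
# the resulting collection would report NullIdentifier.
toySets <- getGmt(gmt_file, geneIdType = EntrezIdentifier())

# Printing the collection shows the geneIdType that was recorded.
toySets
```

With a real Entrez-based .gmt file from the Broad, the path to that file 
would replace gmt_file, and the resulting collection could then be passed 
to gsva() in the same way as 'c2BroadSets'.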

cheers,
robert.

On 09/06/2013 07:46 AM, Joe [guest] wrote:
>
> Dear Markus,
>
> I did as you said: "c2BroadSets" comes from the package, and "C5allBroadSets" was loaded from a GMT file downloaded from the Broad Institute,
> as:
>> c2BroadSets
> GeneSetCollection
>    names: NAKAMURA_CANCER_MICROENVIRONMENT_UP, NAKAMURA_CANCER_MICROENVIRONMENT_DN, ..., ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY (3272 total)
>    unique identifiers: 5167, 100288400, ..., 57191 (29340 total)
>    types in collection:
>      geneIdType: EntrezIdentifier (1 total)
>      collectionType: BroadCollection (1 total)
>
>> C5allBroadSets
> GeneSetCollection
>    names: NUCLEOPLASM, EXTRINSIC_TO_PLASMA_MEMBRANE, ..., INOSITOL_OR_PHOSPHATIDYLINOSITOL_KINASE_ACTIVITY (1454 total)
>    unique identifiers: HNRPK, XRCC6, ..., PGM1 (8299 total)
>    types in collection:
>      geneIdType: NullIdentifier (1 total)
>      collectionType: NullCollection (1 total)
>
> When I use the "c2BroadSets" GeneSetCollection, it works. "NSCLC_norm_GSE32474_rma_Filter" is an ExpressionSet; because of the customized CDF, unique gene IDs are used in the file, so I set min.sz to 1:
>> NSCLC_gsva_c2<- gsva(NSCLC_norm_GSE32474_rma_Filter, c2BroadSets,min.sz=1, max.sz=500, verbose=TRUE)$es.obs
>
> When I use "C5allBroadSets", it reports an error:
>> NSCLC_gsva_c5<- gsva(NSCLC_norm_GSE32474_rma_Filter, C5allBroadSets,min.sz=1, max.sz=500, verbose=TRUE)$es.obs
> Mapping identifiers between gene sets and feature names
> Error in GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list, method,  :
>    The gene set list is empty!  Filter may be too stringent.
>
> So, how should I set the parameters to make gsva work?
>
> Thanks,
> Joe
>
>   -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
> [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
> [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
> [4] LC_NUMERIC=C
> [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
>
> attached base packages:
>   [1] splines   grid      parallel  stats     graphics  grDevices utils
>   [8] datasets  methods   base
>
> other attached packages:
>   [1] GSVA_1.8.0                       GSVAdata_0.99.10
>   [3] hgu95a.db_2.9.0                  hgu133plus2hsentrezgprobe_17.1.0
>   [5] hgu133plus2hsentrezgcdf_17.1.0   hgu133plus2hsentrezg.db_17.1.0
>   [7] hgu95av2.db_2.9.0                a4Classif_1.8.0
>   [9] varSelRF_0.7-3                   randomForest_4.6-7
> [11] pamr_1.54.1                      survival_2.37-4
> [13] ROCR_1.0-5                       gplots_2.11.3
> [15] KernSmooth_2.23-10               caTools_1.14
> [17] gdata_2.13.2                     gtools_3.0.0
> [19] MLInterfaces_1.40.0              sfsmisc_1.0-24
> [21] cluster_1.14.4                   rda_1.0.2-2
> [23] rpart_4.1-3                      MASS_7.3-29
> [25] a4Preproc_1.8.0                  a4Core_1.8.0
> [27] glmnet_1.9-5                     Matrix_1.0-12
> [29] lattice_0.20-23                  GSEABase_1.22.0
> [31] affy_1.38.1                      GOstats_2.26.0
> [33] graph_1.38.3                     Category_2.26.0
> [35] VennDiagram_1.6.5                pheatmap_0.7.6
> [37] statmod_1.4.17                   limma_3.16.7
> [39] biomaRt_2.16.0                   annotate_1.38.0
> [41] genefilter_1.42.0                primeviewhsentrezgprobe_17.1.0
> [43] primeviewhsentrezg.db_17.1.0     org.Hs.eg.db_2.9.0
> [45] RSQLite_0.11.4                   DBI_0.2-7
> [47] primeviewhsentrezgcdf_17.1.0     AnnotationDbi_1.22.6
> [49] Biobase_2.20.1                   BiocGenerics_0.6.0
> [51] rj_1.1.3-1
>
> loaded via a namespace (and not attached):
>   [1] affyio_1.28.0         AnnotationForge_1.2.2 BiocInstaller_1.10.3
>   [4] bitops_1.0-6          GO.db_2.9.0           IRanges_1.18.3
>   [7] mboost_2.2-2          preprocessCore_1.22.0 RBGL_1.36.2
> [10] RCurl_1.95-4.1        rj.gd_1.1.3-1         stats4_3.0.1
> [13] tools_3.0.1           XML_3.98-1.1          xtable_1.7-1
> [16] zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550