[BioC] nsFilter error in genefilter

Mon Jul 8 17:00:19 CEST 2013

Hi Steven,

On 7/5/2013 11:11 AM, steven wink wrote:
> Dear Jim,
>
> I am facing the same problem, and your idea would be great for me but 
> I ran into a problem: cannot change featureNames of Affybatch.
>
> > my.data
> AffyBatch object
> size of arrays=744x744 features (23 kb)
> cdf=HT_HG-U133_Plus_PM (54715 affyids)
> number of samples=16
> number of genes=54715
> annotation=hthgu133pluspm
> notes=
>
> > featureNames(my.data) <- gsub("_PM","", featureNames(my.data))
>
> Error in `featureNames<-`(`*tmp*`, value = c("1007_s_at", "1053_at", 
> "117_at",  :
> Cannot change featureNames of AffyBatch
>
> I tried running R as super user but same result.
> I also want to replace the default cdf by a brainarray cdf after this 
> step.
>
> ps. I can perform vsnrma(), but e.g. nsFilter apparently needs the 
> annotation file so I have to switch to the plus2 or make the 
> "hthgu133pluspm.db" package (which I never tried before)
> Do you have any suggestions on the "Cannot change featureNames of 
> AffyBatch" error?

You could use my original suggestion, which was to change the 
featureNames of the ExpressionSet object that you get after summarizing. 
You are trying to change the featureNames on the AffyBatch object, prior 
to summarizing, which is not what I suggested.
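
As a rough sketch of that order of operations, here is a toy ExpressionSet 
standing in for the summarized data (for example, the output of vsnrma() or 
rma()). The probe IDs and expression values are made up for illustration, 
and only the Biobase package is assumed to be installed:

```r
library(Biobase)

# Toy matrix standing in for summarized expression values; in a real
# analysis this object would come from the summarization step instead,
# e.g. eset <- vsnrma(my.data)
ids <- c("1007_PM_s_at", "1053_PM_at", "117_PM_at")   # illustrative IDs
mat <- matrix(as.numeric(1:6), nrow = 3,
              dimnames = list(ids, c("sample1", "sample2")))
eset <- ExpressionSet(assayData = mat, annotation = "hthgu133pluspm")

# Rename the features on the ExpressionSet (this is allowed, unlike on
# an AffyBatch), then point the annotation slot at the plus2 package
featureNames(eset) <- gsub("_PM", "", featureNames(eset))
annotation(eset) <- "hgu133plus2.db"

featureNames(eset)
```

After this step, functions that look up the annotation package, such as 
nsFilter, should find hgu133plus2.db, provided that package is installed.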

Best,

Jim

>
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               
> LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    
> LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             
> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  
> methods   base
>
> other attached packages:
>  [1] hthgu133pluspmcdf_2.12.0          
> genefilter_1.42.0                 vsn_3.28.0
>  [4] arrayQualityMetrics_3.16.0        
> hgu133plus2.db_2.9.0              org.Hs.eg.db_2.9.0
>  [7] RSQLite_0.11.4                    
> DBI_0.2-7                         hthgu133pluspmhsentrezgcdf_17.1.0
> [10] AnnotationDbi_1.22.5              
> limma_3.16.5                      affy_1.38.1
> [13] Biobase_2.20.0                    BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.28.0         affyPLM_1.36.0        annotate_1.38.0       
> beadarray_2.10.0      BeadDataPackR_1.12.0  BiocInstaller_1.10.2
>  [7] Biostrings_2.28.0     Cairo_1.5-2           cluster_1.14.4        
> colorspace_1.2-2      gcrma_2.32.0          grid_3.0.1
> [13] Hmisc_3.10-1.1        hwriter_1.3           IRanges_1.18.1        
> lattice_0.20-15       latticeExtra_0.6-24   plyr_1.8
> [19] preprocessCore_1.22.0 RColorBrewer_1.0-5    reshape2_1.2.2        
> setRNG_2011.11-2      splines_3.0.1         stats4_3.0.1
> [25] stringr_0.6.2         survival_2.37-4       SVGAnnotation_0.93-1  
> tools_3.0.1           XML_3.98-1.1          xtable_1.7-1
> [31] zlibbioc_1.6.0
>
>
> 2013/4/17 James W. MacDonald <jmacdon at uw.edu>
>
>     Hi Zhenya,
>
>
>     On 4/17/2013 12:02 PM, Zhenya [guest] wrote:
>
>         Hi All,
>
>         I am trying to run the code for GSVA (library with the same
>         name). The code is below, but the main error is around annotation:
>
>             source("http://bioconductor.org/biocLite.R")
>
>         Bioconductor version 2.12 (BiocInstaller 1.10.0), ?biocLite
>         for help
>
>             biocLite("hthgu133pluspm.db")
>
>
>     There is no such package. You could easily create one yourself
>     using the AnnotationForge package (see the vignette). Or you could
>     note that the hthgu133pluspm array has the same content as the
>     hgu133plus2 array, except for a few extra control probesets, and
>     the fact that they insisted on adding an extra _PM to all the
>     probesets.
>
>     > sum(ls(hgu133plus2cdf) %in% gsub("_PM","", ls(hthgu133pluspmcdf)))
>     [1] 54675
>     > length(ls(hgu133plus2cdf))
>     [1] 54675
>     > length(ls(hthgu133pluspmcdf))
>     [1] 54715
>     > ls(hthgu133pluspmcdf)[!gsub("_PM","", ls(hthgu133pluspmcdf))
>     %in% ls(hgu133plus2cdf)]
>     [1] "AFFX-NonspecificGC10_at" "AFFX-NonspecificGC11_at"
>     [3] "AFFX-NonspecificGC12_at" "AFFX-NonspecificGC13_at"
>     [5] "AFFX-NonspecificGC14_at" "AFFX-NonspecificGC15_at"
>     [7] "AFFX-NonspecificGC16_at" "AFFX-NonspecificGC17_at"
>     [9] "AFFX-NonspecificGC18_at" "AFFX-NonspecificGC19_at"
>     [11] "AFFX-NonspecificGC20_at" "AFFX-NonspecificGC21_at"
>     [13] "AFFX-NonspecificGC22_at" "AFFX-NonspecificGC23_at"
>     [15] "AFFX-NonspecificGC24_at" "AFFX-NonspecificGC25_at"
>     [17] "AFFX-NonspecificGC3_at" "AFFX-NonspecificGC4_at"
>     [19] "AFFX-NonspecificGC5_at" "AFFX-NonspecificGC6_at"
>     [21] "AFFX-NonspecificGC7_at" "AFFX-NonspecificGC8_at"
>     [23] "AFFX-NonspecificGC9_at" "AFFX-r2-TagA_at"
>     [25] "AFFX-r2-TagB_at" "AFFX-r2-TagC_at"
>     [27] "AFFX-r2-TagD_at" "AFFX-r2-TagE_at"
>     [29] "AFFX-r2-TagF_at" "AFFX-r2-TagG_at"
>     [31] "AFFX-r2-TagH_at" "AFFX-r2-TagIN-3_at"
>     [33] "AFFX-r2-TagIN-5_at" "AFFX-r2-TagIN-M_at"
>     [35] "AFFX-r2-TagJ-3_at" "AFFX-r2-TagJ-5_at"
>     [37] "AFFX-r2-TagO-3_at" "AFFX-r2-TagO-5_at"
>     [39] "AFFX-r2-TagQ-3_at" "AFFX-r2-TagQ-5_at"
>
>     So you could either go to the trouble of building and installing a
>     .db package for this array, or you could do something like
>
>     featureNames(EsetData) <- gsub("_PM","", featureNames(EsetData))
>     annotation(EsetData) <- "hgu133plus2.db"
>
>     and carry on as before.
>
>     Best,
>
>     Jim
>
>
>
>         BioC_mirror: http://bioconductor.org
>         Using Bioconductor version 2.12 (BiocInstaller 1.10.0), R
>         version 3.0.0.
>         Installing package(s) 'hthgu133pluspm.db'
>         Warning message:
>         package ‘hthgu133pluspm.db’ is not available (for R
>         version 3.0.0)
>
>         Code:
>
>         # CREATE GeneSetCollection
>         library(GSEABase)
>         x<- scan("GeneSets.gmt", what="", sep="\n")
>         GeneSets.gmt<- strsplit(x, "[[:space:]]+")
>         names(GeneSets.gmt)<- sapply(GeneSets.gmt, `[[`, 1)
>         GeneSets.gmt<- lapply(GeneSets.gmt, `[`, -1)
>         n<- names(GeneSets.gmt)
>         uniqueList<- lapply(GeneSets.gmt, unique)
>         makeSet<- function(geneIds, n) {GeneSet(geneIds,
>         geneIdType=SymbolIdentifier(), setName=n)}
>         gsList<- gsc<- mapply(makeSet, uniqueList[], n)
>         gsc<- GeneSetCollection(gsList)
>
>         # DATASET
>         # CREATE ExpressionSet
>         exprs<- as.matrix(read.table("ExprData.txt", header=TRUE,
>         sep="\t", row.names=1, as.is=TRUE))
>         pData<- read.table("DesignFile.txt",row.names=1,
>         header=T,sep="\t")
>         phenoData<- new("AnnotatedDataFrame",data=pData)
>         annotation<- "hthgu133pluspm.db"
>         EsetData<-
>         ExpressionSet(assayData=exprs,phenoData=phenoData,annotation="hthgu133pluspm")
>         head(ExprData)
>
>         #Gene Filtering
>         library(genefilter)
>         library("hthgu133pluspm")
>         filtered_eset<- nsFilter(EsetData, require.entrez=TRUE,
>         remove.dupEntrez=TRUE, var.func=IQR, var.filter=FALSE,
>         var.cutoff=0.25, filterByQuantile=TRUE, feature.exclude="^AFFX")
>         # get stats for numbers of probesets removed
>         filtered_eset
>         EsetData_f<- filtered_eset$eset
>
>         # GSVA
>         library(GSVA)
>         gsva_es<- gsva(EsetData_f,gsc,abs.ranking=FALSE,min.sz=1,
>         max.sz=1000,mx.diff=TRUE)$es.obs
>
>         I downloaded hthgu133pluspm from
>         http://nmg-r.bioinformatics.nl/NuGO_R.html
>         and R still complains. The packages available on Bioconductor:
>         hthgu133pluspmprobe
>         and
>         hthgu133pluspmcdf
>         are not correct and give an error for nsFilter and gsva:
>         Error in (function (classes, fdef, mtable)  :
>            unable to find an inherited method for function ‘cols’
>         for signature ‘"environment"’
>
>         Mapping identifiers between gene sets and feature names
>         Error in GeneSetCollection(lapply(what, mapIdentifiers, to,
>         ..., verbose = verbose)) :
>            error in evaluating the argument 'object' in selecting a
>         method for function 'GeneSetCollection': Error in (function
>         (classes, fdef, mtable)  :
>            unable to find an inherited method for function ‘cols’
>         for signature ‘"environment"’
>
>
>         Thank you,
>         Zhenya
>
>           -- output of sessionInfo():
>
>         R version 3.0.0 (2013-04-03)
>         Platform: i386-w64-mingw32/i386 (32-bit)
>
>         locale:
>         [1] LC_COLLATE=English_United States.1252
>          LC_CTYPE=English_United States.1252  
>          LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>         [5] LC_TIME=English_United States.1252
>
>         attached base packages:
>         [1] parallel  stats     graphics  grDevices utils     datasets
>          methods   base
>
>         other attached packages:
>           [1] GSVA_1.8.0                 BiocInstaller_1.10.0      
>         hthgu133pluspmprobe_2.12.0 hthgu133pluspmcdf_2.12.0  
>         genefilter_1.42.0          GSEABase_1.22.0
>           [7] graph_1.38.0               annotate_1.38.0          
>          AnnotationDbi_1.22.1       Biobase_2.20.0            
>         BiocGenerics_0.6.0
>
>         loaded via a namespace (and not attached):
>         [1] DBI_0.2-5       IRanges_1.18.0  RSQLite_0.11.2
>          splines_3.0.0   stats4_3.0.0    survival_2.37-4 tools_3.0.0  
>           XML_3.96-1.1    xtable_1.7-1
>
>
>         --
>         Sent via the guest posting facility at bioconductor.org.
>
>         _______________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org
>         https://stat.ethz.ch/mailman/listinfo/bioconductor
>         Search the archives:
>         http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>     -- 
>     James W. MacDonald, M.S.
>     Biostatistician
>     University of Washington
>     Environmental and Occupational Health Sciences
>     4225 Roosevelt Way NE, # 100
>     Seattle WA 98105-6099
>
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099