[BioC] Non-specific filtering of Affymetrix Microarray data
Wolfgang Huber
whuber at embl.de
Wed Feb 19 16:54:04 CET 2014
Hi Vinay
A look at the man page of ‘nsFilter’ indicates that its output is a list, one of whose elements is ‘eset’, the filtered ExpressionSet.
You could try (I haven’t checked) with
selected <- genefilter(celfiles_filtered$eset, ff)
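
To illustrate why the list itself cannot be passed on, here is a minimal sketch on simulated data (not your CEL files). The matrix 'expr' and the hand-built list 'ns_result' are assumptions made up for the example; 'ns_result' only stands in for the output of ‘nsFilter’, whose ‘eset’ element would really be an ExpressionSet.

```r
library(genefilter)

# Simulated log2 expression matrix: 200 probes x 8 samples (illustrative only)
set.seed(1)
expr <- matrix(rnorm(200 * 8, mean = 7, sd = 1.5), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("s", 1:8)))

# Hypothetical stand-in for the list returned by nsFilter()
ns_result <- list(eset = expr, filter.log = list())

# Same filters as in the original post
f1 <- pOverA(0.10, log2(100))        # intensity filter
f2 <- function(x) IQR(x) > 0.5       # variance filter
ff <- filterfun(f1, f2)

# Passing the whole list fails, because a list has no dimensions
err <- tryCatch(genefilter(ns_result, ff),
                error = function(e) conditionMessage(e))

# Passing the 'eset' element gives one logical value per probe
selected <- genefilter(ns_result$eset, ff)
filtered <- ns_result$eset[selected, , drop = FALSE]
```

Here 'err' holds the error message from the failed call, and 'selected' can be used to subset the rows of the expression data.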
But I also wonder why you would want to do this?
Did you explore the ‘var.cutoff’ and ‘filterByQuantile’ arguments of ‘nsFilter’?
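
As a rough sketch of what I mean (again unchecked against your data), the variance criterion can be handed to ‘nsFilter’ directly. The example assumes the 'sample.ExpressionSet' data set from Biobase and the hgu95av2.db annotation package are installed; the thresholds are simply carried over from your post and are not tuned to this example data.

```r
library(Biobase)
library(genefilter)
library(hgu95av2.db)   # annotation package for the example data

data(sample.ExpressionSet)

# Probe-level and variance filtering in one call.
# filterByQuantile = FALSE makes var.cutoff an absolute threshold on
# var.func (here the IQR) rather than a quantile of its values.
res <- nsFilter(sample.ExpressionSet,
                require.entrez   = TRUE,
                remove.dupEntrez = TRUE,
                feature.exclude  = "^AFFX",
                var.func         = IQR,
                var.cutoff       = 0.5,
                filterByQuantile = FALSE)

res$filter.log   # how many features each step removed

# An intensity filter can still be applied afterwards to the filtered set
keep  <- genefilter(res$eset, filterfun(pOverA(0.10, log2(100))))
final <- res$eset[keep, ]
```

With this, only the intensity filter needs a separate ‘genefilter’ call.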
Wolfgang
On 18 Feb 2014, at 05:07, Vinay Randhawa [guest] <guest at bioconductor.org> wrote:
>
> During non-specific filtering, I am using parameters for filtering probes (require.entrez=TRUE, remove.dupEntrez=TRUE, feature.exclude="^AFFX") in addition to the intensity and variance filters. Independently, both filters work fine, but when I try to use them together, I get the error below:
> Error in apply(expr, 1, flist) : dim(X) must have a positive length
>
>
> Please help me with this.
>
>
> I have pasted the code below.
>
> #1.Getting the data
> source("http://bioconductor.org/biocLite.R")
> biocLite("GEOquery")
> biocLite("affycoretools")
> library(GEOquery)
> setwd("/home/vinay/R/R-3.0.2")
> getGEOSuppFiles("GSE6631")
> setwd("/home/vinay/R/R-3.0.2/GSE6631")
>
> system("tar -xvf GSE6631_RAW.tar")
> cels <- list.files(pattern = "\\.gz$")
> sapply(cels, gunzip)
>
> #2.Loading and normalising the data using GC-RMA
> # You may need to copy your phenodata.txt file into the GSE6631 folder
> library(affy)
> library(affycoretools)
> data <- ReadAffy()
> pData(data)<-read.table("phenodata.txt", header=T,row.names=1, sep="\t")
> pData(data)
> eset <- gcrma(data)
> eset
> dim(eset)
> pData(eset)
> write.exprs(eset, file="Expression_values_GCRMA_normalize.xls")
> eset2<-eset[,pData(eset)[,"Condition"]%in%c("Normal","Cancer")]
>
>
> #3. Non-specific Filtering data
> library(genefilter)
> celfiles_filtered <- nsFilter(eset2, require.entrez=TRUE, remove.dupEntrez=TRUE,feature.exclude="^AFFX")
> f1<-pOverA(0.10,log2(100)) #intensity filter-the intensity of a gene should be above log2(100) in at least 10 percent of the samples
> f2<-function(x)(IQR(x)>0.5) #variance filter-the interquartile range of log2–intensities should be at least 0.5
> ff<-filterfun(f1,f2)
> selected<-genefilter(celfiles_filtered,ff)
>
>
>
>
>
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
> [4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
> [7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] hgu95av2.db_2.10.1 org.Hs.eg.db_2.10.1
> [3] arrayQualityMetrics_3.18.0 affyPLM_1.38.0
> [5] preprocessCore_1.24.0 RColorBrewer_1.0-5
> [7] hgu95av2probe_2.13.0 affycoretools_1.34.0
> [9] KEGG.db_2.10.1 GO.db_2.10.1
> [11] RSQLite_0.11.4 DBI_0.2-7
> [13] limma_3.18.12 hgu95av2cdf_2.13.0
> [15] AnnotationDbi_1.24.0 simpleaffy_2.38.0
> [17] genefilter_1.44.0 gcrma_2.34.0
> [19] affy_1.40.0 GEOquery_2.28.0
> [21] Biobase_2.22.0 BiocGenerics_0.8.0
> [23] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.30.0 annaffy_1.34.0 annotate_1.40.0
> [4] AnnotationForge_1.4.4 beadarray_2.12.0 BeadDataPackR_1.14.0
> [7] biomaRt_2.18.0 Biostrings_2.30.1 biovizBase_1.10.7
> [10] bit_1.1-11 bitops_1.0-6 BSgenome_1.30.0
> [13] Cairo_1.5-5 Category_2.28.0 caTools_1.16
> [16] cluster_1.14.4 codetools_0.2-8 colorspace_1.2-4
> [19] DESeq2_1.2.10 dichromat_2.0-0 digest_0.6.4
> [22] edgeR_3.4.2 ff_2.2-12 foreach_1.4.1
> [25] Formula_1.1-1 gdata_2.13.2 GenomicFeatures_1.14.2
> [28] GenomicRanges_1.14.4 ggbio_1.10.11 ggplot2_0.9.3.1
> [31] GOstats_2.28.0 gplots_2.12.1 graph_1.40.1
> [34] grid_3.0.2 gridExtra_0.9.1 GSEABase_1.24.0
> [37] gtable_0.1.2 gtools_3.3.0 Hmisc_3.14-0
> [40] hwriter_1.3 IRanges_1.20.6 iterators_1.0.6
> [43] KernSmooth_2.23-10 labeling_0.2 lattice_0.20-24
> [46] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-29
> [49] Matrix_1.1-2 munsell_0.4.2 oligoClasses_1.24.0
> [52] PFAM.db_2.10.1 plyr_1.8 proto_0.3-10
> [55] R2HTML_2.2.1 RBGL_1.38.0 Rcpp_0.11.0
> [58] RcppArmadillo_0.4.000.2 RCurl_1.95-4.1 ReportingTools_2.2.0
> [61] reshape2_1.2.2 R.methodsS3_1.6.1 R.oo_1.17.0
> [64] Rsamtools_1.14.3 rtracklayer_1.22.3 R.utils_1.29.8
> [67] scales_0.2.3 setRNG_2011.11-2 splines_3.0.2
> [70] stats4_3.0.2 stringr_0.6.2 survival_2.37-7
> [73] SVGAnnotation_0.93-1 tcltk_3.0.2 tools_3.0.2
> [76] VariantAnnotation_1.8.12 vsn_3.30.0 XML_3.98-1.1
> [79] xtable_1.7-1 XVector_0.2.0 zlibbioc_1.8.0
>
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor