[BioC] Non-specific filtering of Affymetrix Microarray data

Vinay Randhawa [guest] guest at bioconductor.org
Tue Feb 18 05:07:28 CET 2014


During non-specific filtering, I am using nsFilter parameters to filter probes (require.entrez=TRUE, remove.dupEntrez=TRUE, feature.exclude="^AFFX") in addition to intensity and variance filters. Independently, both filters work fine, but when I try to use them together, I get the error below:
Error in apply(expr, 1, flist) : dim(X) must have a positive length


Please help me with this.


I have pasted the code below.

#1.Getting the data
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
biocLite("affycoretools")
library(GEOquery)
setwd("/home/vinay/R/R-3.0.2")
getGEOSuppFiles("GSE6631")
setwd("/home/vinay/R/R-3.0.2/GSE6631")

system("tar -xvf GSE6631_RAW.tar")
cels <- list.files(pattern = "\\.gz$")  # match only the gzip-compressed files
sapply(cels, gunzip)

#2.Loading and normalising the data using GC-RMA
# You may need to copy your phenodata.txt file into the GSE6631 folder 
library(affy)
library(gcrma)  # provides gcrma()
library(affycoretools)
data <- ReadAffy()
pData(data) <- read.table("phenodata.txt", header=TRUE, row.names=1, sep="\t")
pData(data)
eset <- gcrma(data)
eset
dim(eset)
pData(eset)
write.exprs(eset, file="Expression_values_GCRMA_normalize.xls")
eset2 <- eset[, pData(eset)[, "Condition"] %in% c("Normal", "Cancer")]


#3. Non-specific Filtering data
library(genefilter)
celfiles_filtered <- nsFilter(eset2, require.entrez=TRUE, remove.dupEntrez=TRUE, feature.exclude="^AFFX")
f1 <- pOverA(0.10, log2(100))  # intensity filter: the intensity of a gene should be above log2(100) in at least 10 percent of the samples
f2 <- function(x) (IQR(x) > 0.5)  # variance filter: the interquartile range of log2 intensities should be greater than 0.5
ff <- filterfun(f1, f2)
selected <- genefilter(celfiles_filtered, ff)  # this call produces the error quoted above
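
A possible explanation, offered as a sketch rather than a confirmed diagnosis: nsFilter() is documented to return a list with components `eset` and `filter.log`, not an ExpressionSet, and apply() refuses any object without dimensions. The base R example below reproduces the same message on simulated data. The names `ns_result` and `keep_probe` are hypothetical stand-ins, the matrix is made up, and `keep_probe` only approximates the combined pOverA/IQR filter.

```r
# Simulated log2 intensities: 200 probes x 10 samples (made-up data, not GSE6631)
set.seed(1)
expr <- matrix(rnorm(2000, mean = 7, sd = 1.5), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("sample", 1:10)))

# Hypothetical stand-in for the value returned by nsFilter():
# a list with components `eset` and `filter.log`
# (here `eset` is a plain matrix rather than an ExpressionSet)
ns_result <- list(eset = expr, filter.log = list())

# Base R approximation of the combined filter: at least 10 percent of samples
# above log2(100), and an interquartile range greater than 0.5
keep_probe <- function(x) mean(x > log2(100)) >= 0.10 && IQR(x) > 0.5

# Passing the whole list: a list has no dim attribute, so apply() stops
msg <- tryCatch(apply(ns_result, 1, keep_probe),
                error = function(e) conditionMessage(e))
print(msg)

# Passing the expression matrix stored inside the list works
selected <- apply(ns_result$eset, 1, keep_probe)
print(table(selected))
```

If this is indeed the cause, the analogous call in the script above would presumably be genefilter(exprs(celfiles_filtered$eset), ff), with exprs() coming from Biobase. This has not been run against the GSE6631 data.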






 -- output of sessionInfo(): 

R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_IN       LC_NUMERIC=C         LC_TIME=en_IN       
 [4] LC_COLLATE=en_IN     LC_MONETARY=en_IN    LC_MESSAGES=en_IN   
 [7] LC_PAPER=en_IN       LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C 

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] hgu95av2.db_2.10.1         org.Hs.eg.db_2.10.1       
 [3] arrayQualityMetrics_3.18.0 affyPLM_1.38.0            
 [5] preprocessCore_1.24.0      RColorBrewer_1.0-5        
 [7] hgu95av2probe_2.13.0       affycoretools_1.34.0      
 [9] KEGG.db_2.10.1             GO.db_2.10.1              
[11] RSQLite_0.11.4             DBI_0.2-7                 
[13] limma_3.18.12              hgu95av2cdf_2.13.0        
[15] AnnotationDbi_1.24.0       simpleaffy_2.38.0         
[17] genefilter_1.44.0          gcrma_2.34.0              
[19] affy_1.40.0                GEOquery_2.28.0           
[21] Biobase_2.22.0             BiocGenerics_0.8.0        
[23] BiocInstaller_1.12.0      

loaded via a namespace (and not attached):
 [1] affyio_1.30.0            annaffy_1.34.0           annotate_1.40.0         
 [4] AnnotationForge_1.4.4    beadarray_2.12.0         BeadDataPackR_1.14.0    
 [7] biomaRt_2.18.0           Biostrings_2.30.1        biovizBase_1.10.7       
[10] bit_1.1-11               bitops_1.0-6             BSgenome_1.30.0         
[13] Cairo_1.5-5              Category_2.28.0          caTools_1.16            
[16] cluster_1.14.4           codetools_0.2-8          colorspace_1.2-4        
[19] DESeq2_1.2.10            dichromat_2.0-0          digest_0.6.4            
[22] edgeR_3.4.2              ff_2.2-12                foreach_1.4.1           
[25] Formula_1.1-1            gdata_2.13.2             GenomicFeatures_1.14.2  
[28] GenomicRanges_1.14.4     ggbio_1.10.11            ggplot2_0.9.3.1         
[31] GOstats_2.28.0           gplots_2.12.1            graph_1.40.1            
[34] grid_3.0.2               gridExtra_0.9.1          GSEABase_1.24.0         
[37] gtable_0.1.2             gtools_3.3.0             Hmisc_3.14-0            
[40] hwriter_1.3              IRanges_1.20.6           iterators_1.0.6         
[43] KernSmooth_2.23-10       labeling_0.2             lattice_0.20-24         
[46] latticeExtra_0.6-26      locfit_1.5-9.1           MASS_7.3-29             
[49] Matrix_1.1-2             munsell_0.4.2            oligoClasses_1.24.0     
[52] PFAM.db_2.10.1           plyr_1.8                 proto_0.3-10            
[55] R2HTML_2.2.1             RBGL_1.38.0              Rcpp_0.11.0             
[58] RcppArmadillo_0.4.000.2  RCurl_1.95-4.1           ReportingTools_2.2.0    
[61] reshape2_1.2.2           R.methodsS3_1.6.1        R.oo_1.17.0             
[64] Rsamtools_1.14.3         rtracklayer_1.22.3       R.utils_1.29.8          
[67] scales_0.2.3             setRNG_2011.11-2         splines_3.0.2           
[70] stats4_3.0.2             stringr_0.6.2            survival_2.37-7         
[73] SVGAnnotation_0.93-1     tcltk_3.0.2              tools_3.0.2             
[76] VariantAnnotation_1.8.12 vsn_3.30.0               XML_3.98-1.1            
[79] xtable_1.7-1             XVector_0.2.0            zlibbioc_1.8.0          


--
Sent via the guest posting facility at bioconductor.org.
