[BioC] Help on Illumina HumanHT12 v4 chromosome filtering

Kemal Akat kakat at mail.rockefeller.edu
Tue Mar 12 17:35:10 CET 2013

Hi all,

I am running into problems when I want to remove probes targeting chromosome Y.

I can reproduce this behavior with the data from the beadarrayExampleData package:

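## load the example data and the annotation package
library(beadarrayExampleData)
library(illuminaHumanv4.db)
data(exampleSummaryData)
## filter for probe quality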
ids = as.character(featureNames(exampleSummaryData))
qual = unlist(mget(ids, illuminaHumanv4PROBEQUALITY, ifnotfound = NA))
rem = qual == "No match" | qual == "Bad" | is.na(qual)
exampleSummaryData_filt = exampleSummaryData[!rem]
## get chromosome location for remaining probes
ids = as.character(featureNames(exampleSummaryData_filt))
chr = unlist(mget(ids, illuminaHumanv4CHR, ifnotfound = NA))
## filter out probes targeting the Y chromosome
rem_chr = chr == "Y"
exampleSummaryData_filt = exampleSummaryData_filt[!rem_chr]
Error in obj[i, , ..., drop = drop] : 
  (subscript) logical subscript too long
Calls: [ ... [ -> callNextMethod -> .nextMethod -> lapply -> FUN

The eSet (Illumina) object starts with 49576 probes:

R> dim(exampleSummaryData)
Features  Samples Channels 
   49576       12        2 

After quality filtering, 30084 probes remain:

R> dim(exampleSummaryData_filt) ## after filtering for probe quality
Features  Samples Channels 
   30084       12        2 

Now, for the second filtering step (ids = as.character(featureNames(exampleSummaryData_filt)), etc.), the two vectors no longer have the same length:

R> length(ids)
[1] 30084
R> length(chr)
[1] 30109

Where are the 25 extra entries coming from? I tried to keep only the ones in exampleSummaryData_filt:

idx = match(ids, names(chr))
chr = chr[idx]

but this led to other problems.
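
One way to narrow this down would be to look at how many values each probe gets before unlist() flattens the list. The sketch below uses a made-up list in place of the real mget() result (the probe IDs and chromosome values are hypothetical, not taken from illuminaHumanv4.db), and it assumes that some probes are annotated with more than one chromosome. Under that assumption, unlist() returns more elements than there are probes and renames the duplicated entries, which would explain both the length mismatch and why match() on names(chr) gives trouble.

```r
## toy stand-in for mget(ids, illuminaHumanv4CHR, ifnotfound = NA);
## the probe IDs and values here are invented for illustration
chr_list = list(ILMN_A = "1", ILMN_B = c("X", "Y"), ILMN_C = NA)

## number of values returned per probe; anything > 1 is a multi-mapping probe
n_per_probe = sapply(chr_list, length)
multi = names(n_per_probe)[n_per_probe > 1]

## unlist() flattens the list, so the result is longer than the probe list
## and the names of multi-value entries get a numeric suffix
chr_flat = unlist(chr_list)
length(chr_list)
length(chr_flat)
names(chr_flat)

## one logical per probe: TRUE if any of its chromosome values is "Y"
rem_chr = sapply(chr_list, function(x) any(x %in% "Y"))
```

Because rem_chr is built with sapply() over the list rather than from the flattened vector, it has exactly one element per probe and could be used as a subscript of the same length as ids.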

Maybe I am missing something obvious?
Thank you for any hints or help!


R> sessionInfo()
R Under development (unstable) (2013-02-06 r61857)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] beadarrayExampleData_1.0.5 illuminaHumanv4.db_1.16.0  org.Hs.eg.db_2.9.0         RSQLite_0.11.2            
 [5] DBI_0.2-5                  AnnotationDbi_1.21.13      beadarray_2.9.2            ggplot2_0.9.3             
 [9] Biobase_2.19.3             BiocGenerics_0.5.6         setwidth_1.0-1            

loaded via a namespace (and not attached):
 [1] AnnotationForge_1.1.10 BeadDataPackR_1.11.0   colorspace_1.2-0       dichromat_1.2-4        digest_0.6.0          
 [6] grid_3.0.0             gtable_0.1.2           IRanges_1.17.36        labeling_0.1           limma_3.15.15         
[11] MASS_7.3-23            munsell_0.4            plyr_1.8               proto_0.3-10           RColorBrewer_1.0-5    
[16] reshape2_1.2.2         scales_0.2.3           stats4_3.0.0           stringr_0.6.2          tools_3.0.0           
