[BioC] remove NA from named character vector
Iain Gallagher
iaingallagher at btopenworld.com
Fri Jul 22 13:03:39 CEST 2011
Hi List
This is likely a trivial problem but it's annoying me. I am mapping from Bos taurus Ensembl IDs to gene symbols. I can do this with biomaRt, but using the org.Bt.eg.db package means I'm not tied to an internet connection.
A toy example:
library(org.Bt.eg.db)
ens <- c('ENSBTAG00000004218', 'ENSBTAG00000004270', 'ENSBTAG00000004578', 'ENSBTAG00000004608')
egs <- unlist(mget(ens, revmap(org.Bt.egENSEMBL), ifnotfound=NA))
egs
ENSBTAG00000004218 ENSBTAG00000004270 ENSBTAG00000004578 ENSBTAG00000004608
          "617660"           "407106"                 NA        "100138951"
# a named character vector with one NA
#now get symbols
syms <- unlist(mget(egs, org.Bt.egSYMBOL, ifnotfound=NA))
#throws an error - fair enough - need to drop the NA
which(egs == NA)
#gives named integer(0) - hmm
class(egs)
#gives [1] "character" - so I'm quite confused now.
NA %in% egs
#gives [1] TRUE
How do I identify which entries in 'egs' are NA so I can remove them? It's trivial here, but the dataset I'm working with has thousands of entries.
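For reference, the behaviour above follows from how R treats missing values: any comparison with NA using == yields NA rather than TRUE or FALSE, so which() has nothing to report, while %in% matches NA explicitly. The usual test for missingness is is.na(). A minimal sketch, assuming a hard-coded copy of the toy 'egs' vector so that it runs without the annotation package (the name 'egs_ok' is made up for illustration):

```r
# Hard-coded copy of the toy 'egs' vector from the example above,
# so this runs without org.Bt.eg.db
egs <- c(ENSBTAG00000004218 = "617660",
         ENSBTAG00000004270 = "407106",
         ENSBTAG00000004578 = NA,
         ENSBTAG00000004608 = "100138951")

# Comparing with NA gives NA for every element, so which() finds no TRUE
egs == NA

# is.na() tests for missingness directly and returns TRUE/FALSE
which(is.na(egs))

# Keep only the IDs that were mapped; names are preserved
# ('egs_ok' is a hypothetical name)
egs_ok <- egs[!is.na(egs)]
egs_ok
```

The filtered vector should then be usable in the second lookup, e.g. mget(egs_ok, org.Bt.egSYMBOL, ifnotfound=NA), although that call is not run here.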
Thanks
iain
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Bt.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5
[4] AnnotationDbi_1.14.1 Biobase_2.10.0