[BioC] Finding coding SNPs with predictCoding
Alex Gutteridge
alexg at ruggedtextile.com
Wed Feb 22 12:56:37 CET 2012
Hi,
I'm trying to find coding SNPs for a list of genes. I'm trying to use
predictCoding() from VariantAnnotation in the devel version, but I'm
getting an error I can't get to the bottom of. Here is the script:
library(BSgenome.Hsapiens.UCSC.hg19)
library(VariantAnnotation)
library(SNPlocs.Hsapiens.dbSNP.20110815)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
#List of genes to retrieve coding snps for
entrez.ids = c('6233')
#Transcriptdb
txdb19 = TxDb.Hsapiens.UCSC.hg19.knownGene
#SNPs - renamed seq levels to be compatible with txdb
snps = getSNPlocs(c("ch1","ch2"),as.GRanges=T)
renamed.levels = gsub("ch","chr",seqlevels(snps))
names(renamed.levels) = seqlevels(snps)
snps = renameSeqlevels(snps,renamed.levels)
#CDS grouped by gene
gene.list = cdsBy(txdb19, by="gene")
vsd.list = gene.list[entrez.ids]
#For each gene
for(eg in entrez.ids){
#Find all SNPs
snp.idx = queryHits(findOverlaps(snps, vsd.list[eg][[1]]))
eg.snps = snps[snp.idx]
#Find variant alleles as iupac
iupac = values(eg.snps)[,"alleles_as_ambig"]
#Replicate snps according to number of variant alleles
eg.snps.exp = rep(eg.snps, nchar(IUPAC_CODE_MAP[iupac]))
#Create list of all variants
variant.alleles =
DNAStringSet(strsplit(paste(IUPAC_CODE_MAP[iupac],collapse=""),"")[[1]])
#Find coding SNPs
aa =
predictCoding(eg.snps.exp,txdb19,seqSource=Hsapiens,varAllele=variant.alleles)
}
########
But the predictCoding call gives this error:
Error in .setSeqNames(x, value) :
The replacement value for isActiveSeq must be a logical vector, with
names that match the seqlevels of the object
As you can see I had to redo the seqlevels for the SNPs because they
were ch1, ch2, etc... rather than chr1, chr2 etc... Have I done that
correctly or introduced some inconsistency there?
######
> sessionInfo()
R Under development (unstable) (2012-02-19 r58418)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.6.4
[2] GenomicFeatures_1.7.25
[3] SNPlocs.Hsapiens.dbSNP.20110815_0.99.6
[4] VariantAnnotation_1.1.51
[5] Rsamtools_1.7.35
[6] AnnotationDbi_1.17.21
[7] Biobase_2.15.3
[8] BSgenome.Hsapiens.UCSC.hg19_1.3.17
[9] BSgenome_1.23.2
[10] Biostrings_2.23.6
[11] GenomicRanges_1.7.25
[12] IRanges_1.13.24
[13] BiocGenerics_0.1.4
[14] Gviz_0.99.0
[15] BiocInstaller_1.3.7
loaded via a namespace (and not attached):
[1] biomaRt_2.11.1 bitops_1.0-4.1 DBI_0.2-5
lattice_0.20-0
[5] Matrix_1.0-3 RColorBrewer_1.0-5 RCurl_1.91-1
RSQLite_0.11.1
[9] rtracklayer_1.15.7 snpStats_1.5.4 splines_2.15.0
survival_2.36-12
[13] tools_2.15.0 XML_3.9-4 zlibbioc_1.1.1
--
Alex Gutteridge
More information about the Bioconductor
mailing list