[BioC] DNAStringSet_translate error in predictCoding()

"Dr. Jörg Linde" joerg.linde at hki-jena.de
Tue Jun 17 14:45:44 CEST 2014


Dear bioconductor team,

I have a problem with predictCoding() from the VariantAnnotation package. 
It fails with the same error as described here:
https://stat.ethz.ch/pipermail/bioconductor/2012-November/048940.html

However, after reading my VCF file, its ALT field is clearly a 
DNAStringSetList.
The problem remains after using vcftools to remove indels from the VCF. 
As far as I can see, some ALT entries contain two alternative alleles.
Is there anything else that could cause the problem?

I am also aware of this thread:
https://stat.ethz.ch/pipermail/bioconductor/2012-October/048370.html
but I can't figure out how to remove the lines that cause the problem.

Thank you very much
Jörg

 > vcf <- readVcf("file.vcf", "hg")
 > coding <- predictCoding(vcf, txdb, seqSource=fa)
Error in .Call2("DNAStringSet_translate", x, DNA_BASE_CODES, lkup, skipcode,  :
   in 'x[[6655]]': not a base at pos 3
 > alt(vcf)
DNAStringSetList of length 142721
[[1]] C
[[2]] T
[[3]] G
[[4]] G
[[5]] G
[[6]] C
[[7]] C
[[8]] A
[[9]] G
[[10]] C
..
<142711 more elements>
 > sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


