[BioC] fasta biostrings bioconductor

Martin Morgan mtmorgan at fhcrc.org
Fri Mar 28 17:56:14 CET 2014


On 03/28/2014 09:43 AM, DNAStringSet Error Biostrings in R [guest] wrote:
>
> I posted this same quandary on Biostars and stack overflow.
>
> I am attempting to import a fasta file of sequences into R using Bioconductor's 'Biostrings' package and the 'DNAStringSet' function but I keep getting the same error:
>
> Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  :
> key 112 (char 'p') not in lookup table
>
> My fasta file ("FileName.fa") is comprised of various length sequences, in the following format:
>
>> GeneNameOne
> CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA
>> GeneNameTwo
> CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC
> ...etc
>
> I performed 'grep p FileName.fa' in the Unix terminal, but I received no output.

You could try a divide-and-conquer approach: split the file in two, read each
half, keep the half that triggers the error, and repeat. Please continue
reading below...
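
A quicker first check than bisecting by hand might be to list every sequence
line that contains a character outside the DNA alphabet. This is only a
sketch, assuming a POSIX shell with grep. The function name
find_bad_fasta_lines and the demo file are made up for illustration, and the
allowed alphabet (IUPAC nucleotide codes plus '-', '+' and '.', in either
case) is an assumption about what Biostrings accepts for DNA.

```shell
# Hypothetical helper: print "lineno:text" for every non-header line that
# contains a character outside the assumed DNA alphabet.
find_bad_fasta_lines() {
    # The first grep numbers the lines and drops FASTA headers (lines that
    # start with ">"). The second grep keeps only the lines that are NOT
    # made up entirely of allowed letters (matched case-insensitively).
    grep -n -v '^>' "$1" | grep -i -v -E '^[0-9]+:[ACGTMRWSYKVHDBN.+-]*$'
}

# Made-up demo file with a stray 'p' in the second record.
demo=$(mktemp)
printf '%s\n' '>GeneNameOne' 'CAGACACC' '>GeneNameTwo' 'CGCGpCAT' > "$demo"
find_bad_fasta_lines "$demo"
```

If this prints nothing for your real file, the sequences themselves are
probably fine and the offending character is coming from somewhere else in
the session, which is one more reason to show the exact commands you ran.
Note that trailing whitespace or Windows line endings would also be flagged.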

>
> Does anyone have an idea on what is going on?
>
> Thanks in advance.
>
>   -- output of sessionInfo():
>
> Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW),  :
> key 112 (char 'p') not in lookup table

Rather than repeating the error without context, it is usually helpful to
cut-and-paste the relevant portions of the session that cause problems, e.g.,

 > library(Biostrings)
 > readLines("FileName.fa", 4)   ## correct file?
[1] "> GeneNameOne"
[2] "CAGACACCCATAGATACAGATAGACAGATAGAGAAGACACCACCACACAATGA"
[3] "> GeneNameTwo"
[4] "CGCGACATGAACCCATGATAGACGATGAGACCCCACACACACC"
 > readDNAStringSet("FileName.fa")
Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), 
: key 112 (char 'p') not in lookup table

The information being asked for here is the output of the command sessionInfo() 
so that basic information about your system is available; here's mine,

 > library(Biostrings)
 > sessionInfo()
R version 3.0.2 Patched (2014-01-02 r64626)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Biostrings_2.30.1  XVector_0.2.0      IRanges_1.20.6     BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] stats4_3.0.2


>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


