[Bioc-devel] read.XStringSet with spaces in or at end of sequence
Thomas Girke
thomas.girke at ucr.edu
Tue May 22 19:58:47 CEST 2012
Currently, the FASTA read functions in Biostrings handle spaces in
sequences inconsistently. This applies to spaces both within and at the
end of sequence strings. Because of this, users often conclude that
Biostrings cannot handle their sequence data and give up on it, which I
find unfortunate.
For instance, given this sequence stored in "test.fasta":
>123
AATTTAAA GGGG
read.DNAStringSet fails to import this sequence, which is the
least desirable outcome:
> read.DNAStringSet("test.fasta")
Error in .Call2("read_fasta_in_XStringSet", efp_list, nrec, skip, use.names, :
key 32 (char ' ') not in lookup table
However, read.AAStringSet imports it but retains the space:
> read.AAStringSet("test.fasta")
  A AAStringSet instance of length 1
    width seq              names
[1]    13 AATTTAAA GGGG    123
Wouldn't it make the most sense to remove or ignore spaces during the import?
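Until the readers do this themselves, one possible workaround is to strip
the spaces before the import. The following is only a sketch in base R:
strip_fasta_spaces is a hypothetical helper, not part of Biostrings, and it
assumes that every line not starting with ">" is sequence data.

```r
# Hypothetical helper (not part of Biostrings): remove all whitespace from
# the sequence lines of a FASTA file and leave the header lines untouched.
strip_fasta_spaces <- function(infile,
                               outfile = tempfile(fileext = ".fasta")) {
  lines <- readLines(infile)
  is_header <- grepl("^>", lines)
  # Drop spaces, tabs and other whitespace inside or at the end of sequences
  lines[!is_header] <- gsub("[[:space:]]+", "", lines[!is_header])
  writeLines(lines, outfile)
  outfile
}

# Recreate the example from this message in a temporary file and clean it
infile <- tempfile(fileext = ".fasta")
writeLines(c(">123", "AATTTAAA GGGG"), infile)
clean <- strip_fasta_spaces(infile)
cleaned_lines <- readLines(clean)

# The cleaned file could then be passed to the reader, e.g.
# library(Biostrings)
# read.DNAStringSet(clean)
```

This writes a second copy of the file, so it is a stopgap for small inputs
rather than a substitute for handling spaces in the import code itself.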
Thomas
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.24.1 IRanges_1.14.2 BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] stats4_2.15.0 tools_2.15.0