[Bioc-devel] ShortRead readFasta UniProt Incorrect Import
dstr7320 at uni.sydney.edu.au
Wed Oct 18 07:00:03 CEST 2017
If I have a FASTA file that contains
>sp|Q9NYW0|T2R10_HUMAN Taste receptor type 2 member 10 OS=Homo sapiens GN=TAS2R10 PE=1 SV=3
readFasta fails to import it with the warning
proteins <- readFasta('.', "test.fasta")
In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, :
reading FASTA file test.fasta: ignored 129 invalid one-letter sequence codes
Also, the amino acid sequence is incomplete. There are 308 amino acids, but
It's undesirable for users that some amino acids are discarded. Hopefully, they notice the warning message before proceeding with the analysis.
Admittedly, readFasta is in ShortRead, so is designed to work with high througput sequencing reads. But, perhaps it would be better suited to a infrastructure package such as Biobase and generalised to correctly import any FASTA file. There's even a Bioconductor workflow at https://www.bioconductor.org/help/workflows/sequencing/ which has a section titled "DNA/amino acid sequence from FASTA files" and demonstrates the use of readFasta.
I used version 1.34.2 of ShortRead which is the newest one.
University of Sydney
Camperdown NSW 2050
More information about the Bioc-devel