[Bioc-devel] ShortRead readFasta UniProt Incorrect Import
Martin Morgan
martin.morgan at roswellpark.org
Wed Oct 18 17:03:17 CEST 2017
On 10/18/2017 01:00 AM, Dario Strbenac wrote:
> Good day,
>
> If I have a FASTA file that contains
>
>> sp|Q9NYW0|T2R10_HUMAN Taste receptor type 2 member 10 OS=Homo sapiens GN=TAS2R10 PE=1 SV=3
> MLRVVEGIFIFVVVSESVFGVLGNGFIGLVNCIDCAKNKLSTIGFILTGLAISRIFLIWI
> IITDGFIQIFSPNIYASGNLIEYISYFWVIGNQSSMWFATSLSIFYFLKIANFSNYIFLW
> LKSRTNMVLPFMIVFLLISSLLNFAYIAKILNDYKTKNDTVWDLNMYKSEYFIKQILLNL
> GVIFFFTLSLITCIFLIISLWRHNRQMQSNVTGLRDSNTEAHVKAMKVLISFIILFILYF
> IGMAIEISCFTVRENKLLLMFGMTTTAIYPWGHSFILILGNSKLKQASLRVLQQLKCCEK
> RKNLRVT
>
> readFasta fails to import it with the warning
>
> proteins <- readFasta('.', "test.fasta")
>
> Warning message:
> In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, :
> reading FASTA file test.fasta: ignored 129 invalid one-letter sequence codes
>
> Also, the amino acid sequence is incomplete. There are 308 amino acids, but
>
>> width(proteins)
> [1] 178
>
> It's undesirable for users that some amino acids are discarded. Hopefully, they notice the warning message before proceeding with the analysis.
>
> Admittedly, readFasta is in ShortRead, so is designed to work with high througput sequencing reads. But, perhaps it would be better suited to a infrastructure package such as Biobase and generalised to correctly import any FASTA file. There's even a Bioconductor workflow at https://www.bioconductor.org/help/workflows/sequencing/ which has a section titled "DNA/amino acid sequence from FASTA files" and demonstrates the use of readFasta.
See Biostrings::readAAStringSet (and friends).
>
> I used version 1.34.2 of ShortRead which is the newest one.
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
This email message may contain legally privileged and/or...{{dropped:2}}
More information about the Bioc-devel
mailing list