[Bioc-devel] ShortRead readFasta UniProt Incorrect Import

Martin Morgan martin.morgan at roswellpark.org
Wed Oct 18 17:03:17 CEST 2017


On 10/18/2017 01:00 AM, Dario Strbenac wrote:
> Good day,
> 
> If I have a FASTA file that contains
> 
> >sp|Q9NYW0|T2R10_HUMAN Taste receptor type 2 member 10 OS=Homo sapiens GN=TAS2R10 PE=1 SV=3
> MLRVVEGIFIFVVVSESVFGVLGNGFIGLVNCIDCAKNKLSTIGFILTGLAISRIFLIWI
> IITDGFIQIFSPNIYASGNLIEYISYFWVIGNQSSMWFATSLSIFYFLKIANFSNYIFLW
> LKSRTNMVLPFMIVFLLISSLLNFAYIAKILNDYKTKNDTVWDLNMYKSEYFIKQILLNL
> GVIFFFTLSLITCIFLIISLWRHNRQMQSNVTGLRDSNTEAHVKAMKVLISFIILFILYF
> IGMAIEISCFTVRENKLLLMFGMTTTAIYPWGHSFILILGNSKLKQASLRVLQQLKCCEK
> RKNLRVT
> 
> readFasta does not import it correctly and emits the warning
> 
> proteins <- readFasta('.', "test.fasta")
> 
> Warning message:
> In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec,  :
>    reading FASTA file test.fasta: ignored 129 invalid one-letter sequence codes
> 
> Also, the amino acid sequence is incomplete. There are 307 amino acids, but
> 
> > width(proteins)
> [1] 178
> 
> Silently discarding some amino acids is undesirable for users. Hopefully, they notice the warning message before proceeding with the analysis.
> 
> Admittedly, readFasta is in ShortRead, so it is designed to work with high-throughput sequencing reads. But perhaps it would be better suited to an infrastructure package such as Biobase and generalised to correctly import any FASTA file. There's even a Bioconductor workflow at https://www.bioconductor.org/help/workflows/sequencing/ which has a section titled "DNA/amino acid sequence from FASTA files" and demonstrates the use of readFasta.
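
The numbers in the report are consistent with each other: the record has 5 × 60 + 7 = 307 residues, and 178 kept + 129 ignored = 307. That is what one would expect if readFasta parses the sequence with a nucleotide alphabet (ShortRead stores reads as a DNAStringSet), keeping residues whose letters happen to be IUPAC nucleotide codes and dropping the rest. A minimal base-R sketch of that check, assuming the 15 IUPAC nucleotide letters below are the accepted set (Biostrings' DNA_ALPHABET additionally contains "-", "+" and ".", none of which occur in this sequence):

```r
# Residues of the UniProt Q9NYW0 record quoted above, lines joined
aa <- paste0(
  "MLRVVEGIFIFVVVSESVFGVLGNGFIGLVNCIDCAKNKLSTIGFILTGLAISRIFLIWI",
  "IITDGFIQIFSPNIYASGNLIEYISYFWVIGNQSSMWFATSLSIFYFLKIANFSNYIFLW",
  "LKSRTNMVLPFMIVFLLISSLLNFAYIAKILNDYKTKNDTVWDLNMYKSEYFIKQILLNL",
  "GVIFFFTLSLITCIFLIISLWRHNRQMQSNVTGLRDSNTEAHVKAMKVLISFIILFILYF",
  "IGMAIEISCFTVRENKLLLMFGMTTTAIYPWGHSFILILGNSKLKQASLRVLQQLKCCEK",
  "RKNLRVT"
)

# Assumed accepted set: IUPAC nucleotide codes (without "-", "+", ".")
iupac_dna <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
               "V", "H", "D", "B", "N")

residues    <- strsplit(aa, "")[[1]]
is_dna_code <- residues %in% iupac_dna

total   <- length(residues)   # full protein length
kept    <- sum(is_dna_code)   # residues a nucleotide parser would keep
dropped <- sum(!is_dna_code)  # residues it would ignore
cat("total:", total, " kept:", kept, " dropped:", dropped, "\n")

# Amino acid letters that are not nucleotide codes
print(sort(unique(residues[!is_dna_code])))
```

The dropped letters are E, F, I, L, P and Q, i.e. exactly the residues in this protein that have no IUPAC nucleotide meaning.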

See Biostrings::readAAStringSet (and friends).
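
A minimal sketch of that suggestion, assuming Biostrings is installed; the temporary file only stands in for test.fasta. readAAStringSet() reads the records into an AAStringSet using the amino acid alphabet, so the full sequence should be retained. Its "friends" readDNAStringSet(), readRNAStringSet() and readBStringSet() do the same for other alphabets.

```r
library(Biostrings)

# Temporary stand-in for test.fasta, holding the UniProt record from the question
fasta <- tempfile(fileext = ".fasta")
writeLines(c(
  ">sp|Q9NYW0|T2R10_HUMAN Taste receptor type 2 member 10 OS=Homo sapiens GN=TAS2R10 PE=1 SV=3",
  "MLRVVEGIFIFVVVSESVFGVLGNGFIGLVNCIDCAKNKLSTIGFILTGLAISRIFLIWI",
  "IITDGFIQIFSPNIYASGNLIEYISYFWVIGNQSSMWFATSLSIFYFLKIANFSNYIFLW",
  "LKSRTNMVLPFMIVFLLISSLLNFAYIAKILNDYKTKNDTVWDLNMYKSEYFIKQILLNL",
  "GVIFFFTLSLITCIFLIISLWRHNRQMQSNVTGLRDSNTEAHVKAMKVLISFIILFILYF",
  "IGMAIEISCFTVRENKLLLMFGMTTTAIYPWGHSFILILGNSKLKQASLRVLQQLKCCEK",
  "RKNLRVT"
), fasta)

# Parse with the amino acid alphabet instead of the nucleotide one
proteins <- readAAStringSet(fasta)

print(width(proteins))  # residue count per record
print(names(proteins))  # header line without the leading ">"
```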


> 
> I used ShortRead version 1.34.2, which is the newest one.
> 
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
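
Because the truncation is reported only as a warning, it is easy to miss. One option, sketched here in base R, is to promote any warning raised during an import to an error. strict_import() is a hypothetical helper name, not part of ShortRead or Biostrings; setting options(warn = 2) has a similar effect for the whole session.

```r
# Hypothetical helper: evaluate an import expression and turn any warning
# it raises into an error, so a silently truncated result is never returned
strict_import <- function(expr) {
  withCallingHandlers(
    expr,  # the promise is forced here, inside the handler's scope
    warning = function(w) {
      stop("import aborted: ", conditionMessage(w), call. = FALSE)
    }
  )
}

# A call that does not warn passes its value through unchanged
ok <- strict_import(sqrt(4))

# A base-R call that warns (coercion to NA) now stops instead
res <- tryCatch(
  strict_import(as.integer("x")),
  error = function(e) conditionMessage(e)
)
print(ok)
print(res)
```

Wrapping the original call, e.g. strict_import(ShortRead::readFasta(".", "test.fasta")), would then stop with the "ignored 129 invalid one-letter sequence codes" message rather than returning 178-residue sequences.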

