[BioC] fasta sequence is too long to be read
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Nov 29 19:06:38 CET 2011
Hi,
On Tue, Nov 29, 2011 at 12:32 PM, wang peter <wng.peter at gmail.com> wrote:
> hello, all
>
> i met this problem
>
> rm(list=ls())
> library(ShortRead);
> fastafile="unigenes.fasta"
> seqs <- readFasta(fastafile);
>
>
> Error in .read.fasta.in.XStringSet(efp_list, nrec, skip, use.names,
> elementType, :
> reading FASTA file unigenes.fasta: cannot read line 474, line is too long
How long is it?
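If it helps, here is a rough sketch for checking that from the shell. The helper name `line_length` is just something I made up for illustration; it only assumes a POSIX `awk` is available, and the line number 474 is taken from your error message:

```shell
# Hypothetical helper (not part of ShortRead or any package):
# print the number of characters on a given line of a file.
line_length() {
    # $1 = file name, $2 = 1-based line number
    awk -v n="$2" 'NR == n { print length($0); exit }' "$1"
}

# Usage with the file and line number from the error message:
# line_length unigenes.fasta 474
```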
You can always try opening the file in your favorite editor and
inserting a line break there to split the sequence across two
lines.
I suspect you can also use the *nix `fold` command line utility to
ensure that all your lines are at most, say, 100 characters long, e.g.
from the command line:
$ fold -w 100 unigenes.fasta > unigenes.fold.fasta
Just make sure that none of your description lines in the fasta file
(the ones that start with ">whatever") are longer than whatever you
set `-w` to be, since `fold` would wrap those too and break the
headers.
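If some description lines do turn out to be longer than that, one way around it is to wrap only the sequence lines. This is a minimal sketch, not something I have run on your data: the function name `fold_fasta` is my own invention, and it assumes a POSIX `awk` and that every header line starts with ">":

```shell
# Hypothetical helper: wrap sequence lines at a maximum width while
# passing header lines (those starting with ">") through unchanged.
# Note that blank lines in the input are dropped.
fold_fasta() {
    # $1 = input FASTA file, $2 = maximum line width (default 100)
    awk -v w="${2:-100}" '
        /^>/ { print; next }    # header line: print as-is
        {
            # sequence line: emit chunks of at most w characters
            for (i = 1; i <= length($0); i += w)
                print substr($0, i, w)
        }
    ' "$1"
}

# Usage with the file names from above:
# fold_fasta unigenes.fasta 100 > unigenes.fold.fasta
```

The wrapped file can then be passed to readFasta in place of the original.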
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact