[Bioc-sig-seq] limit to character length for read.DNAStringSet()

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Sep 22 17:05:30 CEST 2010


Hi,

On Wed, Sep 22, 2010 at 10:51 AM, Andrew Yee <yee at post.harvard.edu> wrote:
> Is there a limit to the number of characters in a line for read.DNAStringSet()?
<snip>
>> bar <- read.DNAStringSet(filepath='~/sandbox/foo.fasta', format='fasta')
> Error in .read.fasta.in.XStringSet(filepath, set.names, elementType, lkup) :
>  reading FASTA file     : cannot read line 2, line is too long

Apparently so :-)

Assuming you're on a *nix-type machine, you can use the `fold` command
(from the terminal) to fix your problem pretty easily. The catch is that
`fold` wraps every line, including the header lines in your fasta file
(the ones that start with ">"), so you would have to assume a maximum
length for those headers and pick a width larger than that. Since you've
already shown that the read.DNAStringSet function can handle line
lengths of 2000, maybe you can use that (or some smaller number), if
you like.
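Rather than guessing the maximum header length, you could measure it
before picking a width. The sketch below is a small awk-based shell
function (`fasta_line_lengths` is a hypothetical name, not part of any
package) that reports the longest header line and the longest sequence
line, reading a file or stdin:

```shell
#!/bin/sh
# Hypothetical helper: report the longest header line (starting with ">")
# and the longest non-header line in a FASTA file or stdin.
fasta_line_lengths() {
  awk '
    BEGIN { hmax = 0; smax = 0 }
    /^>/  { if (length($0) > hmax) hmax = length($0); next }   # header line
          { if (length($0) > smax) smax = length($0) }         # sequence line
    END   { printf "header_max=%d seq_max=%d\n", hmax, smax }
  ' "${1:--}"   # default to stdin ("-") when no file is given
}
```

For example, `fasta_line_lengths foo.fasta`; any width comfortably above
`header_max` should leave the headers intact when folding.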

From the terminal:

$ fold -w 2000 foo.fasta > foo.folded.fasta

Then fire up R and do your reading as usual.
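If you'd rather not depend on the headers being short, a variation is to
wrap only the sequence lines and pass header lines through untouched.
This is a sketch using awk instead of `fold` (`fold_fasta` is a
hypothetical function name); it assumes plain ASCII sequence data:

```shell
#!/bin/sh
# Hypothetical helper: wrap FASTA sequence lines at a fixed width while
# leaving header lines (those starting with ">") unchanged.
# Usage: fold_fasta WIDTH [FILE]   (reads stdin when FILE is omitted)
fold_fasta() {
  awk -v w="$1" '
    /^>/ { print; next }              # header: pass through as-is
    {
      line = $0
      while (length(line) > w) {      # emit full-width chunks
        print substr(line, 1, w)
        line = substr(line, w + 1)
      }
      print line                      # remainder (or an empty line)
    }
  ' "${2:--}"
}
```

Used the same way as the `fold` call, e.g.
`fold_fasta 2000 foo.fasta > foo.folded.fasta`.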

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


