[Bioc-sig-seq] limit to character length for read.DNAStringSet()

Andrew Yee yee at post.harvard.edu
Wed Sep 22 17:56:32 CEST 2010


Thanks, I didn't know about the fold command.  That's a great
suggestion, and using fold solves the problem.

Thanks,
Andrew

On Wed, Sep 22, 2010 at 11:05 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Wed, Sep 22, 2010 at 10:51 AM, Andrew Yee <yee at post.harvard.edu> wrote:
>> Is there a limit to the number of characters in a line for read.DNAStringSet()?
> <snip>
>>> bar <- read.DNAStringSet(filepath='~/sandbox/foo.fasta', format='fasta')
>> Error in .read.fasta.in.XStringSet(filepath, set.names, elementType, lkup) :
>>  reading FASTA file     : cannot read line 2, line is too long
>
> Apparently so :-)
>
> Assuming you're on a *nix-type machine, you can use the `fold` command
> (from the terminal) to pretty easily fix your problem ... you would
> have to assume a maxlength for your header(?) lines (the ones in your
> fasta file that start with ">"). Since you've already shown that the
> read.DNAStringSet function can handle line lengths of 2000, maybe you
> can use that (or some smaller number), if you like.
>
> From terminal:
>
> $ fold -w 2000 foo.fasta > foo.folded.fasta
>
> Then fire up R and do your reading as usual.
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
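
One caveat with the plain `fold` call quoted above: it wraps every line, so a
">" header line longer than the chosen width would be split, and the spilled
part would then be read as sequence. A minimal sketch of a header-safe
variant follows, assuming a POSIX shell and awk are available. The function
name `fold_fasta` is made up for illustration and is not part of Biostrings
or any standard tool.

```shell
# fold_fasta WIDTH FILE  (hypothetical helper, not a standard command)
# Wraps sequence lines at WIDTH characters and passes header lines
# (those starting with ">") through unchanged. Empty lines are dropped.
fold_fasta() {
    awk -v w="$1" '
        /^>/ { print; next }    # header line: leave untouched
        {
            # sequence line: emit it in chunks of at most w characters
            for (i = 1; i <= length($0); i += w)
                print substr($0, i, w)
        }
    ' "$2"
}

# Example call (file names taken from the thread):
#   fold_fasta 2000 foo.fasta > foo.folded.fasta
```

The folded file can then be read in R as before. The width of 2000 is only
the value already mentioned in the thread, not a documented limit.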


