[BioC] Limit on number of sequence files for forging a BSgenome

Blanchette, Marco MAB at stowers.org
Fri Mar 29 20:10:54 CET 2013

I traced the error "Error: Line longer than buffer size" that I am
getting from forgeBSgenomeDataPkg() back to the read.dcf() call that
forgeBSgenomeDataPkg() makes to read the seed file. I came to the
realization that there is an upper limit on the number of characters
allowed per single line in DCF files.

For instance:

This works
f");t <- read.dcf("test.dcf")

While this breaks with the same error I get from forgeBSgenomeDataPkg()
f");t <- read.dcf("test.dcf")

Since the seqnames: field I create in my seed file contains several
thousand entries, I am busting that upper limit. I can reproduce the
error just by trying to read the seed file with read.dcf("mySeedFile.txt").
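
To check which line of a seed file is the long one, its physical line lengths can be measured directly. This is a minimal sketch: longest_line() is a hypothetical helper, not part of BSgenome, and the demo seed file is made up for illustration. It does not assume any particular buffer size, it only reports the longest line.

```r
# Hypothetical helper: report the longest physical line in a text file.
longest_line <- function(path) {
  lines  <- readLines(path, warn = FALSE)
  widths <- nchar(lines, type = "bytes")
  i <- which.max(widths)
  list(line  = i,                          # line number of the longest line
       bytes = widths[i],                  # its length in bytes
       field = sub(":.*$", "", lines[i]))  # text before the first colon
}

# Made-up seed file with a single very long seqnames: line.
demo <- tempfile(fileext = ".txt")
writeLines(c("Package: BSgenome.Demo",
             paste("seqnames:",
                   paste(sprintf('"contig%d"', 1:1000), collapse = ", "))),
           demo)

res <- longest_line(demo)
print(res)
```

If the reported field is seqnames and its length grows with every contig added, that would be consistent with the failure depending on the total length of the names rather than on any one fasta file.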

At this point, I am not sure if there is an easy workaround and whether
this should be considered a bug in BSgenome or read.dcf() that should be

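One possible workaround, sketched here under assumptions, is to wrap the seqnames: value over several short lines. The DCF format lets a field continue on following lines that start with whitespace, and read.dcf() joins such continuation lines back into a single field. Whether this avoids the error depends on the limit applying to each physical line rather than to the whole field, and on forgeBSgenomeDataPkg() accepting a seqnames value that contains line breaks; neither has been verified here. The contig names and file below are made up.

```r
# Made-up contig names standing in for the real ones.
seqnames <- paste0("contig", seq_len(5000))

# Build the R expression that goes into the seqnames: field.
expr <- paste0("c(", paste0('"', seqnames, '"', collapse = ", "), ")")

# Wrap at about 80 characters; exdent = 1 starts each continuation line
# with a space, which is what marks it as a continuation in DCF.
wrapped <- strwrap(paste("seqnames:", expr), width = 80, exdent = 1)

seed <- tempfile(fileext = ".dcf")
writeLines(c("Package: BSgenome.Demo", wrapped), seed)

# Read it back and evaluate the field as an R expression.
x <- read.dcf(seed)
parsed <- eval(parse(text = x[1, "seqnames"]))
print(identical(parsed, seqnames))
```

The value read back contains embedded newlines, which the R parser accepts inside c(...), so the names survive the round trip unchanged.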

--  Marco Blanchette, Ph.D.
Stowers Institute for Medical Research
1000 East 50th Street
Kansas City MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018

On 3/28/13 11:58 AM, "Blanchette, Marco" <MAB at stowers.org> wrote:

>I see your line of thought, is there a particular fasta file causing
>forgeBSgenomeDataPkg() to break?
>The answer is no. Once I reach a certain number of fasta files, adding one
>more contig breaks the function. For instance, taking the first 454
>contigs of C. brenneri breaks, while removing the last or the first fasta
>file from the list (keeping only 453) compiles without a problem (neither
>the last nor the first fasta file is responsible for breaking the
>function; the number of files is the trigger).
>What's even more puzzling is that the number that breaks is not a fixed
>number. Picking a random selection of contigs or changing the genome
>changes the number that triggers the function to break... However, it's
>always around 440 files, which might be because the fasta files are all
>of very similar size.
>Any clues? 
>On 3/27/13 8:22 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
>>You are probably right in diagnosing the problem, but sometimes I
>>think I have seen FASTA files with the entire sequence on a single
>>line, instead of (say) 80 nucleotides and then a newline.  I could
>>believe that a really long contig on a single line without a newline,
>>could cause an error like this. You could quickly check if there is a
>>suspicious file by
>>  wc -l *
>>and look for files with #lines like 2-3.  Somehow 460 seems a weird
>>number to fail at.
>>This may not be your problem, and I am sure Herve will respond in due
>>On Wed, Mar 27, 2013 at 4:28 PM, Blanchette, Marco <MAB at stowers.org>
>>> Hi,
>>> Is there a maximum number of sequence files (chromosomes or contigs in
>>>my case) that can be fed to the forgeBSgenomeDataPkg() function? I am
>>>trying to build a BSgenome for C. brenneri and C. japonica available
>>>from EnsemblGenomes. These genomes are made from thousands of contigs
>>>with genes annotated to them. Currently, I get the error
>>>"Error: Line longer than buffer size" when running on the full
>>>set of contigs. However, it works fine on a seed file containing a
>>>subset of the contigs (I can forge a genome with 450 contigs but not
>>>with 460!)
>>> Any suggestions will be appreciated (I can provide a toy example but I
>>>am not sure what would be the merit of it at this point)
>>> Thanks
>>> --  Marco Blanchette, Ph.D.
>>> Stowers Institute for Medical Research
>>> 1000 East 50th Street
>>> Kansas City MO 64110
>>> www.stowers.org
>>> Tel: 816-926-4071
>>> Cell: 816-726-8419
>>> Fax: 816-926-2018
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
