[BioC] Limit on number of sequence files for forging a BSgenome

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Thu Mar 28 02:22:56 CET 2013


You are probably right in diagnosing the problem, but I think I have
occasionally seen FASTA files with an entire sequence on a single
line, instead of (say) 80 nucleotides followed by a newline.  I could
believe that a really long contig stored on a single line, with no
newlines inside it, could cause an error like this.  You could quickly
check whether there is a suspicious file with
  wc -l *
and look for files with only 2-3 lines.  Somehow 460 seems a weird
number to fail at.
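
If the line counts alone are not conclusive, a minimal sketch along the
following lines would also report the longest sequence line in each file.
The function name fasta_line_stats is made up for illustration, it is not
part of BSgenome, and it assumes plain (uncompressed) FASTA files whose
header lines start with ">".

```shell
# Hypothetical helper (not part of BSgenome): for each file given, print
#   <file> TAB <number of lines> TAB <length of longest non-header line>
fasta_line_stats() {
    for f in "$@"; do
        # NR is the total line count; header lines (starting with ">")
        # are skipped when tracking the longest line.
        stats=$(awk '
            !/^>/ { if (length($0) > max) max = length($0) }
            END   { printf "%d\t%d", NR, max }
        ' "$f")
        printf '%s\t%s\n' "$f" "$stats"
    done
}
```

Called as "fasta_line_stats *.fa" (the *.fa pattern is just an example),
a file with very few lines and a very large value in the last column would
be the one to look at first.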

This may not be your problem, and I am sure Herve will respond in due time.


On Wed, Mar 27, 2013 at 4:28 PM, Blanchette, Marco <MAB at stowers.org> wrote:
> Hi,
> Is there a maximum number of sequence files (chromosomes or contigs in my case) that can be fed to the forgeBSgenomeDataPkg() function? I am trying to build a BSgenome for C. brenneri and C. japonica, both available from EnsemblGenomes. These genomes are made up of thousands of contigs with genes annotated to them. Currently, I get the following error when running on the full set of contigs: "Error: Line longer than buffer size". However, it works fine on a seed file containing a subset of the contigs (I can forge a genome with 450 contigs but not with 460!)
> Any suggestions will be appreciated (I can provide a toy example but I am not sure what would be the merit of it at this point)
> Thanks
> --  Marco Blanchette, Ph.D.
> Stowers Institute for Medical Research
> 1000 East 50th Street
> Kansas City MO 64110
> www.stowers.org
> Tel: 816-926-4071
> Cell: 816-726-8419
> Fax: 816-926-2018
