[Bioc-devel] size limit of a string or a BLOB
sdavis2 at mail.nih.gov
Sat Sep 29 16:36:12 CEST 2007
Seth Falcon wrote:
> Hi Simon,
> This might be better directed to the R-sig-db list. But anyway...
> Simon Lin <simonlin at duke.edu> writes:
>> I have got the following error message when using RSQLite. So, what is
>> the size limit of a string? and what is the limit of the BLOB (by
>> defination, I assume it shoud be in the TB range at least)? Thanks! -Simon
>> # trying to insert a DNA sequence of about 500MB into a table
>> > query01<-dbSendQuery(connect, statement = paste ("
>> + insert into dna values ('", seq0, "')", sep="")
>> + )
>> Error in sqliteExecStatement(conn, statement, ...) :
>> RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
> This is a Sqlite error message, that is, not something being generated
> in RSQLite directly. The hard coded limit from the SQLite sources
> is just undr 1GB. Here's the relevant SQLite code:
> ** The maximum length of a TEXT or BLOB in bytes. This also
> ** limits the size of a row in a table or index.
> ** The hard limit is the ability of a 32-bit signed integer
> ** to count the size: 2^31-1 or 2147483647.
> #ifndef SQLITE_MAX_LENGTH
> # define SQLITE_MAX_LENGTH 1000000000
> How many columns does the dna table have? Perhaps there is a
> different way to organize the data?
You can look at how EnsEMBL does this in their DNA tables. Basically,
they break the data into several chunks of, say 10Kb and store that
along with a sequence id and a start and end. A middle layer is then
necessary for doing inserts and selects to either break up the sequence
or piece it back together. The advantage is that fast random access to
regions of large sequences is very fast and memory efficient.
Since you are in R, you could also look as using something like
BioStrings in combination with RSQLite, though I'm not sure of the
details of such a strategy.
More information about the Bioc-devel