[R] Manage huge database

jim holtman jholtman at gmail.com
Mon Sep 22 17:41:53 CEST 2008


Why don't you make one pass through your data and encode your
characters as integers (it would appear that you only have 16
combinations)?  You might also want to consider using the 'raw' type,
since a raw value takes only one byte of storage versus four for an
integer -- that will cut your storage requirements by a factor of 4.
Then store each row in a 'filehash' object so you can quickly retrieve
a row at a time and index directly to the byte(s) that hold the
information you want.
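
A rough sketch of the encoding idea, using only base R.  It assumes the
16 combinations are the ordered pairs of A, C, G and T written as "A_G",
reserves byte 0 for a missing value, and writes the rows to a flat
binary file rather than to 'filehash' (a simplification for the
example).  The helper names and the toy data are made up.

```r
# Assumed code table: all 16 ordered pairs of the four bases, e.g. "A_G".
bases <- c("A", "C", "G", "T")
codes <- as.vector(outer(bases, bases, paste, sep = "_"))

# Genotype strings -> one raw byte each; byte 0 is reserved for missing.
encode <- function(genotypes) as.raw(match(genotypes, codes, nomatch = 0L))

# Raw bytes -> genotype strings; byte 0 decodes to NA.
decode <- function(bytes) c(NA_character_, codes)[as.integer(bytes) + 1L]

# Toy data: 2 rows (samples) by 3 columns (markers), one value missing.
geno <- matrix(c("A_G", "A_A", "G_G",
                 "C_T", NA,    "T_T"), nrow = 2, byrow = TRUE)

# Write each row as exactly ncol(geno) bytes, so every row has the same
# length even when a value is missing.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
for (i in seq_len(nrow(geno))) writeBin(encode(geno[i, ]), con)
close(con)

# Jump straight to one value: offset = (row - 1) * row length + (col - 1).
read_genotype <- function(path, row, col, n_markers) {
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = (row - 1) * n_markers + (col - 1))
  decode(readBin(con, what = "raw", n = 1L))
}

print(read_genotype(path, 2, 3, ncol(geno)))  # "T_T"
print(read_genotype(path, 2, 2, ncol(geno)))  # NA (the missing value)
```

Because a missing value still occupies one byte, the offset arithmetic
stays valid for every row.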

On Mon, Sep 22, 2008 at 7:00 AM, José E. Lozano <lozalojo at jcyl.es> wrote:
>> So is each line just ACCGTATAT etc etc?
>
> Exactly: A_G, A_A, G_G and such.
>
>> If you have fixed width fields in a file, so that every line is the
>> same length, then you can use random access methods to get to a
>> particular value - just multiply the line length by the row number you
>
> Nice hint! I didn't think of this. But I fear that if I have missing values
> in the file I won't be able to read the right information...
>
>> When doing this, it's a good idea to test your dataset first to make
>> sure the lines and fields are right.
>
> Yes, I am trying to figure out whether all the lines have exactly the same
> length, so that I can use a random access method to read the file.
>
> Thanks,
> Jose Lozano
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


