[R] Manage huge database

Mon Sep 22 18:03:31 CEST 2008

On 22-Sep-08 11:00:30, José E. Lozano wrote:
>> So is each line just ACCGTATAT etc etc?
> 
> Exactly: A_G, A_A, G_G and the like.
> 
>> If you have fixed width fields in a file, so that every line is the
>> same length, then you can use random access methods to get to a
>> particular value - just multiply the line length by the row number you
> 
> Nice hint! I didn’t think of this. But I fear that if I have missing
> values in the file I won’t be able to read the right information...
> 
>> When doing this, it's a good idea to test your dataset first to make
>> sure the lines and fields are right.
> 
> Yes, I am trying to figure out if all the lines have the exact same
> length, so that I can use a random access method to read it.

If you were using Linux, I would suggest a command along the lines of

  cat filename | awk '{print(length($0))}'

which would give you the length of each line. But since you have
around 2000 lines, a simpler way to check whether they all have the
same length (in bytes/characters) is to extend the above to

  cat filename | awk '{print(length($0))}' | sort -u

which will present you with all the different line-lengths. If they
are all the same length you will get one number.
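
If more than one length does show up, a variant along the following
lines might help to locate the odd lines: for each distinct length it
reports how many lines have it and the number of the first line where
it occurs. This is only a sketch; sample.txt and the line_lengths
function are made-up names, and the file is created here just so that
the example is self-contained.

```shell
# Build a small made-up test file: three lines of 7 characters, one of 3.
printf 'A_G A_A\nG_G A_G\nA_A\nG_G G_G\n' > sample.txt

# For each distinct line length, print: length, number of lines with
# that length, and the number of the first line where it occurs.
line_lengths() {
  awk '{ n = length($0); count[n]++; if (!(n in first)) first[n] = NR }
       END { for (n in count) print n, count[n], first[n] }' "$1" | sort -n
}

line_lengths sample.txt
# 3 1 3
# 7 3 1
```

Here the output says that one line (line 3) has length 3, while three
lines, starting at line 1, have length 7.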

I just tested this on a file with lines exceeding 500,000 characters
in length, and it worked perfectly well even for such long lines.
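
Once all lines are known to have the same length, the random-access
idea quoted above (line length times row number) could be tried out
from the shell roughly as follows. This is a sketch under assumptions:
fixed.txt and get_row are made-up names, every character is taken to
be a single byte, each line is taken to end in a one-byte newline, and
"head -c" is assumed to be available (it is on GNU and BSD systems).

```shell
# Made-up fixed-width file: every line is 7 characters plus a newline.
printf 'A_G A_A\nG_G A_G\nA_A G_G\n' > fixed.txt

# Print row $2 of file $1, given that every line holds $3 characters.
# Assumes single-byte characters and a one-byte newline per line.
get_row() {
  offset=$(( ($2 - 1) * ($3 + 1) ))   # bytes occupied by the earlier rows
  # tail -c +N starts output at byte N (counting from 1)
  tail -c +"$(( offset + 1 ))" "$1" | head -c "$3"
  echo
}

get_row fixed.txt 2 7
# G_G A_G
```

The "+ 1" in the offset accounts for the newline that ends each row;
with DOS-style line endings it would have to be "+ 2".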

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Sep-08                                       Time: 17:03:21
------------------------------ XFMail ------------------------------