[R] FW: fairly simple file I/O

Wed Apr 7 21:52:54 CEST 2010

Hi,

On Wed, Apr 7, 2010 at 3:15 PM, Cable, Samuel B Civ USAF AFMC
AFRL/RVBXI <Samuel.Cable at hanscom.af.mil> wrote:
>
> OK, my apologies.  I am sure this is a question that has been answered
> before.  But I have looked all over the web and can't find an answer for
> it.  I promise, wasting your time and bandwidth is my last resort.
> So here goes:
>
> I have an ASCII file formatted like so:
>
> Label 1.1
>
> Time 1
>
> Label 1.2
>
> Array of data from time 1
>
> Label 2.1
>
> Time 2
>
> Label 2.2
>
> Array of data from time 2
>
> Label 3.1
>
> Etc.
>
>
>
> I just want an efficient way of reading this data in so that
>
>
>
> 1)      The "Label" values are ignored.
>
> 2)      The "Time" values go into a single vector.
>
> 3)      The "Array of data" values go into a single array.
>
>
>
> The only thing I have been able to do is "scan" everything in to one
> honking big list and then distribute the data out of this list one index
> at a time.  Surely there is a more elegant way?  Thanks.

I'm not sure which combination of search terms would have turned up
this exact answer. I think you have to break the problem down into
smaller ones, each of which is easy to look up on its own. For
instance:

1. You can read a file into a character vector (one string per line)
with `readLines`.
2. You can use `grep` over a character vector to find the indices
of the elements that match your regex.
3. `strsplit` breaks a string into pieces given a delimiter.
4. Indexing a vector with negative numbers drops those elements, which
is really helpful here.

Anyway, let's start by reading your data into a character vector:

R> lines <- readLines('/path/to/your/file.txt')

Now `lines[1]` will be the first line of the file.
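
If you want to try this without your real file, here is a minimal
sketch that writes a made-up sample to a temporary file and reads it
back. The sample contents are my guess at your layout, not your actual
data:

```r
# Hypothetical sample contents; the real file layout may differ.
sample.text <- c("Label 1.1", "Time 1", "Label 1.2", "1.0, 2.0, 3.0",
                 "Label 2.1", "Time 2", "Label 2.2", "4.0, 5.0, 6.0")

# Write the sample to a temporary file so readLines has something to read.
tmp <- tempfile(fileext = ".txt")
writeLines(sample.text, tmp)

# readLines returns a character vector with one element per line.
lines <- readLines(tmp)
length(lines)
lines[1]
```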

Moving on: do the "Time", "Label", etc. lines actually start with the
words "Time" and "Label"? If so, you can just find them with grep.

You say you don't want the "Label" lines, so you can remove them:

R> label.lines <- grep("Label", lines)
R> clean.lines <- lines[-label.lines]
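
One caveat with negative indexing: if grep finds no matches it returns
an empty vector, and `lines[-integer(0)]` gives you back nothing at all
rather than everything. A slightly more defensive sketch uses `grepl`
and logical indexing instead. It also drops blank lines and anchors the
pattern to the start of the line; both are assumptions on my part about
what your file looks like:

```r
# Hypothetical input, including a blank line like the ones in your mail.
lines <- c("Label 1.1", "", "Time 1", "Label 1.2", "1.0, 2.0, 3.0")

# Drop empty or whitespace-only lines first.
lines <- lines[nzchar(trimws(lines))]

# grepl returns TRUE/FALSE per element, so negating it is safe even
# when nothing matches. "^Label" only matches at the start of a line.
is.label <- grepl("^Label", lines)
clean.lines <- lines[!is.label]
clean.lines
```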

Now `clean.lines` I guess looks like:

Time 1
Array of data from time 1
Time 2
Array of data from time 2

Now you can use grep again, or just pull out every other index:

R> time.lines <- seq(1, length(clean.lines), by=2)
R> times <- clean.lines[time.lines]
R> datas <- clean.lines[-time.lines]
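
The every-other-index trick only works if time and data lines strictly
alternate. If you would rather not rely on that, here is a sketch of the
grep route. It assumes the time lines really look like "Time <number>",
which I am only guessing from your description:

```r
# Hypothetical cleaned-up lines (labels already removed).
clean.lines <- c("Time 1", "1.0, 2.0, 3.0", "Time 2", "4.0, 5.0, 6.0")

# Pick out the time lines by pattern rather than by position.
is.time <- grepl("^Time", clean.lines)
times <- clean.lines[is.time]
datas <- clean.lines[!is.time]

# Sanity check: one data line per time line.
stopifnot(length(times) == length(datas))

# Strip the leading word and convert what is left to numbers,
# which gives the single vector of times you asked for.
time.values <- as.numeric(sub("^Time[[:space:]]*", "", times))
time.values
```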

If each element of `datas` is a comma-separated line of data, you can
use `strsplit` on it to split the pieces by a delimiter (like ","). See
?strsplit for more info.
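
For example, here is a rough sketch that turns the data lines into one
numeric matrix with a row per time step. It assumes each data block is
a single comma-separated line with the same number of values, which may
not match your file:

```r
# Hypothetical data lines, one per time step.
datas <- c("1.0, 2.0, 3.0", "4.0, 5.0, 6.0")

# strsplit returns a list with one character vector per input string.
pieces <- strsplit(datas, ",")

# as.numeric ignores the spaces left around each piece.
values <- lapply(pieces, as.numeric)

# Stack the per-line vectors into a matrix: rows are time steps.
data.mat <- do.call(rbind, values)
dim(data.mat)
data.mat
```

If your values are separated by whitespace instead, splitting on
"[[:space:]]+" in place of "," should do the same job.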

Does that help?

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact