[R] OK - I got the data - now what? :-)

Don MacQueen macq at llnl.gov
Mon Jul 6 04:54:32 CEST 2009


At 10:42 AM -0700 7/5/09, Mark Knecht wrote:
>2009/7/5 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
>>
>

<- a lot of other conversation omitted, to focus on the following>

>Currently my data is one experiment per row, but that's wasting space
>as most experiments only take 20% of the row and 80% of the row is
>filled with 0's. I might want to make the array more narrow and have a
>flag somewhere in the 1st 10 columns that says that this row is a
>continuation row from the previous row. That way I could pack the
>array better, use less memory and when I do finally test for 0 I have
>a short line to traverse?
>
>Just an idea.
>
>Anyway, I suspect either of these will suit my short term needs. On to
>the next step.
>
>Cheers,
>Mark


This suggests the use of a "list" rather than a data frame. With a 
list object, each element in the list would represent one experiment, 
and each would have the appropriate number of elements (values) for 
that experiment.
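Something along these lines might do the conversion. It is only a rough
sketch: the data frame wide below is made up, it has two metadata columns
instead of nine, and the helper trim_at_zero assumes that a series ends
just before its first 0. If a real reading can legitimately be 0, that
rule would have to change.

```r
# Made-up stand-in for the real ~700 x 400 file: two metadata columns
# (the real file has nine) followed by readings C1..C6, with each
# experiment padded out with 0s after its last reading.
wide <- data.frame(
  Date = c("2009-07-01", "2009-07-02", "2009-07-03"),
  Run  = c(1L, 2L, 3L),
  C1 = c(100, 50, 20),
  C2 = c(102, 49, 21),
  C3 = c( 99,  0, 22),
  C4 = c(  0,  0, 23),
  C5 = c(  0,  0, 19),
  C6 = c(  0,  0,  0),
  stringsAsFactors = FALSE
)

n_meta <- 2  # number of leading metadata columns (9 in the real file)

# Metadata kept on its own, one row per experiment
meta <- wide[, seq_len(n_meta), drop = FALSE]

# Readings only, as a numeric matrix
readings <- as.matrix(wide[, -seq_len(n_meta)])

# Assumed end-of-series rule: keep the values before the first 0
trim_at_zero <- function(x) {
  end <- match(0, x)  # index of the first 0, or NA if there is none
  if (is.na(end)) x else x[seq_len(end - 1)]
}

# One list element per experiment, each with its own length
experiments <- lapply(seq_len(nrow(readings)),
                      function(i) unname(trim_at_zero(readings[i, ])))
names(experiments) <- paste0("exp", wide$Run)

sapply(experiments, length)  # exp1 3, exp2 2, exp3 5
```

Once the list exists, each element holds only that experiment's readings,
so later code never has to skip over padding 0s.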

Indeed, the original description also points toward a list():

At 5:02 PM -0700 7/4/09, Mark Knecht wrote:
>OK, I guess I'm getting better at the data part of R. I wrote a
>program outside of R this morning to dump a bunch of experimental
>data. It's a sort of ragged array - about 700 rows and 400 columns,
>but the amount of data in each column varies based on the length of
>the experiment. The real data ends with a 0 following some non-zero
>value. It might be as short as 5 to 10 columns or as many as 390. The
>first 9 columns contain some data about when the experiment was run
>and a few other things I thought I might be interested in later. All
>the data starts in column 10 and has headers saying C1, C2, C3, C4,
>etc., up to C390. The first value for every experiment is some value I
>will normalize and then the values following are above and below the
>original tracing out the path that the experiment took, ending
>somewhere to the right but not a fixed number of readings.

For example, the metadata, i.e., the "... data about when the
experiment was run and a few other things ...", could be held
separately instead of being embedded in the same array, from which it
has to be excluded every time you do an analysis.
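To make that concrete, here is a similarly rough sketch in which the
metadata live in their own data frame and the readings in a parallel
list, in the same order. The data are made up, and dividing by the
first reading is only my guess at what "normalize" means here.

```r
# Made-up metadata: one row per experiment (the real file has nine columns)
meta <- data.frame(
  Run  = 1:3,
  Date = as.Date(c("2009-07-01", "2009-07-02", "2009-07-03"))
)

# Readings for the same experiments, in the same order, each its own length
readings <- list(c(100, 102, 99), c(50, 49), c(20, 21, 22, 23, 19))

# Assumed normalization: each reading relative to the experiment's first value
normalized <- lapply(readings, function(x) x / x[1])

# Per-experiment summaries go back into the metadata; no columns to exclude
meta$n_readings <- sapply(readings, length)
meta$final      <- sapply(normalized, function(x) x[length(x)])

# Metadata still drive selection, e.g. experiments run after July 1
later <- normalized[meta$Date > as.Date("2009-07-01")]
```

The only bookkeeping this requires is keeping the two pieces in step: if
experiments are dropped or reordered, meta and readings must be indexed
the same way.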

But I haven't followed the thread all that closely, so I confess that
my thoughts might be off the mark.

-Don

-- 
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov
