[R] OK - I got the data - now what? :-)

Michael A. Miller mmiller3 at iupui.edu
Wed Jul 8 19:51:36 CEST 2009


>>>>> Mark wrote:

    > Currently my data is one experiment per row, but that's
    > wasting space as most experiments only take 20% of the row
    > and 80% of the row is filled with 0's. I might want to make
    > the array more narrow and have a flag somewhere in the 1st
    > 10 columns that says this row is a continuation row
    > from the previous row. That way I could pack the array
    > better, use less memory and when I do finally test for 0 I
    > have a short line to traverse?

This may be a bit off track from the data manipulation you are
working on, but I thought I'd point out that another way to
handle this sort of data is to make a table with one measurement
per row, rather than one experiment per row.

experiment measurement value
         A           1  0.27
         A           2  0.66
         A           3  0.24
         A           4  0.55
         B           1  0.13
         B           2  0.65
         B           3  0.83
         B           4  0.41
         B           5  0.92
         B           6  0.67
         C           1  0.75
         C           2  0.97
         C           3  0.49
         C           4  0.58
         D           1  1.00
         D           2  0.71
         E           1  0.11
         E           2  0.50
         E           3  0.98
         E           4  0.07
         E           5  0.94
         E           6  0.57
         E           7  0.34
         E           8  0.21


If you write the output of your calculations this way, one value
per line, it can easily be read into R as a data.frame and
handled with less need for munging.  There is no need to remove
the zero-padding because the zeros are never written in the first
place.
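If the zero-padded rows already exist, they could be converted to
this layout along the following lines.  This is only a rough
sketch: the matrix `wide` and the temporary file are made up for
illustration, and it assumes the padding consists of trailing
zeros only, so that the last non-zero entry in a row marks the end
of that experiment.

```r
# Hypothetical zero-padded input: one experiment per row,
# trailing zeros are padding (assumption, not from the original data).
wide <- rbind(A = c(0.27, 0.66, 0.24, 0.55, 0,    0),
              B = c(0.13, 0.65, 0.83, 0.41, 0.92, 0.67),
              D = c(1.00, 0.71, 0,    0,    0,    0))

# Build one small data frame per experiment, keeping the values
# up to and including the last non-zero entry of the row.
pieces <- lapply(rownames(wide), function(name) {
  row <- wide[name, ]
  n <- max(which(row != 0))
  data.frame(experiment  = name,
             measurement = seq_len(n),
             value       = row[seq_len(n)])
})

# Stack the pieces into one table with one measurement per row.
long <- do.call(rbind, pieces)
rownames(long) <- NULL

# Write the table out and read it back, as with test.dat.
path <- tempfile(fileext = ".dat")   # hypothetical file location
write.table(long, path, row.names = FALSE, quote = FALSE)
reread <- read.table(path, header = TRUE)
```

A measurement that is genuinely zero at the end of an experiment
would be dropped by this approach, which is one more reason to
write the long format directly if you can.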

You can subset the data with subset, as in

  test <- read.table('test.dat',header=TRUE)
  expA <- subset(test, experiment=='A')
  expB <- subset(test, experiment=='B')

so there is no need to deal with ragged/zero-padded arrays. Your
plots can be grouped automatically with lattice: 

  require(lattice)
  xyplot(value ~ measurement, data=test, groups=experiment, type='b')
  xyplot(value ~ measurement | experiment, data=test, type='b')


It is simple to do calculations by experiment using tapply.  For
example


> with(test, tapply(value, experiment, mean))
        A         B         C         D         E 
0.4300000 0.6016667 0.6975000 0.8550000 0.4650000 
 

> with(test, tapply(measurement, experiment, max))
A B C D E 
4 6 4 2 8 
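
If you would rather have the per-experiment results as a
data.frame than as a named vector, aggregate is one alternative to
tapply.  The sketch below is not part of the session above; it
rebuilds only experiments A and D from the table inline so that it
runs on its own, and the names `means`, `counts` and `summary_table`
are just illustrative.

```r
# Experiments A and D from the table above, built inline
# so the example does not depend on test.dat.
test <- data.frame(
  experiment  = c("A", "A", "A", "A", "D", "D"),
  measurement = c(1, 2, 3, 4, 1, 2),
  value       = c(0.27, 0.66, 0.24, 0.55, 1.00, 0.71)
)

# Mean value per experiment, returned as a data frame.
means <- aggregate(value ~ experiment, data = test, FUN = mean)

# Number of measurements per experiment.
counts <- aggregate(value ~ experiment, data = test, FUN = length)

# Combine both summaries into one table, one row per experiment;
# the value columns are renamed value.mean and value.n.
summary_table <- merge(means, counts, by = "experiment",
                       suffixes = c(".mean", ".n"))
```

The result can then be passed on to plotting or modelling
functions that expect a data.frame.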



Mike



