[R] OK - I got the data - now what? :-)
Michael A. Miller
mmiller3 at iupui.edu
Wed Jul 8 19:51:36 CEST 2009
>>>>> Mark wrote:
> Currently my data is one experiment per row, but that's
> wasting space as most experiments only take 20% of the row
> and 80% of the row is filled with 0's. I might want to make
> the array more narrow and have a flag somewhere in the 1st
> 10 columns that says that this row is a continuation row
> from the previous row. That way I could pack the array
> better, use less memory and when I do finally test for 0 I
> have a short line to traverse?
This may be a bit off track from the data manipulation you are
working on, but I thought I'd point out that another way to
handle this sort of data is to make a table with one measurement
per row, rather than one experiment per row.
experiment measurement value
A 1 0.27
A 2 0.66
A 3 0.24
A 4 0.55
B 1 0.13
B 2 0.65
B 3 0.83
B 4 0.41
B 5 0.92
B 6 0.67
C 1 0.75
C 2 0.97
C 3 0.49
C 4 0.58
D 1 1.00
D 2 0.71
E 1 0.11
E 2 0.50
E 3 0.98
E 4 0.07
E 5 0.94
E 6 0.57
E 7 0.34
E 8 0.21
If you write the output of your calculations this way, one
value per line, it can easily be read into R as a data.frame and
handled with less munging. There is no need to remove the
zero-padding because the zeros aren't there in the first place.
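If the zero-padded array already exists, it can be reshaped into this layout rather than regenerated. What follows is only a rough sketch: the matrix `wide` is a made-up stand-in for your array, and it assumes that zeros occur only as trailing padding and that every row has at least one real (non-zero) value. If a genuine measurement can be 0, you would need some other way to know each row's length.

```r
# Hypothetical zero-padded layout: one experiment per row, unused cells are 0.
wide <- rbind(A = c(0.27, 0.66, 0.24, 0.55, 0,    0),
              B = c(0.13, 0.65, 0.83, 0.41, 0.92, 0.67),
              D = c(1.00, 0.71, 0,    0,    0,    0))

# Number of real measurements in each row, taken as the position of the
# last non-zero entry (assumes zeros appear only as trailing padding).
n <- unname(apply(wide, 1, function(r) max(which(r != 0))))

# Build the long table: one measurement per row.
long <- data.frame(
  experiment  = rep(rownames(wide), times = n),
  measurement = sequence(n),
  value       = unlist(lapply(seq_len(nrow(wide)),
                              function(i) wide[i, seq_len(n[i])]))
)

# Write it out in the one-value-per-line format shown above.
write.table(long, "test.dat", row.names = FALSE, quote = FALSE)
```

The resulting file has the same three columns as the table above, so it can be read back with read.table(..., header=TRUE).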
You can subset the data with subset, as in
test <- read.table('test.dat',header=TRUE)
expA <- subset(test, experiment=='A')
expB <- subset(test, experiment=='B')
so there is no need to deal with ragged/zero-padded arrays. Your
plots can be grouped automatically with lattice:
require(lattice)
xyplot(value ~ measurement, data=test, groups=experiment, type='b')
xyplot(value ~ measurement | experiment, data=test, type='b')
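If you want every experiment as its own data frame, rather than one subset() call per experiment, split() is one way to get them all at once. This is just a sketch; `small` and `byExp` are illustrative names, and `small` is a cut-down copy of the table above so that the example runs on its own.

```r
# Small inline stand-in for test.dat (experiments A and D from the table above).
small <- data.frame(
  experiment  = c("A", "A", "A", "A", "D", "D"),
  measurement = c(1, 2, 3, 4, 1, 2),
  value       = c(0.27, 0.66, 0.24, 0.55, 1.00, 0.71)
)

# One data frame per experiment, collected in a named list.
byExp <- split(small, small$experiment)

names(byExp)   # the experiment labels: "A" "D"
byExp$D        # same rows as subset(small, experiment == 'D')
```

The list can then be passed to lapply() or sapply() when the same operation is needed for each experiment.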
It is simple to do calculations by experiment using tapply. For
example
> with(test, tapply(value, experiment, mean))
        A         B         C         D         E
0.4300000 0.6016667 0.6975000 0.8550000 0.4650000
> with(test, tapply(measurement, experiment, max))
A B C D E
4 6 4 2 8
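tapply returns a named vector. If you would rather have the per-experiment summaries back as a data.frame, aggregate() is an alternative (not what I used above, just another base R option). A minimal sketch, with `d` and `summary_tab` as illustrative names and only experiments A and D included to keep it short:

```r
# Inline copy of part of the table above, so the example is self-contained.
d <- read.table(header = TRUE, text = "
experiment measurement value
A 1 0.27
A 2 0.66
A 3 0.24
A 4 0.55
D 1 1.00
D 2 0.71
")

# Mean value and highest measurement number per experiment, as data frames.
means  <- aggregate(value ~ experiment, data = d, FUN = mean)
counts <- aggregate(measurement ~ experiment, data = d, FUN = max)

# Combine the two summaries into one table keyed by experiment.
summary_tab <- merge(means, counts, by = "experiment")
names(summary_tab) <- c("experiment", "mean_value", "n_measurements")
summary_tab
```

The means for A and D agree with the tapply output above (0.43 and 0.855).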
Mike