[R] Quickly calculating the mean results over a collection of data sets?

Michael R. Head burner at suppressingfire.org
Tue Aug 12 10:47:14 CEST 2008


I have a collection of datasets in separate data frames, each with 3
independent test parameters (w, x, y) and one dependent variable (z),
together with some additional static test data on each row. What I want
is a single data frame which contains the test data, the parameters
(w, x, y) and, in the Z column, the mean of z over all the datasets.

Each dataset has around 6000 rows and around 7 columns, which doesn't
seem outrageously large, so it seems like this shouldn't be too
time-consuming, but the way I've been approaching it seems to take way
too long (20 seconds for datasets over 4 runs, longer for my datasets
over 10 runs).

My imperative-coding brain led me to use for loops, which seem to be
particularly problematic for R performance. My first attempt at this
looked like the following, which takes roughly 60 seconds to complete. I
rewrote it a little, effectively replacing one of the for loops with an
lapply(), but that version is much longer and less clear about its
intent. I can paste it if it would help.


#######################
# Start code snippet
#######################
### inputFiles is just a list of paths to the test runs
testRuns <- lapply(inputFiles, 
		function(x) {
			read.table(x, header=TRUE)})

### W, X, Y have (small) natural values
w <- unique(testRuns[[1]]$W)
x <- unique(testRuns[[1]]$X)
y <- unique(testRuns[[1]]$Y)

### All runs have the same values for all columns
### with the exception of the Z values, so just
### copy the first test run data
testMeans <- data.frame(testRuns[[1]])
for(w0 in w) {
   for(y0 in y) {
     for (x0 in x) {
       row <- which(testMeans$W == w0 &
                    testMeans$Y == y0 &
                    testMeans$X == x0)
       meanValues <- sapply(testRuns,
                            function(r)
                            {mean( subset(r,
                                          r$W == w0 &
                                          r$Y == y0 &
                                          r$X == x0)$Z )})
       testMeans$Z[row] <- mean(meanValues)
     }
   }
 }
### I will then want to plot certain values over (X, Z),
### so ultimately, I'm going to subset the data further.
### Code which gives me a list of W tables with mean Z values
### works, too.
#######################
# End code snippet
#######################
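
For comparison, here is a loop-free sketch of the same computation using
only base R (aggregate(), rbind(), merge()). It is an illustration
rather than a tested solution for the real files: the toy testRuns list
and the grid helper below are made up so the example runs on its own,
and it assumes every run contains the same (W, X, Y) combinations. Like
the loops above, it takes the mean of Z per combination within each run
and then averages those per-run means across runs.

```r
# Hypothetical toy data standing in for the real test runs: three runs
# that share the same W, X, Y grid and differ only in Z.
set.seed(1)
grid <- expand.grid(W = 1:2, X = 1:3, Y = 1:2)
testRuns <- lapply(1:3, function(i) cbind(grid, Z = rnorm(nrow(grid))))

# Step 1: mean Z per (W, X, Y) combination within each run.
perRun <- lapply(testRuns, function(r)
    aggregate(r["Z"], by = r[c("W", "X", "Y")], FUN = mean))

# Step 2: stack the per-run means and average them across runs.
stacked <- do.call(rbind, perRun)
meanZ <- aggregate(stacked["Z"], by = stacked[c("W", "X", "Y")],
                   FUN = mean)

# Step 3: attach the mean Z to the non-Z columns of the first run.
firstRun <- testRuns[[1]]
static <- firstRun[, setdiff(names(firstRun), "Z"), drop = FALSE]
testMeans <- merge(static, meanZ, by = c("W", "X", "Y"))
```

Note that merge() sorts the result by the key columns, so the row order
of testMeans may differ from the order in the first run. A subset such
as testMeans[testMeans$W == 1, ] can then be used for plotting over
(X, Z).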


Thanks,
mike

-- 
Michael R. Head <burner at suppressingfire.org>
http://www.cs.binghamton.edu/~mike/


