[R] Aggregate with non-scalar function

Mike Nielsen mr.blacksheep at gmail.com
Wed Nov 7 17:46:41 CET 2007


R-Helpers,

I'm sorry to have to ask this -- I've not used R very much in the last
8 or 10 months, and I've gotten rusty.

I have the following (ff2 is a subset of a much, much larger dataset):

> ff2
      hostName user sys idle             obsTime
10142     fred  0.4 0.5 98.0 2007-11-01 02:02:18
16886   barney  0.5 0.2 94.6 2007-10-25 19:12:12
8795      fred  0.0 0.1 99.8 2007-10-30 05:08:22
5261      fred  0.1 0.2 99.7 2007-10-25 07:20:32
12427   barney  0.1 0.2 93.2 2007-10-19 14:34:10
18067   barney  0.1 0.2 99.4 2007-10-27 10:34:08
973       fred  0.0 0.2 99.8 2007-10-19 08:24:22
5426      fred  0.2 0.3 99.5 2007-10-25 12:50:33
7067      fred  0.1 0.2 99.4 2007-10-27 19:32:27
13159   barney  0.1 0.4 84.3 2007-10-20 14:58:11
17481   barney  1.2 2.0 92.6 2007-10-26 15:02:11
21632   barney  0.1 0.1 99.6 2007-11-01 09:24:09
206       fred 19.4 4.8 53.7 2007-10-18 06:50:34
18151   barney  0.1 0.2 94.9 2007-10-27 13:22:09
10662     fred  0.1 0.2 99.6 2007-11-01 19:22:27
10376     fred  0.0 0.2 99.7 2007-11-01 09:50:24
3630      fred 43.7 7.0 33.0 2007-10-23 00:58:27
1118      fred  0.6 0.4 98.9 2007-10-19 13:14:23
5122      fred  0.1 0.2 99.6 2007-10-25 02:42:21
22117   barney  0.0 0.2 99.4 2007-11-02 01:34:04

> doit(ff2)
   hostName hour user.mean sys.mean idle.mean user.max sys.max idle.max
1    barney   01      0.00     0.20     99.40      0.0     0.2     99.4
2    barney   09      0.10     0.10     99.60      0.1     0.1     99.6
3    barney   10      0.10     0.20     99.40      0.1     0.2     99.4
4    barney   13      0.10     0.20     94.90      0.1     0.2     94.9
5    barney   14      0.10     0.30     88.75      0.1     0.4     93.2
6    barney   15      1.20     2.00     92.60      1.2     2.0     92.6
7    barney   19      0.50     0.20     94.60      0.5     0.2     94.6
8      fred   00     43.70     7.00     33.00     43.7     7.0     33.0
9      fred   02      0.25     0.35     98.80      0.4     0.5     99.6
10     fred   05      0.00     0.10     99.80      0.0     0.1     99.8
11     fred   06     19.40     4.80     53.70     19.4     4.8     53.7
12     fred   07      0.10     0.20     99.70      0.1     0.2     99.7
13     fred   08      0.00     0.20     99.80      0.0     0.2     99.8
14     fred   09      0.00     0.20     99.70      0.0     0.2     99.7
15     fred   12      0.20     0.30     99.50      0.2     0.3     99.5
16     fred   13      0.60     0.40     98.90      0.6     0.4     98.9
17     fred   19      0.10     0.20     99.50      0.1     0.2     99.6
> doit
function(x){
    x.mean <- aggregate(x[,c("user","sys","idle")],
                        by=list(hostName=x$hostName,
                                hour=strftime(as.POSIXlt(x$obsTime),"%H")),
                        mean)

    x.max <- aggregate(x[,c("user","sys","idle")],
                       by=list(hostName=x$hostName,
                               hour=strftime(as.POSIXlt(x$obsTime),"%H")),
                       max)

    t1 <- merge(x.mean,x.max,by=c("hostName","hour"),suffixes=c(".mean",".max"))
    return(t1)
}

The point of the "doit" function is to make a new dataframe in which
the columns are summary statistics of certain columns in the argument.

Is there a function similar to:

magic.function(ff2[,c("user","sys","idle")],
      by=list(hostName=ff2$hostName,hour=strftime(as.POSIXlt(ff2$obsTime),"%H")),
      function(x){c(mean.user=mean(x$user),
                    mean.sys=mean(x$sys),
                    mean.idle=mean(x$idle),
                    max.user=max(x$user),
                    max.sys=max(x$sys),
                    max.idle=max(x$idle))})

i.e., an "aggregate" that can cope with a non-scalar function and "do
what I mean"?  My doit function gets uglier with every summary
statistic I add, and I worry about the cost of the "merge" with
hundreds of thousands of rows.
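
To make the request concrete, here is a minimal sketch of how such a
function might behave, written with split() and lapply() from base R.
The name summarise.by is hypothetical (it is not an existing R
function), the three-row data frame is a stand-in copied from ff2
above, and the tz="UTC" argument is an assumption added only so the
hour does not depend on the local time zone. It has not been tried on
a large dataset.

```r
# Hypothetical helper (not an existing R function): apply a vector-valued
# summary function FUN to each group defined by the named list `by`.
summarise.by <- function(x, by, FUN) {
  # One factor level per combination of grouping values that actually occurs.
  groups <- interaction(by, drop = TRUE)
  # split() returns one sub-data-frame per level, in level order.
  pieces <- split(x, groups)
  # Each FUN call returns a named vector; stack them into a matrix.
  stats <- do.call(rbind, lapply(pieces, FUN))
  # Take the grouping values from the first row of each group and put
  # them in the same (level) order that split() used.
  first <- !duplicated(groups)
  keys <- as.data.frame(by, stringsAsFactors = FALSE)[first, , drop = FALSE]
  keys <- keys[order(groups[first]), , drop = FALSE]
  out <- data.frame(keys, stats)
  rownames(out) <- NULL
  out
}

# Stand-in data: three rows copied from ff2.
ff <- data.frame(
  hostName = c("barney", "barney", "fred"),
  user     = c(0.1, 0.1, 19.4),
  sys      = c(0.2, 0.4, 4.8),
  idle     = c(93.2, 84.3, 53.7),
  obsTime  = c("2007-10-19 14:34:10", "2007-10-20 14:58:11",
               "2007-10-18 06:50:34"),
  stringsAsFactors = FALSE
)
ff$hour <- strftime(as.POSIXlt(ff$obsTime, tz = "UTC"), "%H")

res <- summarise.by(ff[, c("user", "sys", "idle")],
                    by = list(hostName = ff$hostName, hour = ff$hour),
                    FUN = function(p) c(mean.user = mean(p$user),
                                        mean.idle = mean(p$idle),
                                        max.user  = max(p$user),
                                        max.idle  = max(p$idle)))
print(res)
```

Adding another statistic then means adding one element to the vector
returned by FUN, and no merge() is involved.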

I'm almost sure I've seen a solution to what I know is a simple
problem, but I guess my search skills are as bad as my "R": I've
rummaged around the r-help archives and come up with nothing to show
for it.


Pointers would be gratefully received.

Many thanks.
-- 
Regards,

Mike Nielsen


