[R] aggregate

Marc Schwartz marc_schwartz at comcast.net
Sat Aug 9 00:34:45 CEST 2008


on 08/08/2008 04:14 PM Sherri Heck wrote:
> Dear All-
> 
> I have a dataset that is comprised of the following:
> 
> 
> doy yr mon day hr hgt1 hgt2 hgt3 co21 co22 co23 sig1 sig2 sig3 dif flag
> 
> 244.02083 2005 09 01 00 2.6 9.5 17.8 375.665 373.737 373.227 3.698 1.107 
> 0.963 -0.509 PRE
> 244.0625 2005 09 01 01 2.6 9.5 17.8 393.66 384.773 379.466 15.336 11.033 
> 5.76 -5.307 PRE
> 244.10417 2005 09 01 02 2.6 9.5 17.8 411.162 397.866 387.755 6.835 5.61 
> 6.728 -10.112 PRE
> 244.14583 2005 09 01 03 2.6 9.5 17.8 417.78 395.579 392.838 13.897 4.631 
> 4.998 -2.741 PRE
> 244.1875 2005 09 01 04 2.6 9.5 17.8 420.844 408.747 401.923 5.673 7.154 
> 4.312 -6.823 PRE
> 244.22917 2005 09 01 05 2.6 9.5 17.8 414.714 404.652 401.141 3.72 3.296 
> 4.028 -3.511 PRE
> 244.27083 2005 09 01 06 2.6 9.5 17.8 423.77 405.623 401.815 5.23 2.808 
> 2.653 -3.808 PRE
> 244.3125 2005 09 01 07 2.6 9.5 17.8 415.703 406.324 399.115 4.189 3.441 
> 3.648 -7.209 PRE
> 244.35417 2005 09 01 08 2.6 9.5 17.8 416.343 403.47 398.71 4.344 2.145 
> 1.482 -4.76 PRE
> 244.39583 2005 09 01 09 2.6 9.5 17.8 413.512 401.195 398.106 3.941 2.663 
> 1.905 -3.089 PRE
> 244.4375 2005 09 01 10 2.6 9.5 17.8 419.308 406.377 401.889 3.443 2.888 
> 1.471 -4.488 PRE
> 244.47917 2005 09 01 11 2.6 9.5 17.8 415.864 406.004 400.702 2.082 1.858 
> 2.507 -5.302 PRE
> 244.52083 2005 09 01 12 2.6 9.5 17.8 413.062 405.385 400.509 4.142 2.041 
> 2.52 -4.877 PRE
> 244.5625 2005 09 01 13 2.6 9.5 17.8 407.566 396.001 392.831 7.445 5.028 
> 3.691 -3.17 PRE
> 244.60417 2005 09 01 14 2.6 9.5 17.8 395.374 385.151 384.275 8.889 2.422 
> 2.115 -0.876 PRE
> 244.64583 2005 09 01 15 2.6 9.5 17.8 383.308 382.984 383.43 1.546 1.168 
> 1.529 0.446 PRE
> 244.6875 2005 09 01 16 2.6 9.5 17.8 379.112 379.168 379.209 1.475 1.478 
> 1.365 0.041 PRE
> 244.72917 2005 09 01 17 2.6 9.5 17.8 372.948 373.424 373.737 1.52 1.721 
> 1.827 0.313 PRE
> 244.77083 2005 09 01 18 2.6 9.5 17.8 373.435 373.479 373.498 0.431 0.29 
> 0.187 0.019 PRE
> 244.8125 2005 09 01 19 2.6 9.5 17.8 374.082 374.11 374.104 0.264 0.227 
> 0.242 -0.006 PRE
> 244.85417 2005 09 01 20 2.6 9.5 17.8 373.619 373.592 373.528 0.221 0.253 
> 0.225 -0.064 PRE
> 244.89583 2005 09 01 21 2.6 9.5 17.8 373.955 373.914 373.892 0.199 0.272 
> 0.275 -0.023 PRE
> 244.9375 2005 09 01 22 2.6 9.5 17.8 374.734 374.699 374.597 0.512 0.305 
> 0.42 -0.102 PRE
> 244.97917 2005 09 01 23 2.6 9.5 17.8 375.889 376.624 375.095 0.968 3.023 
> 0.257 -1.529 PRE
> 245.02083 2005 09 02 00 2.6 9.5 17.8 383.539 377.323 376.664 3.434 0.363 
> 0.535 -0.658 PRE
> 245.0625 2005 09 02 01 2.6 9.5 17.8 404.529 396.357 391.17 4.293 8.426 
> 9.43 -5.187 PRE
> 245.10417 2005 09 02 02 2.6 9.5 17.8 419.268 409.692 400.793 5.609 2.012 
> 1.856 -8.899 PRE
> 245.14583 2005 09 02 03 2.6 9.5 17.8 418.503 403.053 396.932 4.163 2.639 
> 2.164 -6.121 PRE
> 245.1875 2005 09 02 04 2.6 9.5 17.8 416.242 403.211 398.815 4.174 4.169 
> 2.325 -4.396 PRE
> 245.22917 2005 09 02 05 2.6 9.5 17.8 418.22 408.339 400.995 3.671 2.471 
> 2.831 -7.343 PRE
> 245.27083 2005 09 02 06 2.6 9.5 17.8 420.275 406.998 398.091 2.467 4.558 
> 3.799 -8.908 PRE
> 245.3125 2005 09 02 07 2.6 9.5 17.8 417.073 407.904 402.422 2.717 2.132 
> 2.391 -5.482 PRE
> 245.35417 2005 09 02 08 2.6 9.5 17.8 414.63 403.795 400.48 4.526 3.064 
> 2.107 -3.315 PRE
> 245.39583 2005 09 02 09 2.6 9.5 17.8 416.399 404.625 397.153 6.47 4.256 
> 3.132 -7.472 PRE
> 245.4375 2005 09 02 10 2.6 9.5 17.8 417.079 408.685 403.236 5.573 2.807 
> 4.058 -5.449 PRE
> 245.47917 2005 09 02 11 2.6 9.5 17.8 420.808 412.187 406.455 1.606 3.447 
> 2.37 -5.732 PRE
> 245.52083 2005 09 02 12 2.6 9.5 17.8 420.05 410.59 404.307 3.128 5.356 
> 2.849 -6.283 PRE
> 245.5625 2005 09 02 13 2.6 9.5 17.8 412.6 401.255 395.029 7.586 6.056 
> 3.897 -6.225 PRE
> 245.60417 2005 09 02 14 2.6 9.5 17.8 409.137 392.559 388.098 14.58 11.55 
> 6.646 -4.461 PRE
> 245.64583 2005 09 02 15 2.6 9.5 17.8 378.641 377.914 378.87 1.636 0.679 
> 1.116 0.955 PRE
> 245.6875 2005 09 02 16 2.6 9.5 17.8 378.776 378.891 378.697 1.153 1.185 
> 1.39 -0.195 PRE
> 245.72917 2005 09 02 17 2.6 9.5 17.8 374.322 374.281 374.45 0.507 0.507 
> 0.408 0.169 PRE
> 245.77083 2005 09 02 18 2.6 9.5 17.8 374.084 374.086 374.069 0.132 0.229 
> 0.147 -0.017 PRE
> 245.8125 2005 09 02 19 2.6 9.5 17.8 374.113 374.103 374.295 0.33 0.229 
> 0.181 0.192 PRE
> 245.85417 2005 09 02 20 2.6 9.5 17.8 374.468 374.299 374.264 0.315 0.225 
> 0.209 -0.035 PRE
> 245.89583 2005 09 02 21 2.6 9.5 17.8 374.389 374.258 374.284 0.2 0.213 
> 0.206 0.026 PRE
> 245.9375 2005 09 02 22 2.6 9.5 17.8 374.726 374.577 374.524 0.404 0.271 
> 0.166 -0.054 PRE
> 245.97917 2005 09 02 23 2.6 9.5 17.8 374.829 374.946 374.563 0.543 1.107 
> 0.429 -0.383 PRE
> 
> I would like to obtain the hourly (named "hr" above) average of co23 - e.g.
> 
> for all years (the dataset actually extends from 2005 - present), all 
> months and all days - i would like to get an average of the diurnal cycle.
> preferably the result would look something akin to this:
> 
> hour   co23(avg)
> 0        395.345 (e.g.)
> 1        and so on....
> 2      3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> ..
> ..
> 23
> 
> 
> I have tried for loops, but I can't seem to get that to work and for the 
> past 1.5 days I have been trying to use "aggregate" but i can only seem 
> to get an average of all variables (across the rows).
> 
> Any suggestions would be very much appreciated!
> 
> s.heck

Try this:

# The first argument is the value (or values) that we want to get
# the mean for.  The second argument, which must be a list, is
# the 'grouping' column[s]. Finally, we indicate that we want the
# mean value returned and also filter any NA's that are present

 > aggregate(DF$co23, list(hour = DF$hr), mean, na.rm = TRUE)
    hour        x
1     0 374.9455
2     1 385.3180
3     2 394.2740
4     3 394.8850
5     4 400.3690
6     5 401.0680
7     6 399.9530
8     7 400.7685
9     8 399.5950
10    9 397.6295
11   10 402.5625
12   11 403.5785
13   12 402.4080
14   13 393.9300
15   14 386.1865
16   15 381.1500
17   16 378.9530
18   17 374.0935
19   18 373.7835
20   19 374.1995
21   20 373.8960
22   21 374.0880
23   22 374.5605
24   23 374.8290


Just as a further example, if you wanted the results broken down by year 
and by hour, you would do something like this:

 > aggregate(DF$co23, list(year = DF$yr, hour = DF$hr),
             mean, na.rm = TRUE)
    year hour        x
1  2005    0 374.9455
2  2005    1 385.3180
3  2005    2 394.2740
4  2005    3 394.8850
5  2005    4 400.3690
6  2005    5 401.0680
7  2005    6 399.9530
8  2005    7 400.7685
9  2005    8 399.5950
10 2005    9 397.6295
11 2005   10 402.5625
12 2005   11 403.5785
13 2005   12 402.4080
14 2005   13 393.9300
15 2005   14 386.1865
16 2005   15 381.1500
17 2005   16 378.9530
18 2005   17 374.0935
19 2005   18 373.7835
20 2005   19 374.1995
21 2005   20 373.8960
22 2005   21 374.0880
23 2005   22 374.5605
24 2005   23 374.8290


Of course with your larger dataset, there would be the additional years.


HTH,

Marc Schwartz



More information about the R-help mailing list