[R] aggregate.data.frame(drop=FALSE) in R 3.3.0

Sat May 14 08:38:54 CEST 2016

From NEWS: The data frame and formula methods for aggregate() gain a drop argument.

Here, I highlight the behavior of 'aggregate.data.frame' with drop=FALSE in R 3.3.0.

Example 1, modified from "example with character variables and NAs" in "Examples" in R help on 'aggregate':
> testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
+                      v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
> by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)
> by2 <- c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA)
> str(aggregate(x = testDF, by = list(by1, by2), FUN = "mean", drop = FALSE))
'data.frame':   30 obs. of  4 variables:
 $ Group.1: Factor w/ 5 levels "1","2","big",..: 1 2 3 4 5 1 2 3 4 5 ...
 $ Group.2: Factor w/ 6 levels "95","99","damp",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ v1     : num  5 7 NaN NaN NaN 5 NA NaN NaN NaN ...
 $ v2     : num  55 77 NaN NaN NaN 55 NA NaN NaN NaN ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : Named int  5 6
  .. ..- attr(*, "names")= chr  "Group.1" "Group.2"
  ..$ dimnames:List of 2
  .. ..$ Group.1: chr  "Group.1=1" "Group.1=2" "Group.1=big" "Group.1=blue" ...
  .. ..$ Group.2: chr  "Group.2=95" "Group.2=99" "Group.2=damp" "Group.2=dry" ...
> str(aggregate(x = testDF, by = list(by1, by2), FUN = "mean"))
'data.frame':   8 obs. of  4 variables:
 $ Group.1: chr  "1" "2" "1" "2" ...
 $ Group.2: chr  "95" "95" "99" "99" ...
 $ v1     : num  5 7 5 NA 3 3 4 1
 $ v2     : num  55 77 55 NA 33 33 44 11

The result of 'aggregate.data.frame' with drop=FALSE has an "out.attrs" attribute (the same attribute that 'expand.grid' sets on its result); the result of the default call (drop=TRUE) does not.
With drop=FALSE, a character grouping variable becomes a factor in the result; with the default (drop=TRUE), it stays character.
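For anyone who wants the drop=FALSE result to have the same column types as the default result, the two differences noted here can be undone by hand. This is only a sketch: 'strip_grid_info' is a hypothetical helper name, not part of R, and the code makes no assumption about whether a given R version still shows these differences.

```r
# Data from Example 1.
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
                     v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99))
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)
by2 <- c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA)

# Hypothetical helper (not part of R): drop the "out.attrs" attribute
# if present and turn any factor column back into character.
strip_grid_info <- function(res) {
  attr(res, "out.attrs") <- NULL
  res[] <- lapply(res, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  res
}

full <- aggregate(x = testDF, by = list(by1, by2), FUN = "mean", drop = FALSE)
full <- strip_grid_info(full)
str(full)
```

The helper leaves the rows untouched, so the extra combinations that drop=FALSE produces are still there.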

Example 2, modified from "Compute the averages according to region and the occurrence of more than 130 days of frost" in "Examples" in R help on 'aggregate':
> aggregate(state.x77,
+           list(Region = state.region,
+                Cold = state.x77[,"Frost"] > 130),
+           mean, drop = FALSE)
         Region  Cold Population   Income Illiteracy Life Exp    Murder
1     Northeast FALSE  8802.8000 4780.400  1.1800000 71.12800  5.580000
2         South FALSE  4208.1250 4011.938  1.7375000 69.70625 10.581250
3 North Central FALSE  7233.8333 4633.333  0.7833333 70.95667  8.283333
4          West FALSE  4582.5714 4550.143  1.2571429 71.70000  6.828571
5     Northeast  TRUE  1360.5000 4307.500  0.7750000 71.43500  3.650000
6         South  TRUE        NaN      NaN        NaN      NaN       NaN
7 North Central  TRUE  2372.1667 4588.833  0.6166667 72.57667  2.266667
8          West  TRUE   970.1667 4880.500  0.7500000 70.69167  7.666667
   HS Grad    Frost      Area
1 52.06000 110.6000  21838.60
2 44.34375  64.6250  54605.12
3 53.36667 120.0000  56736.50
4 60.11429  51.0000  91863.71
5 56.35000 160.5000  13519.00
6      NaN      NaN       NaN
7 55.66667 157.6667  68567.50
8 64.20000 161.8333 184162.17
> aggregate(state.x77,
+           list(Region = state.region,
+                Cold = state.x77[,"Frost"] > 130),
+           mean)
         Region  Cold Population   Income Illiteracy Life Exp    Murder
1     Northeast FALSE  8802.8000 4780.400  1.1800000 71.12800  5.580000
2         South FALSE  4208.1250 4011.938  1.7375000 69.70625 10.581250
3 North Central FALSE  7233.8333 4633.333  0.7833333 70.95667  8.283333
4          West FALSE  4582.5714 4550.143  1.2571429 71.70000  6.828571
5     Northeast  TRUE  1360.5000 4307.500  0.7750000 71.43500  3.650000
6 North Central  TRUE  2372.1667 4588.833  0.6166667 72.57667  2.266667
7          West  TRUE   970.1667 4880.500  0.7500000 70.69167  7.666667
   HS Grad    Frost      Area
1 52.06000 110.6000  21838.60
2 44.34375  64.6250  54605.12
3 53.36667 120.0000  56736.50
4 60.11429  51.0000  91863.71
5 56.35000 160.5000  13519.00
6 55.66667 157.6667  68567.50
7 64.20000 161.8333 184162.17

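Unlike 'tapply', 'aggregate.data.frame' with drop=FALSE also applies the function (mean in example 2 above) to the empty subset for each combination of grouping values that does not occur in the data. That is why row 6 (South, TRUE) shows NaN, the mean of a zero-length vector, where 'tapply' would leave the cell as NA.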
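For comparison, a sketch of the 'tapply' side of this statement, using one column of the same data ('cold' and 'pop_means' are names chosen here for illustration):

```r
# Same grouping as Example 2, applied to a single column with tapply.
cold <- state.x77[, "Frost"] > 130
pop_means <- tapply(state.x77[, "Population"],
                    list(Region = state.region, Cold = cold),
                    mean)
print(pop_means)

# The empty South/TRUE cell is left as NA; mean() is never called for it.
is.na(pop_means["South", "TRUE"])
is.nan(pop_means["South", "TRUE"])

# Calling mean() on a zero-length vector is what produces NaN.
mean(numeric(0))
```

'is.nan' distinguishes the two cases: it is FALSE for the NA that 'tapply' leaves in an empty cell and TRUE for the value that mean returns on empty input.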

Example 3, modified from http://stackoverflow.com/questions/22523131/dplyr-summarise-equivalent-of-drop-false-to-keep-groups-with-zero-length-in :
> DF <- data.frame(a=rep(1:3,4), b=factor(rep(1:2,6), levels=1:3))
> aggregate(DF["a"], DF["b"], length, drop=FALSE)
  b a
1 1 6
2 2 6

Unlike 'interaction' with drop=FALSE, or 'tapply', 'aggregate.data.frame' with drop=FALSE omits levels of a factor grouping variable that never appear in the data (in example 3 above, level "3" of 'b').
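The 'interaction' and 'tapply' behavior referred to here can be checked on the same data. This is a sketch ('by_b' and 'ib' are names chosen for illustration):

```r
# Data from Example 3: level "3" of 'b' never occurs.
DF <- data.frame(a = rep(1:3, 4), b = factor(rep(1:2, 6), levels = 1:3))

# tapply keeps a cell for the unused level and leaves it as NA.
by_b <- tapply(DF$a, DF$b, length)
print(by_b)

# interaction with drop = FALSE keeps the unused level as well.
ib <- interaction(DF$b, drop = FALSE)
levels(ib)
```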