[R] Applying by() when groups have different lengths

Mon Sep 17 20:54:28 CEST 2018

   My dataframe has 113K rows split by a factor into 58 separate data.frames,
with different numbers of rows (see error output below).

   I cannot think of a way of providing a sample of data; if a sample for a MWE
is desired, advice on producing one using dput() is needed.
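
   As a rough sketch of how such a sample might be produced: take a few rows
from each group and pass the result to dput(). The data frame `rainfall` and
its values here are made up for illustration; only the column names 'name' and
'prcp' come from the call shown further down.

```r
# Hypothetical stand-in for the full data frame (values are invented).
rainfall <- data.frame(
  name = factor(rep(c("A", "B", "C"), times = c(4, 2, 3))),
  prcp = c(0.1, 0, 0.3, 1.2, 0, 0.5, 2.1, 0, 0.4)
)

# Keep at most the first two rows of each group, then recombine.
small <- do.call(rbind, lapply(split(rainfall, rainfall$name), head, n = 2))

# Drop factor levels that no longer occur, so the dput() output stays short.
small <- droplevels(small)

# Print a plain-text representation that can be pasted into a message.
dput(small)
```

The text printed by dput() can be pasted into a message and assigned back to a
name on the other end to recreate the sample.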

   To summarize each group within this dataframe I'm using by() and getting
an error because of the different number of rows:

> by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
+ mean.rain <- mean(rainfall_by_site[, 'prcp'])
+ })
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
   arguments imply differing number of rows: 4900, 1085, 1894, 2844, 3520,
  647, 239, 3652, 3701, 3063, 176, 4713, 4887, 119, 165, 1221, 3358, 1457,
  4896, 166, 690, 1110, 212, 1727, 227, 236, 1175, 1485, 186, 769, 139, 203,
  2727, 4357, 1035, 1329, 1454, 973, 4536, 208, 350, 125, 3437, 731, 4894,
  2598, 2419, 752, 427, 136, 685, 4849, 914, 171
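
   One possible reading of this error, assuming rainfall_by_site is the list
returned by split() rather than the unsplit data frame: by() first coerces its
data argument to a single data frame, and a list of data frames with unequal
row counts cannot be combined that way. Note also that the function body above
refers to rainfall_by_site instead of its argument x. A minimal sketch with
made-up data (the name `rainfall` and all values are hypothetical):

```r
# Hypothetical stand-in for the unsplit data: unequal numbers of rows per site.
rainfall <- data.frame(
  name = factor(c("A", "A", "A", "B", "B", "C")),
  prcp = c(1, 2, 3, 10, 20, 5)
)

# by() on the unsplit data frame; x is the subset of rows for one site,
# so the summary must be computed from x, not from the whole object.
site_means <- by(rainfall, rainfall[, "name"], function(x) mean(x[, "prcp"]))

# If the data are already split into a list of data frames,
# sapply() over the list is an alternative to by().
rainfall_list <- split(rainfall, rainfall$name)
list_means <- sapply(rainfall_list, function(x) mean(x$prcp))
```

Both calls accept groups of different sizes; under the assumption above, the
differing row counts matter only when by() is handed the already-split list.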

   My web searches have not found anything relevant; perhaps my search terms
(such as 'R: apply by() with different factor row numbers') can be improved.

   The help pages found using apropos('by') (?by, ?by.data.frame and
?by.default) all appear to be the same page and provide no hint on how to work
with unequal rows per factor.

   How can I apply by() on these data.frames?

Rich