[R-pkgs] plyr: version 1.5

Hadley Wickham hadley at rice.edu
Mon Apr 11 14:16:01 CEST 2011

# plyr

plyr is a set of tools for a common set of problems: you need to
__split__ up a big data structure into homogeneous pieces, __apply__ a
function to each piece and then __combine__ all the results back
together. For example, you might want to:

  * fit the same model to each patient subset of a data frame
  * quickly calculate summary statistics for each group
  * perform group-wise transformations like scaling or standardising
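
As a minimal sketch of the split-apply-combine pattern, the following uses
`ddply` to summarise and transform a data frame by group. The data frame `df`
and its columns `group` and `value` are invented for illustration only.

```r
library(plyr)

# Hypothetical toy data: two groups with two observations each
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 3, 10, 20)
)

# Split by group, apply summarise to each piece, combine into one data frame
means <- ddply(df, .(group), summarise, mean_value = mean(value))

# Group-wise transformation: centre each value on its own group mean
scaled <- ddply(df, .(group), transform, centred = value - mean(value))
```

Here `means` has one row per group, while `scaled` keeps every original row
and adds the `centred` column.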

It's already possible to do this with base R functions (like split and
the apply family of functions), but plyr makes it all a bit easier with:

  * totally consistent names, arguments and outputs
  * convenient parallelisation through the foreach package
  * input from and output to data.frames, matrices and lists
  * progress bars to keep track of long running operations
  * built-in error recovery, and informative error messages
  * labels that are maintained across all transformations

Considerable effort has been put into making plyr fast and memory
efficient, and in many cases plyr is as fast as, or faster than, the
built-in equivalents.

A detailed introduction to plyr has been published in JSS: "The
Split-Apply-Combine Strategy for Data Analysis",
http://www.jstatsoft.org/v40/i01/. You can find out more at
http://had.co.nz/plyr/, or track development at
http://github.com/hadley/plyr. You can ask questions about plyr (and
data manipulation in general) on the plyr mailing list. Sign up at

## Version 1.5


* new `strip_splits` function removes splitting variables from the data frames
  returned by `ddply`.

* `rename` moved in from reshape, and rewritten.

* new `match_df` function makes it easy to subset a data frame to only contain
  values matching another data frame. Inspired by


* `**ply` now works when passed a list of functions

* `*dply` now correctly names output even when some output combinations are
  missing (NULL) (Thanks to bug report from Karl Ove Hufthammer)

* `*dply` preserves the class of many more object types.

* `a*ply` now correctly works with zero length margins, operating on the
  entire object (Thanks to bug report from Stavros Macrakis)

* `join` now implements joins in a more SQL like way, returning all possible
  matches, not just the first one. It is still (a little) faster than merge.
  The previous behaviour is accessible with `match = "first"`.

* `join` is now more symmetric so that `join(x, y, "left")` is closer to
  `join(y, x, "right")`, modulo column ordering

* `named.quoted` no longer fails when quoted expressions are longer than 50
  characters. (Thanks to bug report from Eric Goldlust)

* `rbind.fill` now correctly maintains POSIXct tzone attributes and preserves
  missing factor levels

* `split_labels` correctly preserves empty factor levels, which means that
  `drop = FALSE` should work in more places. Use `base::droplevels` to remove
  levels that don't occur in the data, and `drop = TRUE` to remove combinations
  of levels that don't occur.

* `vaggregate` now passes `...` to the aggregation function when working out
  the output type (thanks to bug report by Pavan Racherla)
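
To illustrate two of the changes listed above, here is a small sketch of the
new `join` matching behaviour and of `rename`. The data frames `x` and `y`
and the column names are made up for the example; only `join`, its `match`
argument, and `rename` come from plyr itself.

```r
library(plyr)

# Hypothetical inputs: id 1 appears twice in y
x <- data.frame(id = c(1, 2), x_val = c("a", "b"))
y <- data.frame(id = c(1, 1, 2), y_val = c("p", "q", "r"))

# Default left join: every matching row of y is returned, as in SQL
all_matches <- join(x, y, by = "id")

# Previous behaviour: keep only the first match for each row of x
first_match <- join(x, y, by = "id", match = "first")

# rename takes a named vector of the form c(old_name = "new_name")
renamed <- rename(x, c(x_val = "label"))
```

With these inputs, `all_matches` should have three rows (two for `id` 1) and
`first_match` two rows, one for each row of `x`.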

Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
