[R-pkgs] plyr 1.4

Hadley Wickham hadley at rice.edu
Tue Jan 4 15:14:50 CET 2011

# plyr

plyr is a set of tools for a common set of problems: you need to
__split__ up a big data structure into homogeneous pieces, __apply__ a
function to each piece and then __combine__ all the results back
together. For example, you might want to:

  * fit the same model to each patient's subset of a data frame
  * quickly calculate summary statistics for each group
  * perform group-wise transformations like scaling or standardising
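The split-apply-combine pattern described above can be sketched in a few lines. This is a minimal illustration in Python, not plyr's R implementation: the helper `split_apply_combine` and the `patients` data are hypothetical, and in plyr itself you would reach for a function such as `ddply` instead.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, fn):
    """Hypothetical helper (not plyr's API): split rows into groups by `key`,
    apply `fn` to each group, and combine the results into one list of dicts."""
    # Split: collect rows into homogeneous pieces keyed by the grouping value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Apply + combine: run fn on each piece and keep the group label with it.
    return [{key: k, **fn(piece)} for k, piece in groups.items()]


patients = [
    {"patient": "a", "bp": 120},
    {"patient": "a", "bp": 130},
    {"patient": "b", "bp": 110},
]

# Summary statistics for each group: mean blood pressure and row count.
summary = split_apply_combine(
    patients,
    "patient",
    lambda g: {"mean_bp": mean(r["bp"] for r in g), "n": len(g)},
)
print(summary)
```

Each group label travels with its result, which loosely mirrors plyr's promise that labels are maintained across transformations.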

It's already possible to do this with base R functions (like `split` and
the `apply` family of functions), but plyr makes it all a bit easier with:

  * totally consistent names, arguments and outputs
  * convenient parallelisation through the foreach package
  * input from and output to data.frames, matrices and lists
  * progress bars to keep track of long running operations
  * built-in error recovery and informative error messages
  * labels that are maintained across all transformations

Considerable effort has been put into making plyr fast and memory
efficient, and in many cases plyr is as fast as, or faster than, the
built-in functions.

You can find out more at http://had.co.nz/plyr/, including a 20-page
introductory guide, http://had.co.nz/plyr/plyr-intro.pdf. You can ask
questions about plyr (and data manipulation in general) on the plyr
mailing list. Sign up at http://groups.google.com/group/manipulatr

Version 1.4 (2011-01-03)

* `count` now takes an additional parameter, `wt_var`, which allows you to
  compute weighted sums. This is as fast as, or faster than, `tapply` or `xtabs`.

* Really fix the bug in `names.quoted`

* `.` now captures the environment in which it was evaluated. This should fix
  an esoteric class of bugs that probably no-one ever encountered, but it will
  form the basis for an improved version of `ggplot2::aes`.
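A weighted count, as added to `count` via `wt_var`, sums a weight column within each combination of grouping variables instead of counting rows. The Python sketch below illustrates that idea only; `weighted_count`, its arguments, and the `sales` data are hypothetical and are not plyr's API.

```python
from collections import Counter


def weighted_count(rows, group_vars, wt_var=None):
    """Hypothetical helper (not plyr's API): count each combination of
    `group_vars`; if `wt_var` is given, sum that column instead of counting rows."""
    totals = Counter()
    for row in rows:
        combo = tuple(row[v] for v in group_vars)
        # Unweighted: each row contributes 1. Weighted: it contributes its weight.
        totals[combo] += row[wt_var] if wt_var is not None else 1
    return dict(totals)


sales = [
    {"region": "north", "units": 3},
    {"region": "south", "units": 5},
    {"region": "north", "units": 4},
]

print(weighted_count(sales, ["region"]))           # {('north',): 2, ('south',): 1}
print(weighted_count(sales, ["region"], "units"))  # {('north',): 7, ('south',): 5}
```

The unweighted call is just the special case where every row has weight 1, which is why a single function can serve both purposes.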

Version 1.3.1 (2010-12-30)

* Fix bug in `names.quoted` that interfered with ggplot2

Version 1.3 (2010-12-28)

* new function `mutate`, which works like `transform` to add new columns or
  overwrite existing columns, but computes new columns iteratively so that later
  transformations can use columns created by earlier ones. (It's also about
  10x faster.) (Fixes #21)

* split column names are no longer coerced to valid R names.

* `quickdf` now adds names if missing

* `summarise` preserves variable names if explicit names not provided (Fixes

* `arrays` with names should be sorted correctly once again (also fixed a bug
  in the test case that prevented me from catching this automatically)

* `m_ply` no longer has a `.parallel` argument (it was added by mistake)

* `ldply` (and hence `adply` and `ddply`) now correctly passes on the `.parallel`
  argument (Fixes #16)

* `id` uses a better strategy for converting to integers, so it can now handle
  cases with a larger number of potential combinations
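The key difference between `transform` and the new `mutate` is the order in which new columns are computed. The Python sketch below assumes a simple list-of-dicts table and uses hypothetical helpers `transform_like` and `mutate_like`; it mimics the behaviour described in the `mutate` entry above, not plyr's actual R code.

```python
def transform_like(rows, **exprs):
    """Hypothetical helper: every new column is computed from the ORIGINAL row
    only, so one new column cannot refer to another new column."""
    return [{**row, **{name: f(row) for name, f in exprs.items()}} for row in rows]


def mutate_like(rows, **exprs):
    """Hypothetical helper: new columns are computed one at a time, in order,
    and each is stored before the next is computed, so later expressions can
    use earlier ones."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the input rows are left untouched
        for name, f in exprs.items():  # keyword arguments keep their order
            new[name] = f(new)
        out.append(new)
    return out


rows = [{"x": 1}, {"x": 2}]
print(mutate_like(rows, y=lambda r: r["x"] * 10, z=lambda r: r["y"] + 1))
# [{'x': 1, 'y': 10, 'z': 11}, {'x': 2, 'y': 20, 'z': 21}]

try:
    transform_like(rows, y=lambda r: r["x"] * 10, z=lambda r: r["y"] + 1)
except KeyError as err:
    print("transform-style evaluation has no column", err)
```

Computing columns one at a time is what allows `z` to refer to `y`; the transform-style version evaluates every expression against the original row, so the same call raises a `KeyError`.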

Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
