[R-pkgs] plyr 1.4
Hadley Wickham
hadley at rice.edu
Tue Jan 4 15:14:50 CET 2011
# plyr
plyr is a set of tools for a common set of problems: you need to
__split__ up a big data structure into homogeneous pieces, __apply__ a
function to each piece and then __combine__ all the results back
together. For example, you might want to:
* fit the same model to each patient subset of a data frame
* quickly calculate summary statistics for each group
* perform group-wise transformations like scaling or standardising
It's already possible to do this with base R functions (like split and
the apply family of functions), but plyr makes it all a bit easier
with:
* totally consistent names, arguments and outputs
* convenient parallelisation through the foreach package
* input from and output to data.frames, matrices and lists
* progress bars to keep track of long-running operations
* built-in error recovery, and informative error messages
* labels that are maintained across all transformations
Considerable effort has been put into making plyr fast and memory
efficient, and in many cases plyr is as fast as, or faster than, the
built-in functions.
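
As a rough illustration of the split-apply-combine pattern described above, the sketch below computes a per-patient mean twice: once by hand with base R, and once with `ddply`. The `visits` data frame and its column names are made up for the example, and the code assumes plyr is installed.

```r
library(plyr)  # assumes the plyr package is installed

# Hypothetical data: one row per patient visit
visits <- data.frame(
  patient = c("a", "a", "b", "b", "b"),
  score   = c(1, 3, 2, 4, 6)
)

# Base R: split into pieces, apply a function to each, combine with rbind
pieces <- split(visits, visits$patient)
base_result <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(patient = as.character(d$patient[1]), mean_score = mean(d$score))
}))

# plyr: the same three steps in a single call
plyr_result <- ddply(visits, .(patient), summarise, mean_score = mean(score))

print(base_result)
print(plyr_result)
```

Both results hold one row per patient. The `ddply` version names its input (`d`, a data frame) and output (`d`, a data frame) in the function name, which is the naming consistency the list above refers to.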
You can find out more at http://had.co.nz/plyr/, including a 20-page
introductory guide, http://had.co.nz/plyr/plyr-intro.pdf. You can ask
questions about plyr (and data manipulation in general) on the plyr
mailing list. Sign up at http://groups.google.com/group/manipulatr
Version 1.4 (2011-01-03)
------------------------------------------------------------------------------
* `count` now takes an additional parameter `wt_var` which allows you to
compute weighted sums. This is as fast as, or faster than, `tapply` or `xtabs`.
* Really fix bug in `names.quoted`
* `.` now captures the environment in which it was evaluated. This should fix
an esoteric class of bugs which probably no one ever encountered, but will
form the basis for an improved version of `ggplot2::aes`.
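
A minimal sketch of the new `wt_var` argument to `count`, assuming plyr is installed. The `sales` data frame and its columns are hypothetical; the `tapply` call is included only as a cross-check of the weighted sums.

```r
library(plyr)  # assumes the plyr package is installed

# Hypothetical data: units sold per transaction, by region
sales <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  units  = c(10, 5, 2, 3, 4)
)

# Unweighted: number of rows in each region (returned in a `freq` column)
n_rows <- count(sales, "region")

# Weighted: sum of `units` within each region, via wt_var
total_units <- count(sales, "region", wt_var = "units")

# Base R cross-check of the weighted sums
check <- tapply(sales$units, sales$region, sum)

print(n_rows)
print(total_units)
```

Unlike `tapply`, which returns a named vector here, `count` returns a data frame with the grouping column alongside `freq`.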
Version 1.3.1 (2010-12-30)
------------------------------------------------------------------------------
* Fix bug in `names.quoted` that interfered with ggplot2
Version 1.3 (2010-12-28)
------------------------------------------------------------------------------
NEW FEATURES
* new function `mutate` that works like `transform` to add new columns or
overwrite existing columns, but computes new columns iteratively so later
transformations can use columns created by earlier transformations. It is
also about 10x faster. (Fixes #21)
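
To illustrate the iterative behaviour of `mutate`, here is a small sketch assuming plyr is installed. The data frame and the column names `doubled` and `plus_one` are invented for the example.

```r
library(plyr)  # assumes the plyr package is installed

df <- data.frame(x = 1:3)

# mutate evaluates its arguments in order, so `plus_one` can refer to
# `doubled`, which was created earlier in the same call
out <- mutate(df, doubled = x * 2, plus_one = doubled + 1)
print(out)

# transform evaluates every expression against the original data, so the
# same call fails unless an object named `doubled` happens to exist in the
# calling environment; the error is caught and turned into NULL here
base_attempt <- tryCatch(
  transform(df, doubled = x * 2, plus_one = doubled + 1),
  error = function(e) NULL
)
```

With `transform`, the equivalent result would need two nested or successive calls, one per dependent column.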
BUG FIXES
* split column names are no longer coerced to valid R names.
* `quickdf` now adds names if missing
* `summarise` preserves variable names if explicit names not provided (Fixes
#17)
* `arrays` with names should be sorted correctly once again (also fixed a bug
in the test case that prevented me from catching this automatically)
* `m_ply` no longer has a `.parallel` argument (mistakenly added)
* `ldply` (and hence `adply` and `ddply`) now correctly passes on the
`.parallel` argument (Fixes #16)
* `id` uses a better strategy for converting to integers, making it usable
in cases with a larger number of potential combinations
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/