[R] Using a by() function to process several regression (lm()) functions

Fri Nov 6 17:25:58 CET 2009

On Thu, Nov 5, 2009 at 11:15 PM, Marc Los Huertos <mloshuertos at csumb.edu> wrote:
> Hi Charlie,
>
> Wow, I like this approach and see the problem with my list of lm objects.  It
> does not work well. You have created a list of the values of interest, which
> seems obvious in hindsight, but still extracting the values with the
> do.call(rbind()) bit is certainly outside my experience.
>
> I'll have to look at the do.call() and see if I can backward engineer what
> it is doing...always more to do!  :-)

do.call() is an incredibly useful function; I was able to do data
processing much more efficiently after I found it.  Basically, it
takes two arguments: a function and a list.  The function is called
with the elements of the list as its arguments.  Since rbind()
accepts any number of objects, using do.call() with rbind() or cbind()
is a quick way to collapse a list into a matrix or data.frame.  If the
function takes named arguments, such as pf(), do.call() will match the
names in the list with the names of the arguments.  This is the reason
for all the monkey business I pulled by:

  1. Extracting the parameters of the F-distribution from a summary of
the linear model.
  2. Converting them from a vector to a list.
  3. Renaming them so that they matched the arguments to pf()
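
As a rough sketch of those three steps plus the do.call(rbind, ...)
collapse, here is a toy version.  The data set and the names dat, fits
and pfArgs are made up for illustration and are not from Marc's data:

```r
# Toy data: a response measured at two hypothetical sites.
set.seed(1)
dat <- data.frame(
  site = rep(c("A", "B"), each = 10),
  x    = rep(1:10, times = 2)
)
dat$y <- 2 * dat$x + rnorm(20)

# Fit one model per site and keep only the values of interest.
fits <- lapply(split(dat, dat$site), function(d) {
  linMod <- lm(y ~ x, data = d)

  # 1. Extract the F-distribution parameters from the model summary.
  fstat <- summary(linMod)$fstatistic   # named vector: value, numdf, dendf

  # 2. Convert the vector to a list.
  pfArgs <- as.list(fstat)

  # 3. Rename the elements so they match the arguments of pf().
  names(pfArgs) <- c("q", "df1", "df2")
  pfArgs$lower.tail <- FALSE            # upper tail gives the p-value

  # do.call() matches the list names against pf(q, df1, df2, lower.tail).
  data.frame(site    = as.character(d$site[1]),
             slope   = coef(linMod)[["x"]],
             p.value = do.call(pf, pfArgs))
})

# Same as rbind(fits[[1]], fits[[2]]), but works for any list length.
results <- do.call(rbind, fits)
```

Because each list element is a one-row data.frame, results ends up
with one row per site.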

> Another suggestion included this to extract the p-value,
> anova(linMod)$'Pr(>F)'[1], which seems more straightforward. Do you see any
> reason why this should be a problem?  It seems to work fine when I inserted
> it into your code.

This looks like a much more efficient method!
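
A quick check on made-up data (the names d, pAnova and pOverall are
only for illustration) suggests the two routes agree when the model
has a single predictor:

```r
# Toy data with one predictor.
set.seed(42)
d <- data.frame(x = 1:10)
d$y <- 2 * d$x + rnorm(10)
linMod <- lm(y ~ x, data = d)

# p-value taken from the first row of the ANOVA table.
pAnova <- anova(linMod)$"Pr(>F)"[1]

# p-value rebuilt from the overall F-statistic in the model summary.
fstat <- summary(linMod)$fstatistic
pOverall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
               lower.tail = FALSE)

# Compare the two, allowing for floating-point tolerance.
same <- isTRUE(all.equal(pAnova, pOverall))
```

One thing to watch: with more than one term in the formula, the first
row of anova() is the sequential test for the first term only, so it
would no longer be the p-value of the overall F-test.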

> However, the plyr package seems best to solve the other problem of trying
> to extract my date and site information, which I need to run the rest of the
> analysis (i.e. the treatment difference, which is what the point is!).  I am
> disappointed I didn't find it after a few hours of searching, but that is
> another issue.  Do you have any idea why the function has the syntax that
> includes dots for each argument, e.g. .data and .fun? I am sure there is some
> logic, but I didn't find a reference in the help. Perhaps, this convention
> is not important, but it does beg the question for me...

I believe this is just the convention that Hadley decided to use in
the plyr package.  Another incredibly useful package of his that you
may want to check out for data processing is 'reshape'.  It is based
on plyr and uses some of the same conventions.  You can find good
documentation, examples and papers concerning his R packages on his
website:

  http://had.co.nz/

> Thank you very much! I appreciate the diverse set of solutions; I am sure
> I'll find use for each of them...
>
> cheers, marc

No problem, have fun busting that data apart!

-Charlie