[R] Linear Regression with 2 grouping variables

Dennis Murphy djmuser at gmail.com
Tue Aug 23 07:12:10 CEST 2011


Hi:

You're kind of on the right track, but there is no conditioning
formula in lm(); it's not lattice :)  This is relatively easy to do
with the plyr package, though:

library('plyr')
# Generate a list of models - the subsetting variables (Site, Vial) are
# used to generate the data splits and the function is run on a generic
# data split d, assumed to be a data frame.
mlist <- dlply(feed1, .(Site, Vial), function(d) lm(lnRFU ~ Time, data = d))

# To get the set of model coefficients, take the list mlist as the input
# data (argument 1) and then for each generic component 'm', extract its
# coefficients:
ldply(mlist, function(m) coef(m))

For your test data, only Vial varies - the example code below reflects that:

> mlist <- dlply(feed1, .(Vial), function(d) lm(lnRFU ~ Time, data = d))
> length(mlist)  # three component model objects
[1] 3
> ldply(mlist, function(m) coef(m))
  Vial (Intercept)         Time
1    1    10.75440 -0.001508621
2    2    10.83171 -0.005095100
3    3    10.81087 -0.004897600

This idea can be generalized: if you want to pull out similar pieces
of output from each model, run ldply() on the list of models and
create a utility function that outputs, for a generic model object m,
what you want to have returned. Common choices include R^2 values,
tables of coefficients (as lists instead of data frames) or residuals
and predicted values. The game is to write the function so that it
takes a [list] model object (here, m) as input and a data frame as
output. You can also extract output from summary(m) in a similar way,
using m as the input object.

HTH,
Dennis

On Mon, Aug 22, 2011 at 6:15 PM, Nathan Miller <natemiller77 at gmail.com> wrote:
> Hi all,
>
> I have a data set that looks a bit like this.
>
> feed1
>      RFU Site Vial Time       lnRFU
> 1   44448    1    1   10  10.702075
> 2   47521    1    1   20  10.768927
> 3   42905    1    1   30  10.66674
> 4   46867    1    1   40  10.755069
> 5   42995    1    1   50  10.668839
> 6   43074    1    1   60  10.670675
> 7   41195    1    1   70  10.626072
> 8   47090    1    2   10  10.759816
> 9   48100    1    2   20  10.781037
> 10  43215    1    2   30  10.673943
> 11  39656    1    2   40  10.587998
> 12  38799    1    2   50  10.566150
> 13  38424    1    2   60 10.556438
> 14 35240 1 2 70  10.469937
> 15  46427    1    3   10  10.745636
> 16 46418 1 3 20  10.745443
> 17  42095    1    3   30  10.647684
> ......
> There are 5 columns of data, three levels of "Site", 10 "Vials" per site,
> and measurements were taken at 10 min intervals from 10-70.. I am primarily
> interested in the relationship between "Time" and "lnRFU" to calculate the
> rate at which lnRFU declines over time. I have a nice plot using a ggplot2
> code that looks like this
>
> p<-ggplot(data=feed1,aes(x=Time,y=lnRFU))
> p+geom_point(size=4)+facet_grid(Site~Vial)+geom_smooth(method="lm")
>
> The graph is useful to visualize the changes over time and grouped by both
> Site and Vial, but I also need the slopes of the linear regressions for each
> Vial, within a Site. This is where I run into a problem. I want to run a
> linear regression of lnRFU as a function of Time grouped by both Site and
> Vial. Its easy to visualize this comparison in ggplot using facet_grid(),
> but I'm not sure how to do a similar comparison/analysis within lm()
>
> I imagine something like
>
> fit<-lm(lnRFU~Time | Vial * Site, data=feed1)
>
>  in which I group by both Vial and Site, but obviously this code doesn't
> work. Does anyone have an idea for how to do a linear regression with two
> grouping variables? Do I have to go back and combine Vial and Site into a
> single grouping variable or can I leave the dataframe the way it is? I'm
> trying to imagine a means of accomplishing the same type of thing that
> facet_grid does when it allows you to plot the data as a function of two
> "grouping" variables.
>
> Thanks for you time. I greatly appreciate it.
>
> Nate Miller
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list