[R] linear regression for grouped data

Michael Dewey info at aghmed.fsnet.co.uk
Thu Dec 30 14:39:05 CET 2010

At 02:23 29/12/2010, Entropi ntrp wrote:
>I have been examining large data and need to do simple linear regression
>with the data which is grouped based on the values of a particular
>attribute. For instance, consider three columns : ID, x, y,  and  I need to
>regress x on y for each distinct value of ID. Specifically, for the set of
>data corresponding to each of the 4 values of ID (76, 111, 121, 168) in the
>data below, I should invoke linear regression 4 times. The challenge is
>that the length of the ID vector is around 20000, and therefore linear
>regression must be done automatically for each distinct value of ID.
>   ID      x     y
>   76  36476  15.8
>   76  36493  66.9
>   76  36579  65.6
>  111  35465  10.3
>  111  35756   4.8
>  121  38183  16
>  121  38184  15
>  121  38254   9.6
>  121  38255   7
>  168  37727  21.9
>  168  37739  29.7
>  168  37746  97.4
>I was wondering whether there is an easy way to group data based on the
>values of ID in R  so that linear regression can be done easily for each
>group determined by each value of ID. Or, is the only way to construct
>loops  with 'for' or 'while'  in which a matrix is generated for each
>distinct value of ID  that stores corresponding values of x and y by
>screening the entire ID vector?

The advantage of using lmList from nlme is that
a) it gives you access to a range of functions already written to
operate on such objects
b) you can easily write your own extractor function and then call it
using lapply

If you do it yourself you can still do (b), but you lose (a).
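
To make points (a) and (b) concrete, here is a minimal sketch using the
twelve rows from the question. It assumes y is the response and x the
predictor (the question's "regress x on y" could be read either way), and
get_slope is just an illustrative helper name, not something from nlme.
The last few lines show a do-it-yourself version with split and lapply
for comparison.

```r
library(nlme)  # recommended package, shipped with standard R installations

# Example data copied from the question (ID, x, y)
dat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)
dat$ID <- factor(dat$ID)  # treat ID as a grouping factor

# One lm() fit per ID (assumption: y is the response, x the predictor)
fits <- lmList(y ~ x | ID, data = dat)

# (a) methods already written for lmList objects, e.g. coef()
coefs <- coef(fits)  # one row per ID, columns for intercept and slope

# (b) a home-made extractor (hypothetical helper) called via lapply
get_slope <- function(m) coef(m)[["x"]]
slopes <- unlist(lapply(fits, get_slope))

# Doing it yourself: split the data by ID and fit lm() to each piece
by_id <- split(dat, dat$ID)
manual_fits <- lapply(by_id, function(d) lm(y ~ x, data = d))
manual_slopes <- unlist(lapply(manual_fits, get_slope))
```

Both routes give the same slopes. Note that ID 111 has only two rows here,
so its line passes exactly through both points and there are no residual
degrees of freedom for that group.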

>Thanks in advance,

Michael Dewey
