[R] linear regression for grouped data
David Winsemius
dwinsemius at comcast.net
Wed Dec 29 03:31:48 CET 2010
On Dec 28, 2010, at 9:23 PM, Entropi ntrp wrote:
> Hi,
> I have been examining a large data set and need to do simple linear
> regression on data that is grouped by the values of a particular
> attribute. For instance, consider three columns: ID, x, y. I need to
> regress x on y for each distinct value of ID. Specifically, for the
> subset of data corresponding to each of the 4 values of ID (76, 111,
> 121, 168) in the data below, I should invoke linear regression 4 times.
> The challenge is that the length of the ID vector is around 20000, so
> the linear regression must be done automatically for each distinct
> value of ID.
>
>  ID     x    y
>  76 36476 15.8
>  76 36493 66.9
>  76 36579 65.6
> 111 35465 10.3
> 111 35756  4.8
> 121 38183   16
> 121 38184   15
> 121 38254  9.6
> 121 38255    7
> 168 37727 21.9
> 168 37739 29.7
> 168 37746 97.4
Let's say that is a data frame named "indat". Try:
lapply(split(indat, as.factor(indat$ID)), function(df) {
    lm(y ~ x, data = df)
})
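A self-contained version of the suggestion above, as a minimal sketch: it types the posted data in with read.table(text = ...), fits one model per ID, and collects the intercepts and slopes into a matrix with one row per ID. The names fits and coefs are illustrative only. Note that ID 111 has just two observations, so its fit passes exactly through both points and has no residual degrees of freedom.

```r
# Data from the question, typed in by hand (12 rows, 4 distinct IDs)
indat <- read.table(header = TRUE, text = "
ID x y
76 36476 15.8
76 36493 66.9
76 36579 65.6
111 35465 10.3
111 35756 4.8
121 38183 16
121 38184 15
121 38254 9.6
121 38255 7
168 37727 21.9
168 37739 29.7
168 37746 97.4
")

# split() converts ID to a factor and returns one data frame per ID;
# lapply() then fits y ~ x to each piece
fits <- lapply(split(indat, indat$ID), function(df) lm(y ~ x, data = df))

# sapply() gives a 2 x 4 matrix (intercept and slope per ID);
# t() transposes it so each ID is one row
coefs <- t(sapply(fits, coef))
print(coefs)
```

The same two lines scale to the full 20000-row data; only the number of list elements and matrix rows grows with the number of distinct IDs.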
> I was wondering whether there is an easy way in R to group data by the
> values of ID so that linear regression can be done easily for each
> group. Or is the only way to construct loops with 'for' or 'while', in
> which a matrix is generated for each distinct value of ID that stores
> the corresponding values of x and y by screening the entire ID vector?
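On the question of whether a 'for' or 'while' loop is required: a minimal sketch comparing base R's by(), which does the grouping itself, with the explicit loop the question describes. It uses the posted data entered with data.frame() and keeps only the slope of each fit; ids, slopes_by and slopes_loop are illustrative names. Both give the same slopes, but the loop needs the manual subsetting the question hoped to avoid.

```r
# Data from the question, entered column by column
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# by() splits the data frame by ID and applies the function to each piece;
# no explicit loop or screening of the ID vector is written by hand
slopes_by <- by(indat, indat$ID,
                function(df) coef(lm(y ~ x, data = df))[["x"]])

# The explicit loop described in the question, for comparison
ids <- sort(unique(indat$ID))
slopes_loop <- numeric(length(ids))
names(slopes_loop) <- ids
for (i in seq_along(ids)) {
  sub <- indat[indat$ID == ids[i], ]   # rows belonging to this ID
  slopes_loop[i] <- coef(lm(y ~ x, data = sub))[["x"]]
}

# Both approaches visit the IDs in the same (sorted) order
print(all.equal(as.numeric(unclass(slopes_by)), unname(slopes_loop)))
```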
>
> Thanks in advance,
>
> Yasin
--
David Winsemius, MD
West Hartford, CT
More information about the R-help mailing list