[R] linear regression for grouped data

entropy entropy053 at gmail.com
Wed Dec 29 07:15:19 CET 2010

Thanks a lot for the quick responses.
I have some additional questions related to this topic. My aim is to
be able to answer questions such as: what percentage of the
regressions have p-values below a certain threshold, what do the
residuals look like, what do the plots of y vs. x look like, etc.
I tried the following commands and found that the second line (and
similar ones) does not work for extracting certain statistics.

regress=lapply(split(egfr, as.factor(egfr$P_ID)), function(df)
{anova(lm(VALUE ~ LAB_DT, data=df)) })
regress[1]$residuals; regress[1]$fstatistic[1]

So, is it possible to record the statistics of each regression, such as
the p-value, F-value, residuals, etc., as a vector?
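
A minimal sketch of one way this could be done, not a definitive answer. It rests on two observations about the commands above: anova() returns a table that has no $residuals or $fstatistic component, and regress[1] is a list of length one rather than the fitted object itself (that needs [[1]]). The sketch therefore stores the lm() fits and reads the statistics from summary(). The toy egfr data frame is built from the sample rows in the quoted message, with the P_ID, LAB_DT and VALUE column names taken from the commands above; the names fits, f_value and p_value, the 0.05 threshold, and the choice to skip groups with fewer than 3 rows (which leave no residual degrees of freedom) are all assumptions made for illustration.

```r
# Toy stand-in for the real `egfr` data frame (sample rows from the quoted post).
egfr <- data.frame(
  P_ID   = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  LAB_DT = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254,
             38255, 37727, 37739, 37746),
  VALUE  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# Split by patient and (assumption) keep only groups with at least 3 rows,
# so that every fit has at least one residual degree of freedom.
groups <- split(egfr, egfr$P_ID)
groups <- groups[sapply(groups, nrow) >= 3]

# Store the lm objects themselves rather than their anova() tables.
fits <- lapply(groups, function(df) lm(VALUE ~ LAB_DT, data = df))

# [[ ]] extracts a single fit; residuals() returns its residual vector.
res1 <- residuals(fits[[1]])

# One F statistic and one slope p-value per group, as named numeric vectors.
f_value <- vapply(fits, function(m) summary(m)$fstatistic[["value"]], numeric(1))
p_value <- vapply(fits, function(m) coef(summary(m))["LAB_DT", "Pr(>|t|)"],
                  numeric(1))

# Fraction of regressions whose p-value is below a threshold (0.05 is arbitrary).
frac_below <- mean(p_value < 0.05)
```

With the fits kept in a list, plot(VALUE ~ LAB_DT, data = groups[[1]]) followed by abline(fits[[1]]) would draw y vs. x for one group with its fitted line, and lapply(fits, residuals) would collect the residuals of every group.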


On Dec 28, 6:23 pm, Entropi ntrp <entropy... at gmail.com> wrote:
> Hi,
> I have been examining a large data set and need to do simple linear regression
> on data that is grouped by the values of a particular
> attribute. For instance, consider three columns: ID, x, y, and I need to
> regress x on y for each distinct value of ID. Specifically, for the set of
> data corresponding to each of the 4 values of ID (76, 111, 121, 168) in the
> data below, I should invoke linear regression 4 times. The challenge is
> that the length of the ID vector is around 20000, and therefore linear
> regression must be done automatically for each distinct value of ID.
>    ID      x      y
>    76  36476   15.8
>    76  36493   66.9
>    76  36579   65.6
>   111  35465   10.3
>   111  35756    4.8
>   121  38183   16
>   121  38184   15
>   121  38254    9.6
>   121  38255    7
>   168  37727   21.9
>   168  37739   29.7
>   168  37746   97.4
> I was wondering whether there is an easy way to group data based on the
> values of ID in R so that linear regression can be done easily for each
> group determined by each value of ID. Or, is the only way to construct
> loops with 'for' or 'while' in which a matrix is generated for each
> distinct value of ID that stores corresponding values of x and y by
> screening the entire ID vector?
> Thanks in advance,
> Yasin
