[R] linear regression on groups of consecutive rows of a matrix

David Winsemius dwinsemius at comcast.net
Tue Nov 24 21:52:19 CET 2009


Perhaps along these lines. First you need to decide what your group
width is; the second number inside the extraction call will then be
that width minus 1:

for (x in seq(1, 1000, by=6)) {
     temp <- na.omit( shp[x:(x+5), ] )  # Need the parens in x:(x+5)
     lm( formula, data=as.data.frame(temp) )  # 'formula' stands in for your model; lm wants a data frame
    }
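
A self-contained sketch of the same row-dropping idea, in case it helps.
Everything in it is made up for illustration: the matrix shp_demo, the
column names y, x1 and x2, the formula, and the width of 5. It builds the
groups by splitting the row indices rather than the matrix itself, and
keeps the fitted models in a list instead of discarding them:

```r
# Synthetic stand-in for shp: 20 rows, hypothetical columns y, x1, x2
set.seed(1)
shp_demo <- matrix(rnorm(60), nrow = 20,
                   dimnames = list(NULL, c("y", "x1", "x2")))
shp_demo[3, "x1"] <- NA            # one missing value, to exercise na.omit

width  <- 5                        # group width (an assumption; pick your own)
# Split the row INDICES into consecutive blocks of 'width' rows
groups <- split(seq_len(nrow(shp_demo)),
                ceiling(seq_len(nrow(shp_demo)) / width))

fits <- lapply(groups, function(rows) {
    # Drop any row in this block that contains an NA, then fit
    temp <- na.omit(as.data.frame(shp_demo[rows, , drop = FALSE]))
    lm(y ~ x1 + x2, data = temp)   # hypothetical formula
})

length(fits)          # number of groups
sapply(fits, nobs)    # complete cases actually used in each group
```

Splitting seq_len(nrow(...)) by ceiling(index / width) is only one way to
form the groups; the last group is simply shorter if the row count is not
a multiple of the width.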

Or depending on what you actually meant:

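for (x in seq(1, 1000, by=5)) {
     rows <- x:(x+4)   # parens needed: x:x+4 would be (x:x)+4
     temp <- shp[ rows, colSums(is.na(shp[rows, ])) == 0, drop=FALSE ]  # keep columns with no NA in this group
     lm( formula, data=as.data.frame(temp) )
    }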

But I do feel compelled to ask: Do you really get meaningful  
information from lm applied to 5 cases? Especially when the predictors  
used may not be the same from subset to subset???
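
To make that concern concrete, here is a small sketch on invented data
(the matrix m, its column names, and the helper summarise_group are all
hypothetical). For each group it keeps only the columns that are complete,
fits the model, and reports which predictors were used and how many
residual degrees of freedom are left:

```r
set.seed(2)
m <- matrix(rnorm(50), nrow = 10,
            dimnames = list(NULL, c("y", "a", "b", "c", "d")))
m[1:5, c("c", "d")] <- NA     # predictors c and d missing in rows 1-5
m[6:10, "a"]        <- NA     # predictor a missing in rows 6-10

# Hypothetical helper: fit y on whatever predictors are complete in 'rows'
summarise_group <- function(rows, resp = "y") {
    sub   <- m[rows, , drop = FALSE]
    keep  <- colnames(sub)[colSums(is.na(sub)) == 0]  # columns complete in this group
    preds <- setdiff(keep, resp)
    fit   <- lm(reformulate(preds, response = resp),
                data = as.data.frame(sub[, keep, drop = FALSE]))
    list(predictors = preds, resid_df = df.residual(fit))
}

g1 <- summarise_group(1:5)
g2 <- summarise_group(6:10)
str(g1)
str(g2)
```

With five rows per group, the two fits here use different predictor sets
and have only one or two residual degrees of freedom, so their
coefficients are neither well determined nor comparable between groups.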

-- 
David

On Nov 24, 2009, at 3:25 PM, Jim Bouldin wrote:

>
> I want to perform linear regression on groups of consecutive rows, say
> 5 to 10 such, of two matrices.  There are many such potential groups
> because the matrices have thousands of rows. The matrices are both of
> the form:
>
>> shp[1:5,16:20]
>      SL495B SL004C SL005C SL005A SL017A
> -2649   1.06   0.56     NA     NA     NA
> -2648   0.97   0.57     NA     NA     NA
> -2647   0.46   0.30     NA     NA     NA
> -2646   0.92   0.48     NA     NA     NA
> -2645   0.82   0.48     NA     NA     NA
>
> That is, they both have NA values, and non-NA values, in the same matrix
> positions.  In my attempts so far, I have had two problems.  First, in
> using the split function (which I assume is essential here), I am unable
> to split the matrices by groups of rows (say rows 1 to 5, 6 to 10, etc):
>
>> shp_split = split(shp,row(shp))
>
> will split the matrix by rows but not by groups thereof. Stumped.
>
> Second, I cannot seem to get rid of the NA values, which would prevent
> the regression even if I could figure out how to split the matrices
> correctly, e.g.:
>> shp_split = split(shp,row(shp))
>> shp_split = shp_split[!is.na(shp_split)]
>> shp_split[1]
> $`1`
>  [1] 0.68 0.28 0.43 0.47 0.64 0.40 0.69 0.56 0.62 0.40 1.01 0.67 0.17 1.36
> 1.84 1.06 0.56   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
>  NA   NA   NA etc
>
> If I solve these problems, will I in fact be able to perform individual
> linear regressions on the (numerous) collections of 5 to 10 rows?
>
> Thanks as always for any insight.
>
>
> Jim Bouldin
> Research Ecologist
> Department of Plant Sciences, UC Davis
> Davis CA, 95616
> 530-554-1740
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



