[R] Looping an lapply linear regression function

Fri Sep 6 18:03:50 CEST 2013

Hi,
Using the example dataset (Test_data.csv):
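dat1 <- read.csv("Test_data.csv", header=TRUE, sep="\t", row.names=1)
# all ordered pairs of column names, dropping a variable paired with itself
indx2 <- expand.grid(names(dat1), names(dat1), stringsAsFactors=FALSE)
indx2New <- indx2[indx2[,1] != indx2[,2],]
# regress the Var1 column on the Var2 column; keep the p-values (intercept, slope)
res2 <- t(sapply(seq_len(nrow(indx2New)), function(i) {
  x1 <- indx2New[i,]
  x2 <- cbind(dat1[x1[,1]], dat1[x1[,2]])
  summary(lm(x2[,1] ~ x2[,2]))$coef[,4]
}))
# the second column of res2 holds the slope p-value
dat2 <- cbind(indx2New, value=res2[,2])
library(reshape2)
# long to wide: one row per Var1, one column per Var2
res2New <- dcast(dat2, Var1~Var2, value.var="value")
row.names(res2New) <- res2New[,1]
res2New <- as.matrix(res2New[,-1])
dim(res2New)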
#[1] 28 28
head(res2New,3)
#            AgriEmi   AgriMach  AgriValAd     AgrVaGDP       AIL     ALAre
#AgriEmi          NA 0.23401895 0.45697412 4.644877e-01 0.6398030 0.4039855
#AgriMach  0.2340189         NA 0.01449519 4.922558e-06 0.3890046 0.9279044
#AgriValAd 0.4569741 0.01449519         NA 5.135269e-02 0.5325943 0.4872555
#              ALPer          ANS     AraLa  AraLaPer    CombusRen      ForArea
#AgriEmi   0.4039855 2.507257e-01 0.2303275 0.2303275 0.9438409125 0.0004473563
#AgriMach  0.9279044 6.072123e-05 0.3154370 0.3154370 0.0040254771 0.2590309747
#AgriValAd 0.4872555 2.060412e-01 0.8449600 0.8449600 0.0008077264 0.5152352072
#             ForArePer  ForProTon ForProTonSKm      ForRen          GDP
#AgriEmi   0.0004473563 0.01714768 0.0007089448 0.900222038 0.6022470671
#AgriMach  0.2590309748 0.20170800 0.2305335762 0.005584703 0.4199684378
#AgriValAd 0.5152352071 0.80983446 0.4368256400 0.208975126 0.0003534226
#                   GEF GroAgriProVal PermaCrop  RoadDens   RoadTot  RurPopGro
#AgriEmi   0.0008580856    0.01078593 0.6863110 0.6398030 0.6398030 0.40734903
#AgriMach  0.1315182244    0.14074612 0.2530378 0.3064186 0.3064186 0.33705434
#AgriValAd 0.7520803684    0.31556633 0.1151395 0.4374599 0.4374599 0.04837586
#          RurPopPerc    TerrPA         Trac      Vehi WaterWith
#AgriEmi    0.4835676 0.4504239 2.279566e-01 0.6398030 0.3056195
#AgriMach   0.6401556 0.1707857 4.730759e-33 0.3064186 0.9502553
#AgriValAd  0.2383507 0.0223124 1.513169e-02 0.1251843 0.3307148

#or
res3 <- xtabs(value~Var1+Var2, data=dat2) # here the diagonals are 0s
attr(res3, "class") <- NULL
attr(res3, "call") <- NULL
names(dimnames(res3)) <- NULL

# The first solution can be changed to give the same result (fill=0 puts 0s on the diagonal):
res2New <- dcast(dat2, Var1~Var2, value.var="value", fill=0)
row.names(res2New) <- res2New[,1]
res2New <- as.matrix(res2New[,-1])
identical(res2New, res3)
#[1] TRUE
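
For reference, the same square matrix can also be filled without reshape2, by indexing a pre-allocated matrix with a two-column character matrix of row and column names. A minimal sketch on simulated data; the data frame `toy`, the helper `slope_p` and the column names v1..v4 are made up for illustration and are not taken from Test_data.csv:

```r
set.seed(1)
# Hypothetical stand-in for dat1: 21 observations of 4 numeric variables
toy <- as.data.frame(matrix(rnorm(21 * 4), ncol = 4,
                            dimnames = list(NULL, c("v1", "v2", "v3", "v4"))))

# p-value of the slope from the simple regression y ~ x (illustrative helper)
slope_p <- function(y, x, data) {
  fit <- lm(reformulate(x, response = y), data = data)
  summary(fit)$coefficients[2, 4]
}

# All ordered pairs of distinct variables (Var1 = response, Var2 = predictor)
idx <- expand.grid(Var1 = names(toy), Var2 = names(toy),
                   stringsAsFactors = FALSE)
idx <- idx[idx$Var1 != idx$Var2, ]
idx$value <- mapply(slope_p, idx$Var1, idx$Var2,
                    MoreArgs = list(data = toy))

# Pre-allocate with NA on the diagonal, then fill by (row name, column name)
pmat <- matrix(NA_real_, ncol(toy), ncol(toy),
               dimnames = list(names(toy), names(toy)))
pmat[cbind(idx$Var1, idx$Var2)] <- idx$value
pmat
```

Rows are the response and columns the predictor; use t(pmat) for the other orientation.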

A.K.

Arun, 

That does exactly what I wanted to do, but how would I 
manipulate it into a matrix where the independent variable is on the x and 
the dependent on the y, or vice versa, rather than a 736 x 2 matrix?

    V1   V2   V3   V4   V5 ... Vn
V1  -
V2       -
V3            -
V4                 -
V5                      -
Vn                             -

----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Thursday, September 5, 2013 12:49 PM
Subject: Re: Looping an lapply linear regression function

Hi,
Maybe this helps:
set.seed(28)
dat1 <- setNames(as.data.frame(matrix(sample(1:40, 10*5, replace=TRUE), ncol=5)), letters[1:5])
# combinations: each unordered pair of columns once
indx <- as.data.frame(combn(names(dat1), 2), stringsAsFactors=FALSE)
res <- t(sapply(indx, function(x) {
  x1 <- cbind(dat1[x[1]], dat1[x[2]])
  summary(lm(x1[,1] ~ x1[,2]))$coef[,4]   # p-values: intercept, slope
}))
rownames(res) <- apply(indx, 2, paste, collapse="_")
colnames(res)[2] <- "Coef1"
head(res, 3)
#    (Intercept)     Coef1
#a_b  0.39862676 0.8365606
#a_c  0.02427885 0.6094141
#a_d  0.37521423 0.7578723

# permutations: every ordered pair of columns (a_b and b_a)
indx2 <- expand.grid(names(dat1), names(dat1), stringsAsFactors=FALSE)
# or
indx2 <- expand.grid(rep(list(names(dat1)), 2), stringsAsFactors=FALSE)
indx2New <- indx2[indx2[,1] != indx2[,2],]
res2 <- t(sapply(seq_len(nrow(indx2New)), function(i) {
  x1 <- indx2New[i,]
  x2 <- cbind(dat1[x1[,1]], dat1[x1[,2]])
  summary(lm(x2[,1] ~ x2[,2]))$coef[,4]
}))
row.names(res2) <- apply(indx2New, 1, paste, collapse="_")
colnames(res2) <- colnames(res)
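
One thing worth noting about the permutation version: with a single predictor, the p-value of the slope is the same whichever of the two variables is treated as the response, because it coincides with the test of the Pearson correlation. (The intercept p-values do differ.) So the combn() results could in principle be mirrored into both triangles instead of refitting every pair. A small check on simulated data, where `x` and `y` are made-up vectors:

```r
set.seed(28)
x <- rnorm(21)
y <- 0.5 * x + rnorm(21)

p_yx  <- summary(lm(y ~ x))$coefficients[2, 4]  # slope p-value, y regressed on x
p_xy  <- summary(lm(x ~ y))$coefficients[2, 4]  # slope p-value, x regressed on y
p_cor <- cor.test(x, y)$p.value                 # Pearson correlation test

# Both comparisons should agree up to floating-point tolerance
all.equal(p_yx, p_xy)
all.equal(p_yx, p_cor)
```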

A.K.

Hi everyone, 

First off, I'd just like to say thanks for everyone's contributions. 
Up until now, I've never had to post as I've always found the answers 
from trawling through the database. I've finally managed to stump 
myself, and although for someone out there I'm sure the answer to my 
problem is fairly simple, I have spent the whole day in front of 
my computer struggling. I know I'll probably get an absolute ribbing 
for making a basic mistake, or not understanding something fully, but 
I'm blind to the mistake now after looking at it for so long. 

What I'm looking to do is build a 28 x 28 matrix of 
p-values produced from running linear regressions of 28 variables 
against each other (e.g. a~b, a~c, a~d ... b~a, b~c etc.), if that makes 
sense. I've managed to get this to work if I just input each variable 
by hand, but this isn't going to help when I have to make 20 matrices. 

My script is as follows; 

for (j in 1:28) 
{ 
  ## This section works perfectly if I don't try to loop it. I know 
  ## this won't work at the moment, because I haven't designated what j is, 
  ## but I'm showing it to highlight what I'm attempting to do. 

  models <- lapply(varlist, function(x) { 
    lm(substitute(ANS ~ i, list(i = as.name(x))), data = con.i) 
  }) 

  abc <- lapply(models, function(f) summary(f)$coefficients[,4]) 

  abc <- do.call(rbind, abc) 

} 

I get the following error when I try to loop it... 

Error in model.frame.default(formula = substitute(j ~ i, list(i = as.name(x))),  : 
  variable lengths differ (found for 'ANS') ## ANS being my first variable 

All variables are of the same length, with 21 recordings for each 
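
A likely cause, judging from the formula shown in the error message: inside the loop the response is written as `j`, and `j` is the loop counter, a single integer, so lm() sees a response of length 1 against a predictor of length 21. One way around it is to build each formula from the column names with reformulate(). The sketch below assumes simulated data standing in for `con.i`; `vars`, `resp`, `pred` and `pvals` are made-up names:

```r
set.seed(42)
# Hypothetical stand-in for con.i: 21 observations of 5 numeric variables
con.i <- as.data.frame(matrix(rnorm(21 * 5), ncol = 5))
names(con.i) <- c("ANS", "GDP", "Trac", "Vehi", "AIL")

vars <- names(con.i)
# Square matrix of slope p-values; the diagonal stays NA
pvals <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(response = vars, predictor = vars))

for (resp in vars) {
  for (pred in setdiff(vars, resp)) {
    # reformulate() builds e.g. ANS ~ GDP from the two column names
    fit <- lm(reformulate(pred, response = resp), data = con.i)
    pvals[resp, pred] <- summary(fit)$coefficients[2, 4]
  }
}
round(pvals, 3)
```

Looping over names rather than positions keeps the formula readable, and wrapping the two loops in a function of the data frame would let the same code be applied to each of the 20 datasets with lapply().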

If anyone can suggest a method of looping, or another means 
of producing 'models' for each of my 28 variables without having to do 
it by hand, that would be fantastic. 

Thanks in advance!!