[R] Looping an lapply linear regression function
arun
smartpink111 at yahoo.com
Fri Sep 6 18:03:50 CEST 2013
Hi,
Using the example dataset (Test_data.csv):
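dat1 <- read.csv("Test_data.csv", header = TRUE, sep = "\t", row.names = 1)
# all ordered pairs of variable names, dropping self-pairs (a ~ a)
indx2 <- expand.grid(names(dat1), names(dat1), stringsAsFactors = FALSE)
indx2New <- indx2[indx2[, 1] != indx2[, 2], ]
# p-values (column 4 of the coefficient table) of Var1 ~ Var2 for each pair
res2 <- t(sapply(seq_len(nrow(indx2New)), function(i) {
  x1 <- indx2New[i, ]
  x2 <- cbind(dat1[x1[, 1]], dat1[x1[, 2]])
  summary(lm(x2[, 1] ~ x2[, 2]))$coef[, 4]
}))
# keep the slope p-value (column 2); Var1 is the response, Var2 the predictor
dat2 <- cbind(indx2New, value = res2[, 2])
library(reshape2)
# long -> wide: rows are the response (Var1), columns the predictor (Var2)
res2New <- dcast(dat2, Var1 ~ Var2, value.var = "value")
row.names(res2New) <- res2New[, 1]
res2New <- as.matrix(res2New[, -1])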
dim(res2New)
#[1] 28 28
head(res2New,3)
# AgriEmi AgriMach AgriValAd AgrVaGDP AIL ALAre
#AgriEmi NA 0.23401895 0.45697412 4.644877e-01 0.6398030 0.4039855
#AgriMach 0.2340189 NA 0.01449519 4.922558e-06 0.3890046 0.9279044
#AgriValAd 0.4569741 0.01449519 NA 5.135269e-02 0.5325943 0.4872555
# ALPer ANS AraLa AraLaPer CombusRen ForArea
#AgriEmi 0.4039855 2.507257e-01 0.2303275 0.2303275 0.9438409125 0.0004473563
#AgriMach 0.9279044 6.072123e-05 0.3154370 0.3154370 0.0040254771 0.2590309747
#AgriValAd 0.4872555 2.060412e-01 0.8449600 0.8449600 0.0008077264 0.5152352072
# ForArePer ForProTon ForProTonSKm ForRen GDP
#AgriEmi 0.0004473563 0.01714768 0.0007089448 0.900222038 0.6022470671
#AgriMach 0.2590309748 0.20170800 0.2305335762 0.005584703 0.4199684378
#AgriValAd 0.5152352071 0.80983446 0.4368256400 0.208975126 0.0003534226
# GEF GroAgriProVal PermaCrop RoadDens RoadTot RurPopGro
#AgriEmi 0.0008580856 0.01078593 0.6863110 0.6398030 0.6398030 0.40734903
#AgriMach 0.1315182244 0.14074612 0.2530378 0.3064186 0.3064186 0.33705434
#AgriValAd 0.7520803684 0.31556633 0.1151395 0.4374599 0.4374599 0.04837586
# RurPopPerc TerrPA Trac Vehi WaterWith
#AgriEmi 0.4835676 0.4504239 2.279566e-01 0.6398030 0.3056195
#AgriMach 0.6401556 0.1707857 4.730759e-33 0.3064186 0.9502553
#AgriValAd 0.2383507 0.0223124 1.513169e-02 0.1251843 0.3307148
#or
res3 <- xtabs(value ~ Var1 + Var2, data = dat2)  # here the diagonal entries are 0, not NA
attr(res3,"class")<- NULL
attr(res3,"call")<-NULL
names(dimnames(res3))<-NULL
# To get 0s on the diagonal in the dcast() solution as well, use fill = 0:
res2New<- dcast(dat2,Var1~Var2,value.var="value",fill=0)
row.names(res2New)<- res2New[,1]
res2New<- as.matrix(res2New[,-1])
identical(res2New,res3)
#[1] TRUE
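If reshape2 is not available, the same square layout can be filled directly in base R. The sketch below uses a hypothetical helper `pval_matrix()` and assumes every column is numeric and non-constant; rows are the response and columns the predictor (the same orientation as `dcast(dat2, Var1 ~ Var2, ...)` above), and the diagonal is left as NA. Simulated data stand in for Test_data.csv, and `reformulate()` is swapped in to build each formula from the two names.

```r
# Hypothetical helper: slope p-value for every ordered pair of columns.
# Rows = response (dependent) variable, columns = predictor (independent).
pval_matrix <- function(df) {
  vars <- names(df)
  m <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
              dimnames = list(vars, vars))
  for (resp in vars) {
    for (pred in vars) {
      if (resp == pred) next  # leave the diagonal as NA
      fit <- lm(reformulate(pred, response = resp), data = df)
      # row 2 of the coefficient table is the slope; column 4 is Pr(>|t|)
      m[resp, pred] <- summary(fit)$coefficients[2, 4]
    }
  }
  m
}

# Simulated stand-in data (same construction as in the earlier reply)
set.seed(28)
dat1 <- setNames(as.data.frame(matrix(sample(1:40, 10 * 5, replace = TRUE), ncol = 5)),
                 letters[1:5])
pm <- pval_matrix(dat1)
round(pm, 3)
```

The explicit double loop is slower than a vectorised approach but makes the row/column orientation easy to check.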
A.K.
Arun,
That does exactly what I wanted to do, but how would I
manipulate it into a matrix with the independent variable along the x axis and the
dependent variable along the y axis (or vice versa), rather than a 736 x 2 matrix?
      V1  V2  V3  V4  V5 ... Vn
V1    -
V2        -
V3            -
V4                -
V5                    -
Vn                           -
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc:
Sent: Thursday, September 5, 2013 12:49 PM
Subject: Re: Looping an lapply linear regression function
Hi,
Maybe this helps:
set.seed(28)
dat1<- setNames(as.data.frame(matrix(sample(1:40,10*5,replace=TRUE),ncol=5)),letters[1:5])
indx<-as.data.frame(combn(names(dat1),2),stringsAsFactors=FALSE)
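# for each unordered pair (a column of indx), regress the first variable on the second
res <- t(sapply(indx, function(x) {
  x1 <- cbind(dat1[x[1]], dat1[x[2]])
  summary(lm(x1[, 1] ~ x1[, 2]))$coef[, 4]
}))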
rownames(res)<-apply(indx,2,paste,collapse="_")
colnames(res)[2]<- "Coef1"
head(res,3)
# (Intercept) Coef1
#a_b 0.39862676 0.8365606
#a_c 0.02427885 0.6094141
#a_d 0.37521423 0.7578723
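For the unordered pairs, `combn()` can also apply a function to each pair directly through its `FUN` argument, which avoids building `indx` first. A minimal sketch, assuming only the slope p-value is wanted (the reply above keeps the intercept p-value too). With R 3.6.0 and later, `sample()` draws differently for the same seed, so the values will not match the 2013 output shown above.

```r
# Simulated data as in the reply above
set.seed(28)
dat1 <- setNames(as.data.frame(matrix(sample(1:40, 10 * 5, replace = TRUE), ncol = 5)),
                 letters[1:5])

# p is a length-2 character vector: p[1] is the response, p[2] the predictor
slope_p <- combn(names(dat1), 2, FUN = function(p) {
  fit <- lm(reformulate(p[2], response = p[1]), data = dat1)
  summary(fit)$coefficients[2, 4]  # slope p-value
})
# extra arguments after FUN are passed on to it, here collapse = "_"
names(slope_p) <- combn(names(dat1), 2, FUN = paste, collapse = "_")
head(slope_p, 3)
```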
# all ordered pairs (permutations), excluding self-pairs
indx2<-expand.grid(names(dat1),names(dat1),stringsAsFactors=FALSE)
#or
indx2<- expand.grid(rep(list(names(dat1)),2),stringsAsFactors=FALSE)
indx2New<- indx2[indx2[,1]!=indx2[,2],]
res2 <- t(sapply(seq_len(nrow(indx2New)), function(i) {
  x1 <- indx2New[i, ]
  x2 <- cbind(dat1[x1[, 1]], dat1[x1[, 2]])
  summary(lm(x2[, 1] ~ x2[, 2]))$coef[, 4]
}))
row.names(res2)<-apply(indx2New,1,paste,collapse="_")
colnames(res2)<- colnames(res)
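The long result for ordered pairs can also be turned into a square matrix in base R by indexing with a two-column character matrix, whose entries are matched against the row and column names. A sketch under the assumption that `Var1` is the response and `Var2` the predictor, as in the code above; the p-value computation is restated with `vapply()` and `reformulate()` so the block runs on its own.

```r
set.seed(28)
dat1 <- setNames(as.data.frame(matrix(sample(1:40, 10 * 5, replace = TRUE), ncol = 5)),
                 letters[1:5])
indx2 <- expand.grid(rep(list(names(dat1)), 2), stringsAsFactors = FALSE)
indx2New <- indx2[indx2[, 1] != indx2[, 2], ]

# slope p-value for each ordered pair: Var1 ~ Var2
slope_p <- vapply(seq_len(nrow(indx2New)), function(i) {
  fit <- lm(reformulate(indx2New[i, 2], response = indx2New[i, 1]), data = dat1)
  summary(fit)$coefficients[2, 4]
}, numeric(1))

vars <- names(dat1)
m <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
# each row of the character index matrix is (row name, column name)
m[cbind(indx2New[, 1], indx2New[, 2])] <- slope_p
round(m, 3)
```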
A.K.
Hi everyone,
First off, I'd just like to say thanks for everyone's contributions.
Up until now I've never had to post, as I've always found the answers
by trawling through the database. I've finally managed to stump
myself, and although I'm sure the answer to my problem is fairly simple
for someone out there, I have spent the whole day in front of
my computer struggling. I know I'll probably get an absolute ribbing
for making a basic mistake or not understanding something fully, but
I'm blind to the mistake now after looking at it for so long.
What I'm looking to do is build a 28 x 28 matrix of
p-values from linear regressions of 28 variables
against each other (e.g. a~b, a~c, a~d, ..., b~a, b~c, etc.), if that makes
sense. I've managed to get this to work if I enter each variable
by hand, but that isn't going to help when I have to make 20 matrices.
My script is as follows:
for (j in 1:28)
{
  ## This section works perfectly if I don't try to loop it. I know
  ## this won't work at the moment, because I haven't designated what j is,
  ## but I'm showing it to highlight what I'm attempting to do.
  models <- lapply(varlist, function(x) {
    lm(substitute(ANS ~ i, list(i = as.name(x))), data = con.i)
  })
  abc <- lapply(models, function(f) summary(f)$coefficients[, 4])
  abc <- do.call(rbind, abc)
}
I get the following error when I try to loop it:
Error in model.frame.default(formula = substitute(j ~ i, list(i = as.name(x))), :
  variable lengths differ (found for 'ANS')
(ANS being my first variable.) All variables are the same length, with 21 recordings each.
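In the looped call, `substitute(j ~ i, list(i = as.name(x)))` only replaces `i`, so the formula keeps the literal symbol `j`. `model.frame()` then most likely finds the loop counter `j` (a single number) next to the 21-value column `ANS`, which would explain "variable lengths differ". One possible fix is to substitute both sides. The sketch below assumes each response should be regressed on every other variable; `con.i` is a hypothetical stand-in filled with simulated values, and its column names are borrowed from the output earlier in the thread.

```r
# Hypothetical stand-in for the poster's data: 21 rows, 4 numeric columns
set.seed(1)
con.i <- as.data.frame(matrix(rnorm(21 * 4), ncol = 4,
                              dimnames = list(NULL, c("ANS", "GDP", "GEF", "Trac"))))
allvars <- names(con.i)

# For each response j, regress it on every other variable.
# Both y and i are substituted, so no stray loop variable ends up in the formula.
pvals <- lapply(allvars, function(j) {
  varlist <- setdiff(allvars, j)
  models <- lapply(varlist, function(x) {
    lm(substitute(y ~ i, list(y = as.name(j), i = as.name(x))), data = con.i)
  })
  abc <- do.call(rbind, lapply(models, function(f) summary(f)$coefficients[, 4]))
  rownames(abc) <- varlist  # one row per predictor: intercept and slope p-values
  abc
})
names(pvals) <- allvars
pvals[["ANS"]]
```

Column 2 of each element holds the slope p-values, which is what the 28 x 28 matrix in the replies above is built from.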
If anyone can suggest a method of looping, or another means
of producing 'models' for each of my 28 variables without having to do
it by hand, that would be fantastic.
Thanks in advance!!