[R] Multiple regressions with changing dependent variable and time span
arun
smartpink111 at yahoo.com
Sat Nov 30 18:11:17 CET 2013
Hi,
#####1 & 2:
set.seed(432)
dat1 <- as.data.frame(matrix(sample(c(1:60,NA),154*337,replace=TRUE),ncol=337))
colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
library(zoo)
rollapply(lst2[[1]],width=32,FUN=function(z) {z1 <- as.data.frame(z); sum(!!rowSums(is.na(z1)))},by.column=FALSE,align="right")
#[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3
#[38] 4 4 4 4 4 4 4 4 4 3 3 3 3 4 4 4 4 4 4 4 5 5 5 4 4 4 4 4 3 3 3 3 2 2 2 2 2
#[75] 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
#[112] 2 2 2 2 2 2 2 2 2 2 2 2
# For each series, fit the model on every 32-row window; a window containing any NA returns 9 NAs
res1 <- do.call(rbind, lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared)
    } else rep(NA,9)
  }, by.column=FALSE, align="right")))
row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
dim(res1)
#[1] 41082 9
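The same idea can be written without zoo, which may make the per-window logic easier to follow. This is only a rough sketch in base R: the names `fit_window` and `roll_fit` and the small `toy` data frame are made up for illustration and are not part of the code above. It assumes no window has perfectly collinear predictors (otherwise the coefficient table would have fewer rows).

```r
# Hypothetical helper: fit r ~ F.1 + F.2 + F.3 on one window.
# Returns 4 coefficients, 4 p-values and R squared, or 9 NAs if the window has any NA.
fit_window <- function(z1) {
  if (anyNA(z1)) return(rep(NA_real_, 9))
  fit <- lm(r ~ F.1 + F.2 + F.3, data = z1)
  s <- summary(fit)
  c(coef(fit), s$coefficients[, 4], s$r.squared)
}

# Hypothetical helper: slide a window of `width` rows over df, one row at a time.
roll_fit <- function(df, width = 32) {
  starts <- seq_len(nrow(df) - width + 1)
  t(vapply(starts,
           function(i) fit_window(df[i:(i + width - 1), ]),
           numeric(9)))
}

# Toy data (assumption, not the poster's data): 40 rows, one NA in row 5
set.seed(1)
toy <- data.frame(F.1 = rnorm(40), F.2 = rnorm(40), F.3 = rnorm(40), r = rnorm(40))
toy$r[5] <- NA
out <- roll_fit(toy)
dim(out)
```

Each row of `out` is one window; rows for windows that overlap the NA are all NA, the rest hold the estimates.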
###3.
library(car)
# vif()
res2 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); vif(l1) } else rep(NA,3)},by.column=FALSE,align="right")))
row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
dim(res2)
#[1] 41082 3
#DW statistic:
lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1) } else rep(NA,4)},by.column=FALSE,align="right"))
res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
dim(res3)
#[1] 41082 3
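If question 3b in the quoted message means the Durbin-Watson statistic at lags greater than 1, then as far as I know durbinWatsonTest() in car accepts a max.lag argument for that. The sketch below only computes the statistic itself (no p-value) in base R, so it is easy to see what each lag means. The function name `dw_lags` and the `toy` data are hypothetical.

```r
# Hypothetical helper: Durbin-Watson statistic for lags 1..max_lag.
# For lag k: sum((e[t] - e[t-k])^2) / sum(e[t]^2), with e the model residuals.
dw_lags <- function(fit, max_lag = 3) {
  e <- residuals(fit)
  n <- length(e)
  den <- sum(e^2)
  vapply(seq_len(max_lag),
         function(k) sum((e[(k + 1):n] - e[1:(n - k)])^2) / den,
         numeric(1))
}

# Toy data (assumption): one complete 32-row window
set.seed(2)
toy <- data.frame(F.1 = rnorm(32), F.2 = rnorm(32), F.3 = rnorm(32), r = rnorm(32))
fit <- lm(r ~ F.1 + F.2 + F.3, data = toy)
dw <- dw_lags(fit, max_lag = 3)
dw
```

Inside rollapply() this returns a fixed-length numeric vector per window, so the NA branch would be rep(NA, max_lag).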
##ncvTest()
f4 <- function(meanmod, dta, varmod) {
  assign(".dta", dta, envir=.GlobalEnv)
  assign(".meanmod", meanmod, envir=.GlobalEnv)
  m1 <- lm(.meanmod, .dta)
  ans <- ncvTest(m1, varmod)
  remove(".dta", envir=.GlobalEnv)
  remove(".meanmod", envir=.GlobalEnv)
  ans
}
lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-f4(r~.,z1) } else NA},by.column=FALSE,align="right"))
names(lst4) <- paste("r",1:334,sep=".")
length(lst4)
#[1] 334
###jarque.bera.test
library(tseries)
res5 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); resid <- residuals(l1); unlist(jarque.bera.test(resid)[1:3]) } else rep(NA,3)},by.column=FALSE,align="right")))
dim(res5)
#[1] 41082 3
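For saving the results in one readable file, one possibility (a sketch only, not necessarily what was asked for) is to turn the repeated row names into proper columns and write a CSV. The matrix `res_toy` below is a made-up stand-in for res1/res2/res5, and the column names `series` and `window` are my own choice.

```r
# Stand-in for a result matrix with repeated row names (assumption: 2 series, 2 windows each)
set.seed(3)
res_toy <- matrix(rnorm(12), ncol = 3,
                  dimnames = list(rep(c("r.1", "r.2"), each = 2),
                                  c("F.1", "F.2", "F.3")))

# Move the series label into a column and number the windows within each series
series <- rownames(res_toy)
vals <- res_toy
rownames(vals) <- NULL
out <- data.frame(series = series,
                  window = ave(seq_along(series), series, FUN = seq_along),
                  vals, check.names = FALSE)

# Write to a temporary file here; replace with a real path as needed
f <- tempfile(fileext = ".csv")
write.csv(out, f, row.names = FALSE)
back <- read.csv(f)
head(back)
```

The resulting file has one row per series and window, which opens directly in a spreadsheet.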
##the lag() thing is not clear.
A.K.
On Saturday, November 30, 2013 10:09 AM, nooldor <nooldor at gmail.com> wrote:
Hi,
Thanks for reply!
Three things:
1.
I did not mention that some of the data have more than 31 NAs in a column, and then it is not possible to run lm():
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
In this case the program should return "NA" and carry on; likewise, whenever the number of observations is fewer than 31, the program should always return "NA" and carry on.
2. In your result matrix there are only 4 columns (for the coefficient estimates). Is it possible to add 4 more columns with p-values and one column with R squared?
3. basic statistical test for the regressions:
inflation factors can be captured by:
res2 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
vif(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
and DW statistic:
res3 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
durbinWatsonTest(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
3a) Is that right?
3b) How can I run durbinWatsonTest for more than 1 lag and get the output in a user-friendly form?
3c) How can I apply jarque.bera.test from library(tseries) and ncvTest from library(car)?
Regards,
Tomasz Schabek
On 30 November 2013 07:42, arun <smartpink111 at yahoo.com> wrote:
Hi,
>The link does not seem to be working. From the description, it looks like:
>set.seed(432)
>dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
>lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>
> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>library(zoo)
>
>res <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) coef(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>
>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res)
>#[1] 41082 4
>
>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
>#(Intercept) F.1 F.2 F.3
>#109.9168150 -0.1705361 -0.1028231 0.2027911
>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
>#(Intercept) F.1 F.2 F.3
>#119.3718949 -0.1660709 -0.2059830 0.1338608
>res[1:2,]
># (Intercept) F.1 F.2 F.3
>#r.1 109.9168 -0.1705361 -0.1028231 0.2027911
>#r.1 119.3719 -0.1660709 -0.2059830 0.1338608
>
>A.K.
>
>On Friday, November 29, 2013 6:43 PM, nooldor <nooldor at gmail.com> wrote:
>Hi all!
>
>
>I am just starting my adventure with R, so please excuse my naive questions.
>
>My data look like this:
>
><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
>
>I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
>(r.1, r.2, ... r.334) - each one of these will be dependent variable in my
>regression.
>
>The total time span is 154 observations, but I would like to run a rolling-window
>regression with a window length of 31 observations.
>
>I would like to run script like that:
>
>summary(lm(r.1~F.1+F.2+F.3, data=data))
>vif(lm(r.1~F.1+F.2+F.3, data=data))
>
>But for each of the 334 dependent variables (r.1 to r.334) separately, and with a
>rolling window of length 31 obs.
>
>That is:
>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 times (154 total obs -
>31 for the first regression), each on a rolling fixed window of 31 obs.
>
>The next regression would be:
>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
>summary(lm(r.334~F.1+F.2+F.3, data=data))
>
>It means it would be 123 x 334 regressions (=41082 regressions)
>
>I would like to save the results (summary + vif test) of all those 41082
>regressions in one user-friendly, readable file, like the one produced by e.g. the command
>capture.output()
>
>Could you help with it?
>
>Regards,
>
>T.S.