[R] Multiple regressions with changing dependent variable and time span

arun smartpink111 at yahoo.com
Mon Dec 2 04:14:41 CET 2013


Hi,
I guess you wanted something like this:

res2 <- do.call(cbind,lapply(lst2,
 function(x)
rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); 
c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } 
else rep(NA,9)},by.column=FALSE,align="right")))
dim(res2)
#[1]  123 3006
 334*9
#[1] 3006

lst2New <- lapply(split(seq(3006),((seq(3006)-1)%%9)+1),function(x) {x1 <- res2[,x]; colnames(x1) <- paste(colnames(x1),1:334,sep="_");x1})
sapply(lst2New,ncol)
#  1   2   3   4   5   6   7   8   9 
#334 334 334 334 334 334 334 334 334 
lst2New[[1]][1:4,1:4]
#     (Intercept)_1 (Intercept)_2 (Intercept)_3 (Intercept)_4
#[1,]            NA    -0.3295765    -0.7025259            NA
#[2,]            NA     0.1217360    -1.5221660            NA
#[3,]            NA     0.3192466    -1.3586341            NA
#[4,]            NA     0.2618476    -0.9698798            NA


A.K.














On Sunday, December 1, 2013 9:33 PM, nooldor <nooldor at gmail.com> wrote:

Hi,

actually, do you remember yesterday code you build for me?
it was:

dat1<-read.csv("Book2.csv", header=T)
>###same as previous
>
>
>lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>library(zoo)
>res1 <- do.call(rbind,lapply(lst2,
>function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); c(coef(l1), 
pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } else 
rep(NA,9)},by.column=FALSE,align="right")))
>row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res1)
>#[1] 41082     9

(boo2.xls is  attached in this e-mail previously, just need to open it in excel and save as csv)
then we have result res1 as matrix 41082 x 9
(this 41082 = 123observations in time x 334objects)
now I need to separate this matrix res1 into 9 different matrices each one containing 123 obs x 334 variables 



maybe reshape() would be useful ...

... let me know if I described it clearly 





On 2 December 2013 03:17, arun <smartpink111 at yahoo.com> wrote:

#or
>
>data.frame(split(dat1,dat1$A))
>
>
>
>
>
>On Sunday, December 1, 2013 9:16 PM, arun <smartpink111 at yahoo.com> wrote:
>Hi,
>Try:
>dat1 <- read.table(text="A    B     C     D
>r.1  x1    x2   x3
>r.1  x4    x5    x6
>r.2  x7    x8    x9
>r.2  x10  x11 x12
>r.3  x13  x14 x15
>r.3  x16  x17 x18",header=TRUE,stringsAsFactors=FALSE)
>
> dat2 <- do.call(cbind,split(dat1,dat1$A))
>colnames(dat2) <- gsub(".*\\.","",colnames(dat2))
>A.K.
>
>
>
>
>
>
>
>On Sunday, December 1, 2013 6:32 PM, nooldor <nooldor at gmail.com> wrote:
>
>Hi,
>
>could you also tell me how to reshape the res1 matrix like that:
>
>[now]
>
>A    B     C     D
>r.1  x1    x2   x3
>r.1  x4    x5    x6
>r.2  x7    x8    x9r.2  x10  x11 x12r.3  x13  x14 x15r.3  x16  x17 x18
>
>
>[after]:
>A    B     C     D      A    B     C     D       A    B     C     D
>r.1  x1    x2   x3      r.2  x7    x8    x9      r.3  x13  x14 x15
>r.1  x4    x5   x6      r.2  x10  x11 x12      r.3  x16  x17 x18
>
>
>
>big thanks!
>
>
>
>
>
>
>
>On 30 November 2013 23:28, arun <smartpink111 at yahoo.com> wrote:
>
>Hi,
>>No problem.
>>
>>In that case, each column will be a list.  For example if I take the first element of `lst2`
>>dW1 <- rollapply(lst2[[1]],width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1,max.lag=3) } else rep(NA,4)},by.column=FALSE,align="right")
>>
>> tail(dW1[,1],1)
>>#[[1]]
>>#[1] -0.3602936  0.1975667 -0.1740797
>>
>>
>>You can store it by:
>>resdW1 <- do.call(cbind,lapply(seq_len(ncol(dW1)),function(i) do.call(rbind,dW1[,i]))[1:3])
>>
>>
>>Similarly, for more than one elements (using a subset of lst2- as it takes time)
>>
>>
>>lst3 <- lapply(lst2[1:2],function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1,max.lag=3) } else rep(NA,4)},by.column=FALSE,align="right"))
>>
>>lst3New <- lapply(lst3,function(x) do.call(cbind,lapply(seq_len(ncol(x)),function(i) do.call(rbind,x[,i]))[1:3]))
>>
>>lst3New <- lapply(lst3New, function(x) {colnames(x) <- paste0(rep(c("r","dw","p"),each=3),1:3); x})
>>
>>A.K.
>>
>>
>>On Saturday, November 30, 2013 5:03 PM, nooldor <nooldor at gmail.com> wrote:
>>
>>Hey!
>>
>>
>>Yes,
>>only the D-W test takes so much time, did not check it yet
>>
>>I checked results (estimates) with manually run regressions (in excel) and they are correct.
>>
>>
>>I only change the "width" to 31 and "each=123" to 124, cause it should be ((154-31)+1) x 334 = 41416 matrix
>>
>>
>>with the lag in D-W test I was wondering how to have table when I use durbinWatsonTest(l1,3) - with three lags instead of default 1.
>>
>>but I can manage it - just need to learn about functions used by you.
>>
>>
>>Any way: BIG THANK to you!
>>
>>
>>Best wishes,
>>T.S.
>>
>>
>>
>>
>>
>>On 30 November 2013 21:12, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi,
>>>
>>>I was able to read the file after saving it as .csv.  It seems to work without any errors.
>>>
>>>dat1<-read.csv("Book2.csv", header=T)
>>>###same as previous
>>>
>>>
>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>>lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>library(zoo)
>>>
>>>res1 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } else rep(NA,9)},by.column=FALSE,align="right")))
>>>row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
>>> dim(res1)
>>>#[1] 41082     9
>>>
>>>#vif
>>> library(car)
>>>
>>>res2 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); vif(l1) } else rep(NA,3)},by.column=FALSE,align="right")))
>>>row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
>>>dim(res2)
>>>#[1] 41082     3
>>>
>>>#DW statistic:
>>> lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1) } else rep(NA,4)},by.column=FALSE,align="right"))
>>> res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
>>>row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
>>> dim(res3)
>>>#[1] 41082     3
>>>##ncvTest()
>>>f4 <- function(meanmod, dta, varmod) {
>>>assign(".dta", dta, envir=.GlobalEnv)
>>>assign(".meanmod", meanmod, envir=.GlobalEnv)
>>>m1 <- lm(.meanmod, .dta)
>>>ans <- ncvTest(m1, varmod)
>>>remove(".dta", envir=.GlobalEnv)
>>>remove(".meanmod", envir=.GlobalEnv)
>>>ans
>>>}
>>>
>>> lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-f4(r~.,z1) } else NA},by.column=FALSE,align="right"))
>>>names(lst4) <- paste("r",1:334,sep=".")
>>>length(lst4)
>>>#[1] 334
>>>
>>>
>>>###jarque.bera.test
>>>library(tseries)
>>>res5 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); resid <- residuals(l1); unlist(jarque.bera.test(resid)[1:3]) } else rep(NA,3)},by.column=FALSE,align="right")))
>>> dim(res5)
>>>#[1] 41082     3
>>>
>>>A.K.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Saturday, November 30, 2013 1:44 PM, nooldor <nooldor at gmail.com> wrote:
>>>
>>>here is in .xlsx should be easy to open and eventually find&replace commas according to you excel settings (or maybe it will do it automatically)
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 30 November 2013 19:15, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>I tried that, but:
>>>>
>>>>
>>>>
>>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>>>> str(dat1)
>>>>'data.frame':    154 obs. of  1 variable:
>>>>
>>>>Then I changed to:
>>>>dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
>>>>> str(dat1)
>>>>'data.frame':    154 obs. of  661 variables:
>>>>Both of them are wrong as the number of variables should be 337.
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Saturday, November 30, 2013 12:53 PM, nooldor <nooldor at gmail.com> wrote:
>>>>
>>>>Thank you,
>>>>
>>>>I got your reply. I am just testing your script. I will let you know how is it soon.
>>>>
>>>>.csv could be problematic as commas are used as dec separator (Eastern Europe excel settings) ... I read it in R with this:
>>>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>>>
>>>>Thank you very much !!!
>>>>
>>>>T.S.
>>>>
>>>>
>>>>
>>>>
>>>>On 30 November 2013 18:39, arun <smartpink111 at yahoo.com> wrote:
>>>>
>>>>I couldn't read the "Book.csv" as the format is completely messed up.  Anyway, I hope the solution works on your dataset.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On Saturday, November 30, 2013 10:34 AM, nooldor <nooldor at gmail.com> wrote:
>>>>>
>>>>>
>>>>>ok.
>>>>>
>>>>>
>>>>>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",") > colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep=".")) > lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x])) > lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} ) > sum(!!rowSums(is.na(lst2[[1]]))) [1] 57 > #[1] 40 > sapply(lst2,function(x) sum(!!rowSums(is.na(x)))) [1] 57  0 > #[1] 40 46
>>>>>in att you have the data file
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On 30 November 2013 16:22, arun <smartpink111 at yahoo.com> wrote:
>>>>>
>>>>>Hi,
>>>>>>The first point is not that clear.
>>>>>>
>>>>>>Could you show the expected results in this case?
>>>>>>
>>>>>>set.seed(432)
>>>>>>dat1 <- as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
>>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>>>>lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>>>>>
>>>>>>
>>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>>>>#[1] 40
>>>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>>>>#[1] 40 46
>>>>>>
>>>>>>
>>>>>>A.K.
>>>>>>
>>>>>>
>>>>>>
>>>>>>On Saturday, November 30, 2013 10:09 AM, nooldor <nooldor at gmail.com> wrote:
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>Thanks for reply!
>>>>>>
>>>>>>
>>>>>>Three things:
>>>>>>1.
>>>>>>I did not write that some of the data has more then 31 NA in the column and then it is not possible to run lm()
>>>>>>
>>>>>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :  0 (non-NA) casesIn this case program should return "NA" symbol and go further, in the case when length of the observations is shorter then 31 program should always return "NA" but go further .
>>>>>>
>>>>>>
>>>>>>
>>>>>>2. in your result matrix there are only 4 columns (for estimates of the coefficients), is it possible to put there 4 more columns with p-values and one column with R squared
>>>>>>
>>>>>>
>>>>>>3. basic statistical test for the regressions:
>>>>>>
>>>>>>inflation factors can be captured by:
>>>>>>res2 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
>>>>>>  vif(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>>
>>>>>>and DW statistic:
>>>>>>res3 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
>>>>>>  durbinWatsonTest(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>>
>>>>>>
>>>>>>3a)is that right?
>>>>>>
>>>>>>3b) how to do and have in user-friendly form durbinWatsonTest for more then 1 lag?
>>>>>>
>>>>>>3c) how to apply: jarque.bera.test from library(tseries) and ncvTest from library(car) ???
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>Pozdrowienia,
>>>>>>
>>>>>>Tomasz Schabek
>>>>>>
>>>>>>
>>>>>>On 30 November 2013 07:42, arun <smartpink111 at yahoo.com> wrote:
>>>>>>
>>>>>>Hi,
>>>>>>>The link seems to be not working.  From the description, it looks like:
>>>>>>>set.seed(432)
>>>>>>>dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
>>>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
>>>>>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>>>>>>
>>>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>>>>library(zoo)
>>>>>>>
>>>>>>>res <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) coef(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>>>
>>>>>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
>>>>>>> dim(res)
>>>>>>>#[1] 41082     4
>>>>>>>
>>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
>>>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>>>#109.9168150  -0.1705361  -0.1028231   0.2027911
>>>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
>>>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>>>#119.3718949  -0.1660709  -0.2059830   0.1338608
>>>>>>>res[1:2,]
>>>>>>>#    (Intercept)        F.1        F.2       F.3
>>>>>>>#r.1    109.9168 -0.1705361 -0.1028231 0.2027911
>>>>>>>#r.1    119.3719 -0.1660709 -0.2059830 0.1338608
>>>>>>>
>>>>>>>A.K.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>On Friday, November 29, 2013 6:43 PM, nooldor <nooldor at gmail.com> wrote:
>>>>>>>Hi all!
>>>>>>>
>>>>>>>
>>>>>>>I am just starting my adventure with R, so excuse me naive questions.
>>>>>>>
>>>>>>>My data look like that:
>>>>>>>
>>>>>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
>>>>>>>
>>>>>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
>>>>>>>(r.1, r.2, ... r.334) - each one of these will be dependent variable in my
>>>>>>>regression.
>>>>>>>
>>>>>>>Total span of the time is 154 observations. But I would like to have rolling
>>>>>>>window regression with length of 31 observations.
>>>>>>>
>>>>>>>I would like to run script like that:
>>>>>>>
>>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>>>
>>>>>>>But for each of 334 (r.1 to r.334) dependent variables separately and with
>>>>>>>rolling-window of the length 31obs.
>>>>>>>
>>>>>>>Id est:
>>>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 (154 total obs -
>>>>>>>31. for the first regression) times for rolling-fixed period of 31 obs.
>>>>>>>
>>>>>>>The next regression would be:
>>>>>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
>>>>>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
>>>>>>>
>>>>>>>It means it would be 123 x 334 regressions (=41082 regressions)
>>>>>>>
>>>>>>>I would like to save results (summary + vif test) of all those 41082
>>>>>>>regressions in one read-user-friendly file like this given by e.g command
>>>>>>>capture.output()
>>>>>>>
>>>>>>>Could you help with it?
>>>>>>>
>>>>>>>Regards,
>>>>>>>
>>>>>>>T.S.
>>>>>>>
>>>>>>>    [[alternative HTML version deleted]]
>>>>>>>
>>>>>>>______________________________________________
>>>>>>>R-help at r-project.org mailing list
>>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the R-help mailing list