[R] Multiple regressions with changing dependent variable and time span
arun
smartpink111 at yahoo.com
Sat Nov 30 21:12:43 CET 2013
Hi,
I was able to read the file after saving it as .csv. It seems to work without any errors.
dat1<-read.csv("Book2.csv", header=T)
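A side note on the separator problem discussed further down the thread: if an export keeps semicolons as field separators and commas as decimal marks, base R's read.csv2() uses exactly those defaults, and count.fields() shows how many fields each line splits into for a given separator. A minimal sketch on a made-up three-column file (the temporary file and its contents are assumptions for illustration, not the poster's Book2.csv):

```r
# Hypothetical semicolon-separated file with decimal commas
tmp <- tempfile(fileext = ".csv")
writeLines(c("F.1;F.2;r.1",
             "1,5;2,25;0,1",
             "3,0;4,75;0,2"), tmp)

# Number of fields per line under each candidate separator
n_semicolon <- count.fields(tmp, sep = ";")   # 3 fields on every line
n_tab       <- count.fields(tmp, sep = "\t")  # 1 field on every line

# read.csv2() defaults to sep = ";" and dec = ","
dat_demo <- read.csv2(tmp, header = TRUE)
str(dat_demo)
unlink(tmp)
```

If the field count per line does not match the expected number of variables, the separator is probably wrong for that file.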
###same as previous
lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
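The expression sum(!!rowSums(is.na(x))) above counts the rows that contain at least one NA: rowSums() gives the number of NAs per row, and the double negation turns every non-zero count into TRUE. A small sketch on a toy data frame (made up for illustration), compared with complete.cases(), which should give the same count:

```r
toy <- data.frame(F.1 = c(1, NA, 3, 4),
                  F.2 = c(1, NA, 3, NA),
                  r   = c(1, 2, 3, 4))

na_per_row   <- rowSums(is.na(toy))        # number of NAs in each row
rows_with_na <- sum(!!na_per_row)          # rows with at least one NA
alt_count    <- sum(!complete.cases(toy))  # same count via complete.cases()
```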
library(zoo)
res1 <- do.call(rbind, lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared)
    } else rep(NA,9)
  }, by.column=FALSE, align="right")))
row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
dim(res1)
#[1] 41082 9
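For readers who find the nested call hard to follow, the same idea can be written as an explicit loop over windows in base R. This is only a sketch on simulated data (the data frame `sim`, the helper `roll_lm()`, the column labels and the injected NAs are all assumptions for illustration); it fits lm() on each 32-row window and returns a row of NAs for any window containing a missing value:

```r
set.seed(1)
n <- 154
sim <- data.frame(F.1 = rnorm(n), F.2 = rnorm(n), F.3 = rnorm(n), r = rnorm(n))
sim$r[5:6] <- NA   # inject missing values so some windows are skipped

# Hypothetical helper: one row of results per window of `width` rows
roll_lm <- function(d, width = 32) {
  starts <- seq_len(nrow(d) - width + 1)
  out <- t(sapply(starts, function(i) {
    w <- d[i:(i + width - 1), ]
    if (any(is.na(w))) return(rep(NA_real_, 9))
    s <- summary(lm(r ~ F.1 + F.2 + F.3, data = w))
    # 4 estimates, 4 p-values, 1 R-squared
    c(s$coefficients[, 1], s$coefficients[, 4], s$r.squared)
  }))
  colnames(out) <- c(paste0("coef.", 0:3), paste0("pval.", 0:3), "rsquare")
  out
}

res_demo <- roll_lm(sim)
dim(res_demo)   # 123 windows by 9 columns
```

With 154 rows and a width of 32 there are 154 - 32 + 1 = 123 windows, which matches the each=123 used for the row names above.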
#vif
library(car)
res2 <- do.call(rbind, lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      vif(l1)
    } else rep(NA,3)
  }, by.column=FALSE, align="right")))
row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
dim(res2)
#[1] 41082 3
#DW statistic:
lst3 <- lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      durbinWatsonTest(l1)
    } else rep(NA,4)
  }, by.column=FALSE, align="right"))
res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
dim(res3)
#[1] 41082 3
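On the earlier question about more than one lag: durbinWatsonTest() in car takes a max.lag argument, so durbinWatsonTest(l1, max.lag=3) should report the statistic for lags 1 to 3. The statistic itself is also easy to compute by hand from the residuals, which returns a plain numeric vector that is simple to rbind(). A sketch in base R on simulated data (the helper `dw_stat()` and the data are assumptions for illustration, and no p-values are produced):

```r
set.seed(2)
d <- data.frame(F.1 = rnorm(32), F.2 = rnorm(32), F.3 = rnorm(32), r = rnorm(32))
fit <- lm(r ~ F.1 + F.2 + F.3, data = d)

# Hypothetical helper: Durbin-Watson statistic for lags 1..max.lag
dw_stat <- function(model, max.lag = 1) {
  e <- residuals(model)
  n <- length(e)
  sapply(seq_len(max.lag), function(k)
    sum((e[(k + 1):n] - e[1:(n - k)])^2) / sum(e^2))
}

dw_demo <- dw_stat(fit, max.lag = 3)   # one value per lag
```

Inside the rolling function, dw_stat(l1, max.lag=3) could stand in for durbinWatsonTest(l1), with rep(NA,3) as the fallback for windows containing NAs.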
##ncvTest()
f4 <- function(meanmod, dta, varmod) {
  # make the data and formula visible in the global environment while ncvTest() runs
  assign(".dta", dta, envir=.GlobalEnv)
  assign(".meanmod", meanmod, envir=.GlobalEnv)
  m1 <- lm(.meanmod, .dta)
  ans <- ncvTest(m1, varmod)
  # clean up the temporary global objects
  remove(".dta", envir=.GlobalEnv)
  remove(".meanmod", envir=.GlobalEnv)
  ans
}
lst4 <- lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- f4(r~., z1)
    } else NA
  }, by.column=FALSE, align="right"))
names(lst4) <- paste("r",1:334,sep=".")
length(lst4)
#[1] 334
###jarque.bera.test
library(tseries)
res5 <- do.call(rbind, lapply(lst2, function(x)
  rollapply(x, width=32, FUN=function(z) {
    z1 <- as.data.frame(z)
    if(!sum(!!rowSums(is.na(z1)))) {
      l1 <- lm(r~F.1+F.2+F.3, data=z1)
      resid <- residuals(l1)
      unlist(jarque.bera.test(resid)[1:3])
    } else rep(NA,3)
  }, by.column=FALSE, align="right")))
dim(res5)
#[1] 41082 3
A.K.
On Saturday, November 30, 2013 1:44 PM, nooldor <nooldor at gmail.com> wrote:
Here it is as .xlsx. It should be easy to open; if needed, find and replace the commas according to your Excel settings (or maybe it will do that automatically).
On 30 November 2013 19:15, arun <smartpink111 at yahoo.com> wrote:
I tried that, but:
>
>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>> str(dat1)
>'data.frame': 154 obs. of 1 variable:
>
>Then I changed to:
>dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
>> str(dat1)
>'data.frame': 154 obs. of 661 variables:
>Both of them are wrong as the number of variables should be 337.
>A.K.
>
>On Saturday, November 30, 2013 12:53 PM, nooldor <nooldor at gmail.com> wrote:
>
>Thank you,
>
>I got your reply. I am just testing your script. I will let you know how it goes soon.
>
>.csv could be problematic because commas are used as the decimal separator (Eastern European Excel settings) ... I read it in R with this:
>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>
>Thank you very much !!!
>
>T.S.
>
>
>
>
>On 30 November 2013 18:39, arun <smartpink111 at yahoo.com> wrote:
>
>I couldn't read the "Book.csv" as the format is completely messed up. Anyway, I hope the solution works on your dataset.
>>
>>On Saturday, November 30, 2013 10:34 AM, nooldor <nooldor at gmail.com> wrote:
>>
>>
>>ok.
>>
>>
>>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>> lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>> sum(!!rowSums(is.na(lst2[[1]])))
>>[1] 57
>>> #[1] 40
>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>[1] 57 0
>>> #[1] 40 46
>>
>>The data file is attached.
>>
>>
>>
>>
>>
>>
>>On 30 November 2013 16:22, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi,
>>>The first point is not that clear.
>>>
>>>Could you show the expected results in this case?
>>>
>>>set.seed(432)
>>>dat1 <- as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>>
>>>
>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>#[1] 40
>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>#[1] 40 46
>>>
>>>
>>>A.K.
>>>
>>>
>>>
>>>On Saturday, November 30, 2013 10:09 AM, nooldor <nooldor at gmail.com> wrote:
>>>
>>>Hi,
>>>
>>>Thanks for reply!
>>>
>>>
>>>Three things:
>>>1.
>>>I did not mention that some of the columns have more than 31 NAs, in which case it is not possible to run lm():
>>>
>>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
>>>
>>>In this case the program should return NA and carry on. Whenever the number of available observations is less than 31, the program should likewise return NA but keep going.
>>>
>>>
>>>
>>>2. Your result matrix has only 4 columns (the coefficient estimates). Is it possible to add 4 more columns with the p-values and one column with R-squared?
>>>
>>>
>>>3. Basic statistical tests for the regressions:
>>>
>>>Variance inflation factors can be captured by:
>>>res2 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
>>> vif(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>
>>>and DW statistic:
>>>res3 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z)
>>> durbinWatsonTest(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>
>>>
>>>3a) Is that right?
>>>
>>>3b) How can I run durbinWatsonTest for more than 1 lag and get the output in a user-friendly form?
>>>
>>>3c) How do I apply jarque.bera.test from library(tseries) and ncvTest from library(car)?
>>>
>>>
>>>
>>>
>>>
>>>
>>>Regards,
>>>
>>>Tomasz Schabek
>>>
>>>
>>>On 30 November 2013 07:42, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>Hi,
>>>>The link does not seem to be working. From the description, the data look like:
>>>>set.seed(432)
>>>>dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
>>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>>>
>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>library(zoo)
>>>>
>>>>res <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) coef(lm(r~ F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>
>>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
>>>> dim(res)
>>>>#[1] 41082 4
>>>>
>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
>>>>#(Intercept) F.1 F.2 F.3
>>>>#109.9168150 -0.1705361 -0.1028231 0.2027911
>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
>>>>#(Intercept) F.1 F.2 F.3
>>>>#119.3718949 -0.1660709 -0.2059830 0.1338608
>>>>res[1:2,]
>>>># (Intercept) F.1 F.2 F.3
>>>>#r.1 109.9168 -0.1705361 -0.1028231 0.2027911
>>>>#r.1 119.3719 -0.1660709 -0.2059830 0.1338608
>>>>
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Friday, November 29, 2013 6:43 PM, nooldor <nooldor at gmail.com> wrote:
>>>>Hi all!
>>>>
>>>>
>>>>I am just starting my adventure with R, so excuse my naive questions.
>>>>
>>>>My data look like this:
>>>>
>>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
>>>>
>>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
>>>>(r.1, r.2, ... r.334) - each one of these will be dependent variable in my
>>>>regression.
>>>>
>>>>The total time span is 154 observations, but I would like to run a rolling
>>>>window regression with a window length of 31 observations.
>>>>
>>>>I would like to run a script like this:
>>>>
>>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
>>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
>>>>
>>>>But for each of the 334 dependent variables (r.1 to r.334) separately, and with
>>>>a rolling window of length 31 obs.
>>>>
>>>>That is:
>>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 times (154 total obs
>>>>minus 31 for the first regression) over a fixed rolling period of 31 obs.
>>>>
>>>>The next regression would be:
>>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
>>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
>>>>
>>>>That means 123 x 334 regressions (= 41082 regressions).
>>>>
>>>>I would like to save the results (summary + vif test) of all those 41082
>>>>regressions in one readable, user-friendly file, like the one produced by
>>>>e.g. the capture.output() command.
>>>>
>>>>Could you help with it?
>>>>
>>>>Regards,
>>>>
>>>>T.S.
>>>>
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>