Hi,
Here is a description of my problem:
I have a data frame that contains 110 variables (columns) with 595
observations each.
Some of the variables will be my Y (dependent) variables, and the others
will be Q, X and Z (independent variables).
I need to estimate robust regression models: Y ~ X + Z + Q
I want to create 4 subsets from this data frame
(the first column contains dates and should be skipped):
Y <- columns 2:31 [30 variables/columns] (let's call them y1, y2, y3
... y30. Each of them has its own specific, unordered name; here I
just call them y1, y2 and so on. I think of them as vectors, e.g. y1
is the vector of the variable called, in this example, "y1", with 595
observations, some of which can be NA)
X <- columns 32:50 [19 variables/columns] (let's call them x1, x2, x3 ...
x19. Each of them has its own specific, unordered name; here I just
call them x1, x2 and so on. I think of them as vectors, e.g. x1 is the
vector of the variable called, in this example, "x1", with 595 observations,
some of which can be NA)
Z <- columns 51:80 [30 variables/columns] (let's call them z1, z2, z3 ...
z30, analogous to X)
Q <- columns 81:110 [30 variables/columns] (let's call them q1, q2, q3 ...
q30, analogous to X)
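
To make the layout concrete, here is a minimal sketch of the four subsets. The data frame name `df` and the simulated values are only stand-ins for my real data (both are assumptions for illustration):

```r
set.seed(1)

# Simulated stand-in for the real data: a date column plus 109 numeric columns.
n  <- 595
df <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "day", length.out = n),
  matrix(rnorm(n * 109), nrow = n)
)

# Subsets by column position; column 1 (dates) is skipped.
Y <- df[, 2:31]    # 30 dependent variables
X <- df[, 32:50]   # 19 regressors that change from model to model
Z <- df[, 51:80]   # 30 regressors paired with Y by position
Q <- df[, 81:110]  # 30 regressors paired with Y by position

# Number of columns in each subset.
sapply(list(Y = Y, X = X, Z = Z, Q = Q), ncol)
```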
y1 corresponds to z1 and q1 (and so on) in the regressions below.
The goal:
I want to write code that will generate 30 x 19 robust regressions (package
MASS: rlm), like this:
y1 ~ x1 + z1 + q1
y1 ~ x2 + z1 + q1
...
y1 ~ x19 + z1 + q1
y2 ~ x1 + z2 + q2
y2 ~ x2 + z2 + q2
...
y2 ~ x19 + z2 + q2
...
y30 ~ x1 + z30 + q30
y30 ~ x2 + z30 + q30
...
y30 ~ x19 + z30 + q30
(as described previously, y2 means the second vector in the Y subset, x19 means
the 19th vector in the X subset, z2 the 2nd vector in the Z subset, and so on)
So the first vector of the Y subset should be regressed on the first vector of
the Z subset and the first vector of the Q subset, with the vector from the X
subset changing from x1 to x19, and so on for all 30 vectors in the Y subset.
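
The pairing above could be sketched roughly as follows. This is only an illustration on simulated data: the name `df` and the random values are assumptions, and each model gets a small temporary data frame with fixed column names so that the real, irregular column names never have to be pasted into a formula.

```r
library(MASS)  # provides rlm()

set.seed(1)
# Simulated stand-in for the real data (assumption): dates plus 109 numeric columns.
n  <- 595
df <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "day", length.out = n),
  matrix(rnorm(n * 109), nrow = n)
)
Y <- df[, 2:31]; X <- df[, 32:50]; Z <- df[, 51:80]; Q <- df[, 81:110]

# All (i, j) combinations: i indexes y/z/q (1..30), j indexes x (1..19).
pairs <- expand.grid(j = seq_along(X), i = seq_along(Y))

fits <- lapply(seq_len(nrow(pairs)), function(k) {
  i <- pairs$i[k]
  j <- pairs$j[k]
  # Temporary data frame with fixed names for this one regression.
  d <- data.frame(y = Y[[i]], x = X[[j]], z = Z[[i]], q = Q[[i]])
  # Rows containing NA are dropped for this model only.
  rlm(y ~ x + z + q, data = d, na.action = na.omit)
})

# Label each fit with the original column names of its y and x.
names(fits) <- paste(names(Y)[pairs$i], "~", names(X)[pairs$j])
length(fits)
```

Each element of `fits` is one rlm model with four coefficients (intercept, x, z, q), and `residuals(fits[[k]])` gives the residuals needed for the next step.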
- while running each of these rlm regressions, the program should extract the
residuals of the regression and run ArchTest() (package: FinTS)
[ArchTest(resid, lags=5, demean=FALSE)]. If the p-value of this test is lower
than 0.05, it should estimate the GARCH(1,1) model described
here:
http://stats.stackexchange.com/questions/45482/how-to-estimate-garch-in-r-exogenous-variables-in-mean-equation
Then the program should check the same equation again with ArchTest, and
if the p-value is again lower than 0.05 it should apply a GARCH(1,2) model, and
so on (GARCH(1,3), GARCH(1,4), ...) until the p-value from
ArchTest is greater than 0.05. Once the p-value from ArchTest is
greater than 0.05, the program should go to the next equation and repeat the procedure.
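
A rough sketch of this escalation loop is below. It rests on several assumptions that may not match the linked answer: it uses the rugarch package for the GARCH fit with the regressors as external regressors in the mean equation, it re-tests the standardized residuals of the GARCH fit, it takes "GARCH(1,k)" to mean `garchOrder = c(1, k)`, and it stops at a hypothetical cap `max_k` so the loop cannot run forever. The function name `fit_until_no_arch` is made up for illustration.

```r
# Attach FinTS and rugarch if they are installed; the function below needs both.
have_pkgs <- requireNamespace("FinTS", quietly = TRUE) &&
  requireNamespace("rugarch", quietly = TRUE)
if (have_pkgs) {
  library(FinTS)    # ArchTest()
  library(rugarch)  # ugarchspec(), ugarchfit()
}

# y      : numeric response with no NA values
# xreg   : numeric matrix of regressors (x, z, q), same number of rows as y
# resid0 : residuals of the rlm fit for the same equation
fit_until_no_arch <- function(y, xreg, resid0, alpha = 0.05, lags = 5, max_k = 5) {
  arch_p <- function(r) ArchTest(r, lags = lags, demean = FALSE)$p.value

  # No ARCH effect detected in the rlm residuals: keep the rlm fit as it is.
  if (arch_p(resid0) >= alpha) {
    return(list(garch_order = NULL, fit = NULL))
  }

  fit <- NULL
  for (k in seq_len(max_k)) {
    spec <- ugarchspec(
      variance.model = list(model = "sGARCH", garchOrder = c(1, k)),
      mean.model = list(armaOrder = c(0, 0), include.mean = TRUE,
                        external.regressors = as.matrix(xreg))
    )
    fit <- ugarchfit(spec, data = y)
    # Re-test the standardized residuals of the GARCH fit (assumption).
    z <- as.numeric(residuals(fit, standardize = TRUE))
    if (arch_p(z) >= alpha) {
      return(list(garch_order = c(1, k), fit = fit))
    }
  }

  # Cap reached while ARCH effects remain: return the last fit, flagged with NA.
  list(garch_order = NA, fit = fit)
}
```

For one equation it would be called with the complete cases of y, x, z and q and the residuals of the matching rlm fit. Note that the GARCH mean equation here is not a robust regression, so its coefficients replace, rather than refine, the rlm ones.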
In the end I would like to have one data frame as the result, containing the
coefficients of all 30 x 19 regressions (there will be 30 x 19 x
4 coefficients) and their p-values.
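
A sketch of how such a result table could be assembled for the rlm fits is below. As far as I can tell, summary() of an rlm fit reports estimates, standard errors and t values but no p-values, so the p-values here come from a normal approximation to the t value; that is an assumption, not something rlm provides. The helper name `coef_table` and the simulated data are made up for illustration.

```r
library(MASS)  # provides rlm()

# Turn one rlm fit into a long-format data frame, one row per coefficient.
coef_table <- function(fit, label) {
  ct <- summary(fit)$coefficients  # columns: Value, Std. Error, t value
  data.frame(
    model     = label,
    term      = rownames(ct),
    estimate  = ct[, "Value"],
    std_error = ct[, "Std. Error"],
    t_value   = ct[, "t value"],
    # Two-sided p-value from a normal approximation (assumption).
    p_value   = 2 * pnorm(-abs(ct[, "t value"])),
    row.names = NULL
  )
}

# Small simulated example with two models sharing z and q.
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), z = rnorm(100), q = rnorm(100))
d$y <- 1 + 0.5 * d$x1 + d$z - d$q + rnorm(100)

fits <- list(
  "y ~ x1 + z + q" = rlm(y ~ x1 + z + q, data = d),
  "y ~ x2 + z + q" = rlm(y ~ x2 + z + q, data = d)
)

# Stack the per-model tables into one data frame.
results <- do.call(rbind, Map(coef_table, fits, names(fits)))
rownames(results) <- NULL
head(results)
```

With all 30 x 19 fits in the list, the same `do.call(rbind, ...)` call would give 30 x 19 x 4 rows.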
I was thinking about solving it like this:
- creating 4 lists with the names in each subset and using lapply, but I do
not yet have the R skills to do it myself, especially the GARCH part.
Therefore I am asking you for help.
Best regards and thanks in advance!
T.S.