[R] Multiple Multivariate regression in R with 50 independent variables

peter dalgaard pdalgd at gmail.com
Fri Apr 19 19:51:04 CEST 2013


On Apr 19, 2013, at 19:15 , Nilesh Gupta wrote:

> cbind( X1,X2,X3,X4,X5,X6,X7, ...  ,X2393,X2394,X2395 )~ R_M_F+SMB+HML+WML
> 
> I used this formula because I wanted to regress 2395 stocks, with 500 months of data each, on the four independent variables.
> I got the error below. The idea was to run multivariate regressions on each of these stocks.
> 
> Error in model.matrix.default(mt, mf, contrasts) : 
>   model frame and formula mismatch in model.matrix()
> 
> 
> Googling this error led me to that page, and I now know that I mistakenly assumed that lm was limited to 50 variables.
> 
> Is cbind(variable names) the right way to formulate multivariate regressions?
> 
> Where am I going wrong?
> 

lm() is unhappy about very long expressions in a formula (this is arguably a bug), so avoid them by building the response matrix first:

M <- cbind( X1,X2,X3,X4,X5,X6,X7, ...  ,X2393,X2394,X2395 )
lm(M ~ R_M_F+SMB+HML+WML)
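
If the stock returns already sit in a data frame, the response matrix can be built by column name rather than by typing out the cbind() call. The following is only a sketch on simulated data: the data frame name dat, the sizes (60 months, 5 stocks) and the helper names stock_cols, fit and fit_x1 are assumptions made for illustration, not part of the original problem.

```r
# Simulated stand-in data; sizes and object names are illustrative only.
set.seed(1)
n_months <- 60
n_stocks <- 5

# One data frame holding the four factors and the return columns X1..X5.
dat <- data.frame(
  R_M_F = rnorm(n_months),
  SMB   = rnorm(n_months),
  HML   = rnorm(n_months),
  WML   = rnorm(n_months)
)
for (j in seq_len(n_stocks)) dat[[paste0("X", j)]] <- rnorm(n_months)

# Build the response matrix by column name instead of a long cbind() call.
stock_cols <- paste0("X", seq_len(n_stocks))
M <- as.matrix(dat[, stock_cols])

# A matrix on the left-hand side gives one least-squares fit per column.
fit <- lm(M ~ R_M_F + SMB + HML + WML, data = dat)

# One row per term (intercept + 4 factors), one column per stock.
dim(coef(fit))

# Each coefficient column should agree with the single-response fit.
fit_x1 <- lm(X1 ~ R_M_F + SMB + HML + WML, data = dat)
all.equal(coef(fit)[, "X1"], coef(fit_x1))
```

With the real data, stock_cols would presumably be paste0("X", 1:2395), assuming the columns are named as in the formula above.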

Notice, though, that multivariate tests will be unhappy if you have more response variables than residual degrees of freedom (M wider than tall, essentially). With 2395 stocks and only 500 months, that is your situation.

That's a theory issue, not an lm one.
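
A small simulated illustration of the "wider than tall" point, with made-up sizes (10 rows, 20 response columns) and a single hypothetical predictor x. The coefficients are still estimated column by column, but the residual cross-product matrix cannot have full rank, and as far as I understand it that matrix is what multivariate tests need to invert.

```r
# Simulated data with more response columns than rows (illustrative sizes).
set.seed(2)
n_months <- 10
n_stocks <- 20
x <- rnorm(n_months)
M <- matrix(rnorm(n_months * n_stocks), nrow = n_months)

fit <- lm(M ~ x)

# Per-column coefficients are still available: 2 terms by 20 responses.
dim(coef(fit))

# The residual cross-product matrix is n_stocks x n_stocks, but its rank
# cannot exceed the residual degrees of freedom.
ssd <- crossprod(resid(fit))
ssd_rank <- qr(ssd)$rank
c(columns = n_stocks, residual_df = fit$df.residual, rank = ssd_rank)
```

So per-stock coefficients are fine in this setting; it is only the joint tests across all stocks that run into trouble.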

> 
> The woods are lovely, dark and deep
> But I have promises to keep 
> And miles before I go to sleep
> And miles before I go to sleep
> -----
> 
> 
> On Fri, Apr 19, 2013 at 10:18 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> 
> On Apr 19, 2013, at 4:51 AM, Nilesh Gupta wrote:
> 
> > I used this link
> > http://r.789695.n4.nabble.com/model-frame-and-formula-mismatch-in-model-matrix-td4664093.html
> 
> But you said 50 independent variables, and that thread was probably someone's (failed) effort to submit 50 _dependent_ variables. What is the real problem?
> 
> --
> David.
> 
> 
> > Regards
> >
> >
> > On Fri, Apr 19, 2013 at 2:19 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> > On Apr 19, 2013, at 12:40 AM, Nilesh Gupta wrote:
> >
> > > lm() does not accommodate more than 50 independent variables
> >
> > What is your source for this misinformation?
> >
> > > dat <- as.data.frame(matrix(rnorm(51000), ncol=51) )
> > > names(dat)
> >  [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11" "V12" "V13" "V14" "V15" "V16" "V17" "V18"
> > [19] "V19" "V20" "V21" "V22" "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33" "V34" "V35" "V36"
> > [37] "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44" "V45" "V46" "V47" "V48" "V49" "V50" "V51"
> > > lm(V1 ~ ., data=dat)
> >
> > Call:
> > lm(formula = V1 ~ ., data = dat)
> >
> > Coefficients:
> > (Intercept)           V2           V3           V4           V5           V6           V7           V8
> >  -0.0089517   -0.0427225   -0.0754946   -0.0002903   -0.0083482    0.0324383   -0.0194980   -0.0151008
> >          V9          V10          V11          V12          V13          V14          V15          V16
> >   0.0255324   -0.0167399    0.0476841   -0.0222229    0.0720990   -0.0174327   -0.0104261    0.0024625
> >         V17          V18          V19          V20          V21          V22          V23          V24
> >  -0.0086276   -0.0274867   -0.0345897    0.0209116    0.0368201   -0.0027364    0.0090916    0.0198854
> >         V25          V26          V27          V28          V29          V30          V31          V32
> >  -0.0083732   -0.0216937    0.0586361   -0.0530041    0.0402765    0.0073514    0.0295976   -0.0641553
> >         V33          V34          V35          V36          V37          V38          V39          V40
> >   0.0491071   -0.0261259    0.0364740    0.0070261   -0.0159851   -0.0373357    0.0506756   -0.0383495
> >         V41          V42          V43          V44          V45          V46          V47          V48
> >   0.0054945    0.0089468   -0.0050151   -0.0184369    0.0019926   -0.0177631    0.0282828    0.0353523
> >         V49          V50          V51
> >  -0.0382634    0.0545654    0.0101398
> >
> > > dat <- as.data.frame(matrix(rnorm(101000), ncol=101) )
> > > lm(V1 ~ ., data=dat)
> >
> > Call:
> > lm(formula = V1 ~ ., data = dat)
> >
> > Coefficients:
> > (Intercept)           V2           V3           V4           V5           V6           V7           V8
> >    0.021065    -0.015988    -0.008273     0.049849     0.014874     0.012352    -0.054584     0.004542
> >          V9          V10          V11          V12          V13          V14          V15          V16
> >   -0.017186     0.018006    -0.009707    -0.007382     0.044886    -0.051122    -0.026910    -0.048929
> >         V17          V18          V19          V20          V21          V22          V23          V24
> >   -0.008129     0.022129    -0.063525     0.026683     0.013424    -0.010145    -0.046046     0.024025
> >         V25          V26          V27          V28          V29          V30          V31          V32
> >   -0.003529    -0.038270     0.043657     0.049855     0.010691     0.041217    -0.012596     0.018302
> >         V33          V34          V35          V36          V37          V38          V39          V40
> >    0.040225    -0.012751    -0.062677    -0.002810    -0.002574    -0.024137     0.021324    -0.041520
> >         V41          V42          V43          V44          V45          V46          V47          V48
> >   -0.076482     0.009063     0.067097    -0.042554    -0.013789     0.002865     0.017325    -0.076860
> >         V49          V50          V51          V52          V53          V54          V55          V56
> >   -0.007003    -0.007315     0.030270     0.022066    -0.002224    -0.056534     0.013705    -0.003609
> >         V57          V58          V59          V60          V61          V62          V63          V64
> >   -0.044580    -0.037543     0.015745     0.035250    -0.017117     0.072470     0.004398    -0.015923
> >         V65          V66          V67          V68          V69          V70          V71          V72
> >    0.012864    -0.062752    -0.038437    -0.019586     0.019871    -0.068398    -0.111778     0.021416
> >         V73          V74          V75          V76          V77          V78          V79          V80
> >    0.036849    -0.009103     0.037790     0.021883    -0.034990    -0.014917    -0.003854     0.001760
> >         V81          V82          V83          V84          V85          V86          V87          V88
> >   -0.001812     0.003942     0.021810    -0.013984    -0.030446     0.049187     0.008392     0.026965
> >         V89          V90          V91          V92          V93          V94          V95          V96
> >    0.057301     0.004190     0.055505    -0.046006    -0.019080    -0.098889    -0.010891    -0.002729
> >         V97          V98          V99         V100         V101
> >    0.024939    -0.029847     0.063578    -0.061667    -0.022163
> >
> > > system.time( lm(V1 ~ ., data=dat) ) # with the 101 column dataframe
> >    user  system elapsed
> >   0.060   0.008   0.076
> >
> > Sorry to give you such a Frost-y reception, but you are being somewhat ... what's the right word... sleepy?
> >
> > --
> > David.
> >
> >
> > >
> > >
> > > On Fri, Apr 19, 2013 at 12:26 PM, peter dalgaard <pdalgd at gmail.com> wrote:
> > >
> > >>
> > >> On Apr 18, 2013, at 21:24 , Nilesh Gupta wrote:
> > >>
> > >>> Hello all
> > >>>
> > > >>> Is there a method/package in R in which I can do regressions for more than 50 independent variables?
> > >>
> > >> What's wrong with lm() et al.?
> > >>
> > >> --
> >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


