[R] Multiple Multivariate regression in R with 50 independent variables

William Dunlap wdunlap at tibco.com
Fri Apr 19 21:30:03 CEST 2013


To avoid the formula handling bug in lm/model.matrix/etc., you can try making the
formula shorter.  E.g., if you know the names of your response columns,
   responseCols <- c("X1", "X2", "X3", ..., "X2395")
try the formula
   as.matrix(d[, responseCols]) ~ d[,"R_M_F"] + d[,"SMB"] + d[,"HML"] + d[,"WML"]
and do not use data=d in the call to lm().
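For concreteness, here is a minimal toy sketch of that matrix-response formula, using 5 response columns and 100 rows as illustrative stand-ins for the poster's 2395 stocks and 500 months:

```r
# Toy data: response columns X1..X5 plus the four factor columns
set.seed(1)
d <- data.frame(matrix(rnorm(100 * 5), ncol = 5))
names(d) <- paste0("X", 1:5)
d$R_M_F <- rnorm(100); d$SMB <- rnorm(100)
d$HML   <- rnorm(100); d$WML <- rnorm(100)

responseCols <- paste0("X", 1:5)
# Matrix response on the left-hand side; note no data= argument
fit <- lm(as.matrix(d[, responseCols]) ~ d[,"R_M_F"] + d[,"SMB"] + d[,"HML"] + d[,"WML"])
dim(coef(fit))  # 5 terms (intercept + 4 factors) by 5 responses
```

With a matrix on the left-hand side, lm() returns an "mlm" object: coef(fit) is a matrix with one column of coefficients per response.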

You may also prefer to use lm.fit(), which takes the response matrix and design matrix
directly, so you avoid formulae altogether.
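A sketch of the lm.fit() route on the same kind of toy data (lm.fit bypasses the formula machinery entirely, so the intercept column must be added to the design matrix by hand):

```r
# Toy stand-ins: four factor columns and a 100 x 5 response matrix
set.seed(1)
factors <- data.frame(R_M_F = rnorm(100), SMB = rnorm(100),
                      HML = rnorm(100), WML = rnorm(100))
Y <- matrix(rnorm(100 * 5), ncol = 5,
            dimnames = list(NULL, paste0("X", 1:5)))
X <- cbind(Intercept = 1, as.matrix(factors))  # explicit intercept column
fit <- lm.fit(X, Y)
dim(coef(fit))  # 5 coefficients (intercept + 4 factors) per response
```

lm.fit() returns a plain list rather than an "lm" object, so summary methods, predict(), etc. are not available, but for thousands of responses it is considerably faster than building a formula.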

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Nilesh Gupta
> Sent: Friday, April 19, 2013 10:16 AM
> To: David Winsemius
> Cc: r-help at r-project.org; peter dalgaard
> Subject: Re: [R] Multiple Multivariate regression in R with 50 independent variables
> 
> cbind( X1,X2,X3,X4,X5,X6,X7, ...  ,X2393,X2394,X2395 )~ R_M_F+SMB+HML+WML
> 
> I ran this code in the formula because I wanted to regress 2395 stocks,
> each with 500 months of data, on the four independent variables.
> I got this error. The idea was to run multivariate regressions on each of
> these stocks.
> 
> Error in model.matrix.default(mt, mf, contrasts) :
>   model frame and formula mismatch in model.matrix()
> 
> 
> Googling this error led me to that page, and I now know that I mistakenly
> assumed that lm was limited to 50 variables.
> 
> Is cbind(variable names) the right way to formulate multivariate
> regressions?
> 
> Where am I going wrong?
> 
> 
> The woods are lovely, dark and deep
> But I have promises to keep
> And miles before I go to sleep
> And miles before I go to sleep
> -----
> 
> 
> On Fri, Apr 19, 2013 at 10:18 PM, David Winsemius <dwinsemius at comcast.net>wrote:
> 
> >
> > On Apr 19, 2013, at 4:51 AM, Nilesh Gupta wrote:
> >
> > > I used this link
> > >
> > http://r.789695.n4.nabble.com/model-frame-and-formula-mismatch-in-model-matrix-
> td4664093.html
> >
> > But you said 50 independent variables, and that was probably someone's
> > (failed) effort  to submit 50 _dependent_ variables. What is the real
> > problem?
> >
> > --
> > David.
> >
> >
> > > Regards
> > >
> > >
> > >
> > > On Fri, Apr 19, 2013 at 2:19 PM, David Winsemius <dwinsemius at comcast.net>
> > wrote:
> > >
> > > On Apr 19, 2013, at 12:40 AM, Nilesh Gupta wrote:
> > >
> > > > lm() does not accommodate more than 50 independent variables
> > >
> > > What is your source for this misinformation?
> > >
> > > > dat <- as.data.frame(matrix(rnorm(51000), ncol=51) )
> > > > names(dat)
> > >  [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11"
> > "V12" "V13" "V14" "V15" "V16" "V17" "V18"
> > > [19] "V19" "V20" "V21" "V22" "V23" "V24" "V25" "V26" "V27" "V28" "V29"
> > "V30" "V31" "V32" "V33" "V34" "V35" "V36"
> > > [37] "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44" "V45" "V46" "V47"
> > "V48" "V49" "V50" "V51"
> > > > lm(V1 ~ ., dat=dat)
> > >
> > > Call:
> > > lm(formula = V1 ~ ., data = dat)
> > >
> > > Coefficients:
> > > (Intercept)           V2           V3           V4           V5
> >   V6           V7           V8
> > >  -0.0089517   -0.0427225   -0.0754946   -0.0002903   -0.0083482
> >  0.0324383   -0.0194980   -0.0151008
> > >          V9          V10          V11          V12          V13
> >  V14          V15          V16
> > >   0.0255324   -0.0167399    0.0476841   -0.0222229    0.0720990
> > -0.0174327   -0.0104261    0.0024625
> > >         V17          V18          V19          V20          V21
> >  V22          V23          V24
> > >  -0.0086276   -0.0274867   -0.0345897    0.0209116    0.0368201
> > -0.0027364    0.0090916    0.0198854
> > >         V25          V26          V27          V28          V29
> >  V30          V31          V32
> > >  -0.0083732   -0.0216937    0.0586361   -0.0530041    0.0402765
> >  0.0073514    0.0295976   -0.0641553
> > >         V33          V34          V35          V36          V37
> >  V38          V39          V40
> > >   0.0491071   -0.0261259    0.0364740    0.0070261   -0.0159851
> > -0.0373357    0.0506756   -0.0383495
> > >         V41          V42          V43          V44          V45
> >  V46          V47          V48
> > >   0.0054945    0.0089468   -0.0050151   -0.0184369    0.0019926
> > -0.0177631    0.0282828    0.0353523
> > >         V49          V50          V51
> > >  -0.0382634    0.0545654    0.0101398
> > >
> > > > dat <- as.data.frame(matrix(rnorm(101000), ncol=101) )
> > > > lm(V1 ~ ., dat=dat)
> > >
> > > Call:
> > > lm(formula = V1 ~ ., data = dat)
> > >
> > > Coefficients:
> > > (Intercept)           V2           V3           V4           V5
> >   V6           V7           V8
> > >    0.021065    -0.015988    -0.008273     0.049849     0.014874
> > 0.012352    -0.054584     0.004542
> > >          V9          V10          V11          V12          V13
> >  V14          V15          V16
> > >   -0.017186     0.018006    -0.009707    -0.007382     0.044886
> >  -0.051122    -0.026910    -0.048929
> > >         V17          V18          V19          V20          V21
> >  V22          V23          V24
> > >   -0.008129     0.022129    -0.063525     0.026683     0.013424
> >  -0.010145    -0.046046     0.024025
> > >         V25          V26          V27          V28          V29
> >  V30          V31          V32
> > >   -0.003529    -0.038270     0.043657     0.049855     0.010691
> > 0.041217    -0.012596     0.018302
> > >         V33          V34          V35          V36          V37
> >  V38          V39          V40
> > >    0.040225    -0.012751    -0.062677    -0.002810    -0.002574
> >  -0.024137     0.021324    -0.041520
> > >         V41          V42          V43          V44          V45
> >  V46          V47          V48
> > >   -0.076482     0.009063     0.067097    -0.042554    -0.013789
> > 0.002865     0.017325    -0.076860
> > >         V49          V50          V51          V52          V53
> >  V54          V55          V56
> > >   -0.007003    -0.007315     0.030270     0.022066    -0.002224
> >  -0.056534     0.013705    -0.003609
> > >         V57          V58          V59          V60          V61
> >  V62          V63          V64
> > >   -0.044580    -0.037543     0.015745     0.035250    -0.017117
> > 0.072470     0.004398    -0.015923
> > >         V65          V66          V67          V68          V69
> >  V70          V71          V72
> > >    0.012864    -0.062752    -0.038437    -0.019586     0.019871
> >  -0.068398    -0.111778     0.021416
> > >         V73          V74          V75          V76          V77
> >  V78          V79          V80
> > >    0.036849    -0.009103     0.037790     0.021883    -0.034990
> >  -0.014917    -0.003854     0.001760
> > >         V81          V82          V83          V84          V85
> >  V86          V87          V88
> > >   -0.001812     0.003942     0.021810    -0.013984    -0.030446
> > 0.049187     0.008392     0.026965
> > >         V89          V90          V91          V92          V93
> >  V94          V95          V96
> > >    0.057301     0.004190     0.055505    -0.046006    -0.019080
> >  -0.098889    -0.010891    -0.002729
> > >         V97          V98          V99         V100         V101
> > >    0.024939    -0.029847     0.063578    -0.061667    -0.022163
> > >
> > > > system.time( lm(V1 ~ ., dat=dat) ) # with the 101 column dataframe
> > >    user  system elapsed
> > >   0.060   0.008   0.076
> > >
> > > Sorry to give you such a Frost-y reception, but you are being somewhat
> > ... what's the right word... sleepy?
> > >
> > > --
> > > David.
> > >
> > >
> > > >
> > > >
> > > >
> > > > On Fri, Apr 19, 2013 at 12:26 PM, peter dalgaard <pdalgd at gmail.com>
> > wrote:
> > > >
> > > >>
> > > >> On Apr 18, 2013, at 21:24 , Nilesh Gupta wrote:
> > > >>
> > > >>> Hello all
> > > >>>
> > > >>> Is there a method/package in R in which I can do regressions for
> > > >>> more than 50 independent variables?
> > > >>
> > > >> What's wrong with lm() et al.?
> > > >>
> > > >> --
> > >
> > >
> > > David Winsemius
> > > Alameda, CA, USA
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> >
> >
> >
> 
> 


