[R] multi-regression with more than 50 independent variables

David L Carlson dcarlson at tamu.edu
Mon Feb 13 17:36:44 CET 2012


You need to spend some time reading about multiple regression. In statistics
there is always what is possible and what is advisable. I'm not going to
address whether a regression with 57 independent variables is advisable, only
whether it is possible. For your data, it is not. The attached data contain
only 13 observations, so with an intercept in the model the maximum number of
independent variables you can use is 12. Consider the following example:

example <- data.frame(y=rnorm(3), x1=rnorm(3), x2=rnorm(3), x3=rnorm(3))
lm(y~x1 + x2, example)
lm(y~x1 + x2 + x3, example)

We create four variables using random normal numbers for 3 cases (rows). The
first regression (2 independent variables) "works" (i.e., there are no NAs):
the intercept and the two slopes use up all 3 observations. The second
produces an NA for the third independent variable because nothing is left to
estimate it from. In my example, the three random variables are not linear
combinations of one another, so the NA comes only from running out of
observations. In your data there must also be correlations among the 57
variables, which would explain why you are getting slope values for only 11.
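
Here is a rough sketch of both situations, using simulated data rather than
yours. The names sim, p and x_sum, the seed, and the choice of 20 predictors
are made up for illustration; only the 13 observations match your file.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n <- 13  # number of observations, as in the posted data
p <- 20  # hypothetical number of candidate predictors (more than n)

# Simulated data: one response y and p unrelated predictors x1..x20.
sim <- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
names(sim) <- c("y", paste0("x", seq_len(p)))

# Cause 1: more parameters than observations.
# lm() estimates the intercept and as many slopes as the data allow,
# then reports NA for the remaining coefficients.
fit_all <- lm(y ~ ., data = sim)
n_estimated <- sum(!is.na(coef(fit_all)))  # intercept plus estimable slopes
n_dropped   <- sum(is.na(coef(fit_all)))   # coefficients reported as NA

# Cause 2: an exact linear dependence among the predictors.
# x_sum carries no information beyond x1 and x2, so it gets NA even
# though there are plenty of observations for three slopes.
sim$x_sum <- sim$x1 + sim$x2
fit_dep <- lm(y ~ x1 + x2 + x_sum, data = sim)
dep_is_na <- is.na(coef(fit_dep)[["x_sum"]])

print(c(estimated = n_estimated, dropped = n_dropped))
print(coef(fit_dep))
```

On real data it is also worth comparing nrow(na.omit(yourdata)) with the
number of terms in the formula, since by default lm() drops any row that has
a missing value, which lowers the number of usable observations.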

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of R DF
Sent: Monday, February 13, 2012 9:19 AM
To: r-help at r-project.org
Subject: [R] multi-regression with more than 50 independent variables

Hi R Users,

I am going to run a multiple linear regression with around 57 independent
variables. When I run the model with just 11 variables, the results are
reasonable. When I increase the number of independent variables beyond 11,
the additional coefficients come out as "NA" in the output. Is there any
limit on the number of independent variables in a multiple linear
regression in R? I have attached my dataset, and my R code is below:



mlr.data <- read.table("./multiple.txt", header = TRUE)

mlr.output <- lm(formula = CaV ~ SHG + TrD + CrH + SPAD + FlN + FrN + YT +
    LA + LDMP + B + Cu + Zn + Mn + Fe + K + P + N +
    Clay30 + Silt30 + Sand30 + Clay60 + Silt60 + Sand60 +
    ESP30 + NaEx30 + CEC30 + Cl30 + SAR30 + KSol30 + NaSol30 + CaMgSol3 +
    ZnAv30 + FeAv30 + OC30 + PAv30 + KAv30 + TNV30 + pH30 + EC30 + SP30 +
    ESP60 + NaEx60 + CEC60 + Cl60 + SAR60 + KSol60 + NaSol60 + CaMgSol6 +
    ZnAv60 + FeAv60 + OC60 + PAv60 + KAv60 + TNV60 + pH60 + EC60 + SP60,
    data = mlr.data)

summary(mlr.output)




Regards,

Reza


