[R] Regression with many independent variables
Matthew Douglas
matt.douglas01 at gmail.com
Mon Feb 28 21:32:02 CET 2011
Hi,
I am trying to use lm() on some data. The code works fine, but I would
like a more efficient way to write it.
The data looks like this (the data is very sparse with a few 1s, -1s
and the rest 0s):
> head(adj0708)
      MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337 ....
1   64.28571   29    0    0    0    0    0    0   0    0    0 0 0 0 0
2 -100.00000    6    0    0    0    0    0    0   0    1    0 0 0 0 0
3  100.00000    4    0    0    0    0    0    0   0    1    0 0 0 0 0
4  -33.33333    7    0    0    0    0    0    0   0    0    0 0 0 0 0
5  200.00000    2    0    0    0    0    0    0   0    0    0 0 -1 0 0
6  -83.33333   12    0   -1    0    0    0    0   0    0    0 0 0 0 0
adj0708 is actually a 35657x341 data set. Each column after "Poss" is
an independent variable; the dependent variable is "MARGIN", and the
regression is weighted by "Poss".
The regression is below:
fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235 + adj0708$P247 +
adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
adj0708$P605 + adj0708$P337 + .... +
adj0708$P510,weights=adj0708$Poss)
I have two questions:
1. Is there a way to condense how I write the independent variables
in lm(), instead of having such a long line of code (I have 339
independent variables, to be exact)?
2. I would like to pair up the variables and fit a regression on the
interactions between two independent variables. I think it would look
something like this....
fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235:adj0708$P247 +
adj0708$P703:adj0708$P430 + adj0708$P489:adj0708$P218 +
adj0708$P605:adj0708$P337 + ....,weights=adj0708$Poss)
but there will be 339 choose 2 combinations, so a lot of independent
variables! Is there a more efficient way of writing this code, or
another way I can do this?
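For illustration, here is a minimal sketch of one possible way to shorten both calls. It runs on a small made-up data frame, not the real adj0708: the name `toy`, its random values, and the object names `fit_main`, `preds`, `pair_terms`, `f_pairs` and `fit_pairs` are all assumptions for the example. It relies on `.` in a formula standing for every column of `data` other than the response, and on `combn()` plus `reformulate()` to build the pairwise interaction terms as strings.

```r
# Toy stand-in for adj0708: made-up values, only the column layout
# (MARGIN, Poss, then sparse -1/0/1 predictors) follows the post.
set.seed(1)
n <- 200
toy <- data.frame(
  MARGIN = rnorm(n, sd = 100),
  Poss   = sample(1:30, n, replace = TRUE),
  P235   = sample(c(-1, 0, 1), n, replace = TRUE),
  P247   = sample(c(-1, 0, 1), n, replace = TRUE),
  P703   = sample(c(-1, 0, 1), n, replace = TRUE),
  P218   = sample(c(-1, 0, 1), n, replace = TRUE)
)

# Question 1: "." expands to all columns of `data` except the response;
# "- Poss" removes the weight column from the predictors.
fit_main <- lm(MARGIN ~ . - Poss, data = toy, weights = Poss)

# Question 2: build every pairwise interaction term as a string
# ("P235:P247", "P235:P703", ...) and turn the strings into a formula.
preds      <- setdiff(names(toy), c("MARGIN", "Poss"))
pair_terms <- combn(preds, 2, FUN = paste, collapse = ":")
f_pairs    <- reformulate(pair_terms, response = "MARGIN")
fit_pairs  <- lm(f_pairs, data = toy, weights = Poss)
```

Passing `data = ` also removes the need to repeat `adj0708$` in front of every name. Note that 339 choose 2 is 57291, which is more than the 35657 rows, so a model with every pairwise interaction could not estimate all of its coefficients on this data set; some selection of pairs would probably be needed.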
Thanks,
Matt