[R] variable selections to avoid multicollinearity
Kristi Glover
kristi.glover at hotmail.com
Sun May 17 22:06:25 CEST 2015
Hi R users,
I am trying to reduce the number of independent variables before I run my models. My dependent variable contains presences (TRUE) only, with no absences (FALSE), and I have more than 20 independent variables that are highly correlated. While looking for ways to reduce them, I found that PCA is used for feature selection.
However, the PCA-based feature selection approach I found fits a linear model of the dependent variable on the independent variables and selects variables based on the variation explained. My dependent variable is always "1", so I could not run it.
Could you give me some suggestions on how to reduce the variables to a smaller number? I have attached sample data: the dependent variable is "sp" and the other 20 variables are the independent variables.
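A possible starting point, sketched under assumptions: PCA itself (base R `prcomp()`) is computed from the independent variables alone, so a response that is always 1 does not prevent running it. The sketch below reuses the posted sample data, runs PCA on var1 to var20 only, and then applies one simple heuristic: keep the original variable with the largest absolute loading on each of the first k components. The 90% cumulative-variance cut-off and the names `preds`, `k` and `keep` are illustrative assumptions, not an established rule. With only five rows, at most four components can carry any variance after centering, so this only shows the mechanics.

```r
# Sketch only: PCA needs the predictors, not the response.
# Assumptions (not from the original post): the 90% cut-off and the
# "largest absolute loading per component" heuristic.
dat <- read.table(text = "
1  32 188 123 40 6756 334  29 305 108 237 270 97 1491 168 99 15 432 308 332 430
1 222 175 129 35 8080 352  -9 361 217  67 276 67  916 127 43 32 313 148 213 148
1 134 167 136 37 7856 341 -18 359 167  61 265 61 1245 154 67 22 390 254 269 254
1 114 166 138 38 7899 340 -22 362 166  59 264 59 1282 155 70 21 400 269 277 269
1 121 167 137 37 7891 341 -20 361 166  60 264 60 1250 154 68 21 392 257 271 257",
  col.names = c("sp", paste0("var", 1:20)))

# Drop the response; PCA is computed from var1..var20 only
preds <- dat[, setdiff(names(dat), "sp")]

# Standardise the variables because they are on very different scales
pca <- prcomp(preds, center = TRUE, scale. = TRUE)
summary(pca)  # proportion of variance explained by each component

# Number of components needed to reach 90% of total variance (assumed cut-off)
var_expl <- pca$sdev^2 / sum(pca$sdev^2)
k <- which(cumsum(var_expl) >= 0.9)[1]

# Heuristic: keep the original variable with the largest absolute
# loading on each of the first k components
keep <- unique(apply(abs(pca$rotation[, seq_len(k), drop = FALSE]), 2,
                     function(l) rownames(pca$rotation)[which.max(l)]))
keep
```

Because `sp` never enters `prcomp()`, the same code runs whether the response holds presences only or presences and absences.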
# Sample data: "sp" is the dependent variable (presences only, all 1);
# var1 to var20 are the correlated independent variables.
dat <- data.frame(
  sp    = c(1L, 1L, 1L, 1L, 1L),
  var1  = c(32L, 222L, 134L, 114L, 121L),
  var2  = c(188L, 175L, 167L, 166L, 167L),
  var3  = c(123L, 129L, 136L, 138L, 137L),
  var4  = c(40L, 35L, 37L, 38L, 37L),
  var5  = c(6756L, 8080L, 7856L, 7899L, 7891L),
  var6  = c(334L, 352L, 341L, 340L, 341L),
  var7  = c(29L, -9L, -18L, -22L, -20L),
  var8  = c(305L, 361L, 359L, 362L, 361L),
  var9  = c(108L, 217L, 167L, 166L, 166L),
  var10 = c(237L, 67L, 61L, 59L, 60L),
  var11 = c(270L, 276L, 265L, 264L, 264L),
  var12 = c(97L, 67L, 61L, 59L, 60L),
  var13 = c(1491L, 916L, 1245L, 1282L, 1250L),
  var14 = c(168L, 127L, 154L, 155L, 154L),
  var15 = c(99L, 43L, 67L, 70L, 68L),
  var16 = c(15L, 32L, 22L, 21L, 21L),
  var17 = c(432L, 313L, 390L, 400L, 392L),
  var18 = c(308L, 148L, 254L, 269L, 257L),
  var19 = c(332L, 213L, 269L, 277L, 271L),
  var20 = c(430L, 148L, 254L, 269L, 257L)
)
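One hedged way to reduce the variables without using any response is a greedy correlation filter: repeatedly find the most strongly correlated pair of predictors and drop whichever member is, on average, more correlated with the rest, until no pair exceeds a cut-off. This is a base-R sketch of the idea behind `caret::findCorrelation()`, not that function itself; the 0.8 cut-off and the helper `drop_correlated()` are hypothetical choices. With only five rows the correlations are very unstable, so treat the result as a demonstration of the mechanics rather than a recommendation for these variables.

```r
# Sketch only: greedy correlation filter on the predictors (no response used).
# Assumptions (not from the original post): the 0.8 cut-off and the
# hypothetical helper drop_correlated(); no variable may be constant.
dat <- read.table(text = "
1  32 188 123 40 6756 334  29 305 108 237 270 97 1491 168 99 15 432 308 332 430
1 222 175 129 35 8080 352  -9 361 217  67 276 67  916 127 43 32 313 148 213 148
1 134 167 136 37 7856 341 -18 359 167  61 265 61 1245 154 67 22 390 254 269 254
1 114 166 138 38 7899 340 -22 362 166  59 264 59 1282 155 70 21 400 269 277 269
1 121 167 137 37 7891 341 -20 361 166  60 264 60 1250 154 68 21 392 257 271 257",
  col.names = c("sp", paste0("var", 1:20)))
preds <- dat[, setdiff(names(dat), "sp")]

drop_correlated <- function(x, cutoff = 0.8) {
  keep <- names(x)
  repeat {
    r <- abs(cor(x[, keep, drop = FALSE]))
    diag(r) <- 0                    # ignore self-correlation
    if (max(r) <= cutoff) break     # no remaining pair above the cut-off
    # Locate the most strongly correlated pair
    pair <- which(r == max(r), arr.ind = TRUE)[1, ]
    a <- keep[pair[1]]
    b <- keep[pair[2]]
    # Drop whichever member is more correlated, on average, with the rest
    drop <- if (mean(r[a, ]) >= mean(r[b, ])) a else b
    keep <- setdiff(keep, drop)
  }
  keep
}

kept <- drop_correlated(preds, cutoff = 0.8)
kept
round(cor(preds[, kept, drop = FALSE]), 2)  # correlations among kept variables
```

Lowering the cut-off keeps fewer variables; the filter never looks at `sp`, so it is unaffected by the presence-only response.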
Thanks
More information about the R-help mailing list