[R] variable selections to avoid multicollinearity

Bert Gunter gunter.berton at gene.com
Mon May 18 01:16:02 CEST 2015


OFFTOPIC! This is a statistical question, not an R question. Post on a
statistics site like stats.stackexchange.com.

However, your post suggests that you are completely out of your depth
here (0/1 responses suggest that glm modeling via logistic regression
is called for). Remote internet advice is unlikely to fill the gap
between what you seem to need and what you seem to know. I strongly
suggest you find a local statistical expert to help if you wish to
avoid producing nonsense.

(Once you have figured out what you need to do, questions about how to
use R tools to do it are of course appropriate.)
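
For reference, the glm logistic regression mentioned above looks roughly
like the sketch below. It uses simulated data only: the names sim, x1, x2
and y are hypothetical and are not the poster's variables. It is a minimal
illustration of the R call, not advice on what model suits the data.

```r
# Simulated data only: 'sim', 'x1', 'x2' and 'y' are hypothetical names,
# not variables from the original post.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)

# Generate 0/1 responses from a known logistic model
p <- plogis(-0.5 + 1.2 * x1 - 0.8 * x2)
y <- rbinom(n, size = 1, prob = p)
sim <- data.frame(y, x1, x2)

# Logistic regression: binomial family with its default logit link
fit <- glm(y ~ x1 + x2, family = binomial, data = sim)
print(summary(fit)$coefficients)

# The response must contain both 0s and 1s; a column of all 1s
# gives the model nothing to discriminate between.
print(table(sim$y))
```

Note that this needs both outcomes to be present in the response, which
is not the case for a response column that is always 1.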

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sun, May 17, 2015 at 1:06 PM, Kristi Glover
<kristi.glover at hotmail.com> wrote:
> Hi R users,
> I am trying to reduce the number of independent variables before I fit models. My dependent variable is presence only (TRUE, with no absence or FALSE records), and I have more than 20 independent variables that are highly correlated. I found that PCA is used for feature selection.
> However, the PCA feature selection I found uses the dependent variable (as a linear model) together with the independent variables to select variables based on the variation explained. In my data the dependent variable is always "1", so I could not run it.
>
> Could you give me some suggestions on how to reduce the variables to a smaller number? I have included a sample data set below. In this data set, the dependent variable is "sp" and the other 20 variables are the independent variables.
>
> dat<-structure(list(sp = c(1L, 1L, 1L, 1L, 1L), var1 = c(32L, 222L,
> 134L, 114L, 121L), var2 = c(188L, 175L, 167L, 166L, 167L), var3 = c(123L,
> 129L, 136L, 138L, 137L), var4 = c(40L, 35L, 37L, 38L, 37L), var5 = c(6756L,
> 8080L, 7856L, 7899L, 7891L), var6 = c(334L, 352L, 341L, 340L,
> 341L), var7 = c(29L, -9L, -18L, -22L, -20L), var8 = c(305L, 361L,
> 359L, 362L, 361L), var9 = c(108L, 217L, 167L, 166L, 166L), var10 = c(237L,
> 67L, 61L, 59L, 60L), var11 = c(270L, 276L, 265L, 264L, 264L),
>     var12 = c(97L, 67L, 61L, 59L, 60L), var13 = c(1491L, 916L,
>     1245L, 1282L, 1250L), var14 = c(168L, 127L, 154L, 155L, 154L
>     ), var15 = c(99L, 43L, 67L, 70L, 68L), var16 = c(15L, 32L,
>     22L, 21L, 21L), var17 = c(432L, 313L, 390L, 400L, 392L),
>     var18 = c(308L, 148L, 254L, 269L, 257L), var19 = c(332L,
>     213L, 269L, 277L, 271L), var20 = c(430L, 148L, 254L, 269L,
>     257L)), .Names = c("sp", "var1", "var2", "var3", "var4",
> "var5", "var6", "var7", "var8", "var9", "var10", "var11", "var12",
> "var13", "var14", "var15", "var16", "var17", "var18", "var19",
> "var20"), class = "data.frame", row.names = c(NA, -5L))
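
On the R mechanics only: correlations among predictors, and a PCA of the
predictors, can be computed without any response variable. The sketch
below uses simulated predictors (x1, x2, x3 and the 0.9 cutoff are
assumptions made for illustration, not taken from the data above). Whether
either step is statistically sensible for presence-only data is a separate
question that this does not answer.

```r
# Hypothetical predictors: x2 is built to be nearly collinear with x1,
# x3 is independent of both.
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# Pairwise correlations among the predictors only; no response is used
r <- cor(X)
print(round(r, 2))

# Pairs whose absolute correlation exceeds a cutoff
# (0.9 is an arbitrary, assumed threshold)
high <- which(abs(r) > 0.9 & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(v1 = rownames(r)[high[, "row"]],
                         v2 = colnames(r)[high[, "col"]])
print(high_pairs)

# PCA on the standardized predictors, again without the response
pc <- prcomp(X, scale. = TRUE)
print(summary(pc)$importance)
```

The same calls could be pointed at the predictor columns of a real data
frame, for example dat[, -1] for the sample above, although five rows are
far too few to say anything reliable.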
>
> thanks


