[R] Linear separation

Fri Dec 3 09:34:25 CET 2010

In https://stat.ethz.ch/pipermail/r-help/2008-March/156868.html I found an explanation of what linear separability means. But what can I do if I find such a situation in my data? Field (2005) suggests reducing the number of predictors or increasing the number of cases. However, I am not sure whether I can, as an alternative, take the findings from my analysis and report them as they are. And if so, how can I find the linear combination of the predictors that separates the zeros from the ones?

Below is a small example to illustrate the situation.

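set.seed(123)
# y is FALSE for the first 6 cases and TRUE for the remaining 14
df <- data.frame(
  y  = c(rep(FALSE, 6), rep(TRUE, 14)),
  x1 = c(sample(1:2, 6, replace=TRUE), sample(3:5, 14, replace=TRUE)),
  x2 = c(sample(4:7, 6, replace=TRUE), sample(1:3, 14, replace=TRUE)),
  x3 = round(rnorm(20, 4, 2), 0)
)
# swap the values of x1 and x2 in rows 17 and 18
df[17:18, c(2, 3)] <- df[17:18, c(3, 2)]
# model without x2 (column 3 dropped)
glm(y ~ ., data=df[, -3], family=binomial("logit"))
# model with all predictors
glm(y ~ ., data=df, family=binomial("logit"))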
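
One possible starting point for the last question, given only as a minimal sketch and not as a definitive answer: when the data are completely separated, the linear predictor of the glm fit is itself a candidate for a separating combination. The helper separates() and the data frame toy are hypothetical and made up for illustration; toy is hand-written and is not the data from the example above. Under complete separation no finite maximum likelihood estimate exists, so the coefficients of such a fit should be read only as a direction, not as estimates.

```r
# Hypothetical helper: TRUE if every TRUE case scores strictly higher
# than every FALSE case, i.e. a threshold on 'score' separates y.
separates <- function(score, y) {
  min(score[y]) > max(score[!y])
}

# Hand-made toy data (an assumption, not the data from the post):
# neither x1 nor x2 separates y alone, but x1 + x2 does.
toy <- data.frame(
  y  = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  x1 = c(1, 2, 1, 3, 4, 2),
  x2 = c(1, 1, 2, 3, 2, 4)
)

separates(toy$x1, toy$y)           # check x1 on its own
separates(toy$x2, toy$y)           # check x2 on its own
separates(toy$x1 + toy$x2, toy$y)  # check the sum of both

# glm() only warns under separation, so the warnings are suppressed here.
fit <- suppressWarnings(
  glm(y ~ x1 + x2, data=toy, family=binomial("logit"))
)

# Linear predictor of the fit: intercept + b1 * x1 + b2 * x2 for each case.
eta <- predict(fit, type="link")

coef(fit)               # weights of the candidate linear combination
separates(eta, toy$y)   # does the fitted combination separate y?
```

The same check could be applied to the models above by passing predict(fit, type="link") and df$y to separates(); whether it returns TRUE there depends on the sampled data.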

Thanks, Sören

-- 
Sören Vogel, Dipl.-Psych. (Univ.), PhD-Student, Eawag, Dept. SIAM
http://www.eawag.ch, http://sozmod.eawag.ch