# [R] Linear separation

(Ted Harding) ted.harding at wlandres.net
Fri Dec 3 10:14:06 CET 2010

```
On 03-Dec-10 08:34:25, soeren.vogel at eawag.ch wrote:
> In https://stat.ethz.ch/pipermail/r-help/2008-March/156868.html I found
> what linear separability means. But what can I do if I find such a
> situation in my data? Field (2005) suggest to reduce the number of
> predictors or increase the number of cases. But I am not sure whether I
> can, as an alternative, take the findings from my analysis and report
> them. And if so, how can I find the linear combination of the
> predictors that separates the zeros from the ones?
>
> Below a small example to illustrate the situation.
>
> set.seed(123)
> df <- data.frame(
>   'y'=c(rep(FALSE, 6), rep(TRUE, 14)),
>   'x1'=c(sample(1:2, 6, repl=T), sample(3:5, 14, repl=T)),
>   'x2'=c(sample(4:7, 6, repl=T), sample(1:3, 14, repl=T)),
>   'x3'=round(rnorm(20, 4, 2), 0)
> )
> df[17:18, c(2, 3)] <- df[17:18, c(3, 2)]
> glm(y ~ ., data=df[, -3], family=binomial("logit"))
> glm(y ~ ., data=df, family=binomial("logit"))
>
> Thanks, Sören
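
## [Editorial sketch, not in the original exchange] Running the second
## glm() call on the data above typically triggers R's warning
## "glm.fit: fitted probabilities numerically 0 or 1 occurred" --
## the usual symptom of (quasi-)complete separation -- along with
## inflated coefficients and standard errors:
fit <- glm(y ~ ., data = df, family = binomial("logit"))
summary(fit)$coefficients  ## huge estimates and SEs under separation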

Try a Linear Discriminant Analysis, say using lda() in MASS:

library(MASS)
?lda

Then, with your example as given above:

lda(y ~ . , data=df)
# Call:
# lda(y ~ ., data = df)
# Prior probabilities of groups:
# FALSE  TRUE
#   0.3   0.7
# Group means:
#             x1       x2       x3
# FALSE 1.500000 6.333333 2.500000
# TRUE  3.928571 2.071429 4.642857
# Coefficients of linear discriminants:
#           LD1
# x1  0.2851901
# x2 -1.1018691
# x3  0.2594354

M <- as.matrix(df[,2:4])  ## The X's in your data 'df'
C <- c(0.2851901,-1.1018691,0.2594354)  ## The discriminant coefficients
cbind(M %*% C, df$y)  ## M %*% C, i.e. the value of the LD for each case
#             [,1] [,2]
#  [1,] -6.9090228    0
#  [2,] -5.0030928    0
#  [3,] -5.8071537    0
#  [4,] -6.3643973    0
#  [5,] -5.2625282    0
#  [6,] -6.0665891    0
#  [7,]  0.4936346    1
#  [8,]  0.2599539    1
#  [9,]  0.5577621    1
# [10,]  1.8549391    1
# [11,] -0.5824798    1
# [12,] -1.3865407    1
# [13,] -0.3230444    1
# [14,] -0.6082345    1
# [15,]  1.3103136    1
# [16,]  0.5193893    1
# [17,] -1.1528600    1
# [18,] -1.9826756    1
# [19,]  0.5320074    1
# [20,]  1.1023876    1

Showing that all cases with sufficiently low values of the LD
correspond to Y=0, and all cases with sufficiently high values
correspond to Y=1.
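
## [Editorial sketch] The same scores, centred, come straight from
## predict() on the lda fit, together with the implied classification:
fit.lda <- lda(y ~ ., data = df)
pr <- predict(fit.lda)
pr$x                   ## LD1 scores (a centred version of M %*% C)
pr$class               ## predicted group for each case
table(pr$class, df$y)  ## here all 20 cases fall on the diagonal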

Of course the LDA is not unique, given the data: the LD is simply
a linear combination LD = a1*X1 + a2*X2 + a3*X3 such that, for some
threshold C, the cases with LD < C are the 0's and those with
LD > C are the 1's. Clearly the threshold C (given the coefficients)
could be anywhere between -5.0030928 and -1.9826756 and still
achieve the same separation; and it would also be possible to vary
the coefficients a1, a2, a3 and still achieve the same separation.
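
## [Editorial sketch] To see the non-uniqueness concretely: compute
## the gap between the two groups' scores, pick any threshold inside
## it, and note that rescaling the coefficients leaves the split intact.
LD  <- M %*% C
lo  <- max(LD[!df$y])        ## largest score among the 0's
hi  <- min(LD[df$y])         ## smallest score among the 1's
thr <- (lo + hi)/2           ## any value in (lo, hi) would do
all((LD > thr) == df$y)      ## TRUE here, since lo < hi
all((2*LD > 2*thr) == df$y)  ## TRUE: any positive rescaling separates too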

So you might not learn a lot from this (or you may, depending on
the data).

As for what policy to adopt when linear separation occurs, that
is certainly not unique either! But any possible effective policy
will have one component: Think!

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Dec-10                                       Time: 09:14:02
------------------------------ XFMail ------------------------------

```