[R] knn - random result although use.all=TRUE

David L Carlson dcarlson at tamu.edu
Fri Nov 20 16:40:29 CET 2015


Changing your definition of cl to clase let me replicate the problem. If you set a random seed just before running knn(), the results are consistent, which indicates that the function draws a random number at some point.
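
A minimal sketch of that check, rebuilding the toy data from the original post. It assumes the class package is available and that the class vector is named clase, as in the knn() call; the name idx (in place of t) and the seed value 123 are arbitrary choices made for this illustration.

```r
library(class)  # provides knn(); assumed to be installed

# Toy data from the original post, with the class vector named 'clase'
n <- 40; n1 <- 16; n2 <- n - n1
clase <- rep(1:2, c(n1, n2))
set.seed(37)
X1 <- sample(1:3, n, replace = TRUE, prob = rep(1/3, 3))
set.seed(36)
aux1 <- sample(1:2, n1, replace = TRUE, prob = c(0.9, 0.1))
set.seed(38)
aux2 <- sample(1:2, n2, replace = TRUE, prob = c(0.2, 0.8))
X2 <- c(aux1, aux2) + 3
X2[3] <- 5

# Training and test sets ('idx' replaces 't' to avoid masking t())
set.seed(36)
idx <- sample(1:40, 30, replace = FALSE)
train <- cbind(X1[idx], X2[idx])
test  <- cbind(X1[-idx], X2[-idx])

# Fix the same (arbitrary) seed immediately before each call
set.seed(123)
out1 <- knn(train, test, clase[idx], k = 3, l = 0, use.all = TRUE, prob = TRUE)
set.seed(123)
out2 <- knn(train, test, clase[idx], k = 3, l = 0, use.all = TRUE, prob = TRUE)

# Compare the two predictions, including the "prob" attribute
identical(out1, out2)
```

If the two calls agree only when the seed is reset, the variation between unseeded runs comes from random draws inside knn() rather than from the data.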

You should probably contact the package maintainer, but note that your toy data set is trivially simple. You have 40 observations in total, but X1 has only 3 distinct values and X2 has only 2, so there are only 6 distinct combinations. The distance matrix for your training set has 435 distances, but only 5 distinct values! As a result there are a great many ties, so the algorithm probably uses a random method of selecting which 3 to use.
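
The counts above can be checked with a short sketch. It works from the possible predictor values (X1 in 1:3, X2 in 4:5) rather than from the sampled data, and it assumes that all 6 combinations, plus at least one duplicated row, occur in the 30 training rows; the names grid and d are introduced only for this illustration.

```r
# All combinations the two predictors can take in the toy data
grid <- expand.grid(X1 = 1:3, X2 = 4:5)
nrow(grid)      # 6

# Number of pairwise distances among 30 training rows
choose(30, 2)   # 435

# Distinct Euclidean distances between grid points. Zero is added by hand
# because duplicated training rows lie at distance 0 from each other.
# Rounding guards against floating-point noise when comparing values.
d <- sort(unique(c(0, round(as.vector(dist(grid)), 10))))
d               # 0, 1, sqrt(2), 2, sqrt(5)
length(d)       # 5
```

With so few distinct distances, ties at the k-th neighbour are close to unavoidable. The knn() help page also notes that ties in the majority vote are broken at random, which may be where the random draw comes from.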

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of itziar irigoien
Sent: Thursday, November 19, 2015 7:10 AM
To: r-help at r-project.org
Subject: [R] knn - random result although use.all=TRUE

Dear all,

I have a toy example for working with the k-nn classification approach.
(My data, code and results are at the end of the message.)
Using the knn function from the class package with the parameter
use.all=TRUE, I would not expect a random answer. Nevertheless, I get a
different answer each time I apply it. Could anyone help me find out
what is going on?

Thanks,

Itziar Irigoien

# Generate data
n <- 40
n1 <- 16
n2 <- n-n1
cl <- rep(1:2, c(n1, n2))
set.seed(37)
X1 <- sample(1:3, n, replace=TRUE, prob=rep(1/3, 3))
set.seed(36)
aux1 <- sample(1:2, n1, replace=TRUE, prob=c(0.9, 0.1))
set.seed(38)
aux2 <- sample(1:2, n2, replace=TRUE, prob=c(0.2, 0.8))
X2 <- c(aux1, aux2)
X2 <- X2+3
X2[3] <- 5

#Select training and testing sets
set.seed(36)
t <- sample(1:40, 30, replace=FALSE)
train <- cbind(X1[t], X2[t])
test <- cbind(X1[-t], X2[-t])
out <- knn(train, test, clase[t], k=3, l=0, use.all=TRUE, prob=TRUE)
table(out, clase[-t])
sum(diag(table(out, clase[-t])))/10

# Results I obtained
 > out <- knn(train, test, clase[t], k=3, l=0, use.all=TRUE, prob=TRUE)
 > table(out, clase[-t])

out 1 2
   1 1 2
   2 0 7
 > sum(diag(table(out, clase[-t])))/10
[1] 0.8


 > out <- knn(train, test, clase[t], k=3, l=0, use.all=TRUE, prob=TRUE)
 > table(out, clase[-t])

out 1 2
   1 1 4
   2 0 5
 > sum(diag(table(out, clase[-t])))/10
[1] 0.6
