[R] knn - random result although use.all=TRUE
David L Carlson
dcarlson at tamu.edu
Fri Nov 20 16:40:29 CET 2015
After renaming cl to clase in your definition (the later code refers to clase), I could replicate the problem. If you set a random seed just before running knn(), the results are consistent, which indicates that the function draws a random number at some point.
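A minimal sketch of that check, using a deliberately tiny data set rather than the poster's. The matrices train and test, the labels, and the helper classify_once() are all hypothetical names made up for illustration. The sketch assumes the behaviour described in the help page for knn() in package class: ties for the kth nearest neighbour are all included in the vote, and a tied majority vote is broken at random.

```r
library(class)  # recommended package shipped with R; provides knn()

# Hypothetical toy data: four training points, all at distance 1 from the
# single test point, two from each class, so the vote is always tied 2-2.
train  <- rbind(c(0, 0), c(0, 0), c(2, 0), c(2, 0))
labels <- factor(c("a", "a", "b", "b"))
test   <- matrix(c(1, 0), nrow = 1)

# Hypothetical helper: classify the test point once, optionally seeding first.
classify_once <- function(seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  as.character(knn(train, test, labels, k = 3, l = 0, use.all = TRUE))
}

# Repeated calls without reseeding: the tied vote is resolved at random,
# so both classes should turn up.
set.seed(1)  # only fixes the starting point of the sequence of calls
unseeded <- replicate(200, classify_once())
table(unseeded)

# Reseeding immediately before every call: the same class every time.
seeded <- replicate(200, classify_once(seed = 42))
table(seeded)
```

If the seeded runs always agree while the unseeded runs do not, the variation comes from random number generation inside knn() and not from the data.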
You should probably contact the package maintainer, but your toy data set is trivially simple. You have 40 total observations, but X1 has only 3 different values and X2 has only 2 different values, so there are only 6 possible combinations. The distance matrix on your training set has 435 distances (30 * 29 / 2), but only 5 different values! As a result there are many, many tied values, so the algorithm probably uses a random method of selecting which 3 to use.
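The arithmetic can be sketched without the seeded data, which may not reproduce exactly under a different R version. The names combos, n_train and d are hypothetical, and the sketch works from the full grid of possible (X1, X2) values, so it gives an upper bound on the number of distinct distances and not the count for one particular training sample.

```r
# All possible combinations of X1 in 1:3 and X2 in 4:5
combos <- expand.grid(X1 = 1:3, X2 = 4:5)
nrow(combos)                      # 6 combinations

# Number of pairwise distances among 30 training rows
n_train <- 30
choose(n_train, 2)                # 435

# Distinct nonzero Euclidean distances between different combinations
# (rounded to guard against floating-point noise)
d <- sort(unique(round(as.vector(dist(combos)), 10)))
d                                 # 1, sqrt(2), 2, sqrt(5)

# With 30 rows and only 6 combinations some rows must coincide,
# so distance 0 also occurs: at most length(d) + 1 distinct values.
length(d) + 1
```

With so few distinct distances, nearly every test point has several training points tied at the kth distance, which is the situation in which ties have to be resolved somehow.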
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of itziar irigoien
Sent: Thursday, November 19, 2015 7:10 AM
To: r-help at r-project.org
Subject: [R] knn - random result although use.all=TRUE
Dear all,
I have this toy example to work with the k-nn classification approach. (My
data, code and results are at the end of the message.)
Working with the knn function in library class and setting the parameter
use.all=TRUE, I would not expect a random answer. Nevertheless, I get a
different answer each time I apply it. Could anyone help me find out
what is going on?
Thanks,
Itziar Irigoien
library(class)  # provides knn()
# Generate data
n <- 40
n1 <- 16
n2 <- n-n1
clase <- rep(1:2, c(n1, n2))  # named cl in the original post; the code below refers to clase
set.seed(37)
X1 <- sample(1:3, n, replace=TRUE, prob=rep(1/3, 3))
set.seed(36)
aux1 <- sample(1:2, n1, replace=TRUE, prob=c(0.9, 0.1))
set.seed(38)
aux2 <- sample(1:2, n2, replace=TRUE, prob=c(0.2, 0.8))
X2 <- c(aux1, aux2)
X2 <- X2+3
X2[3] <- 5
# Select training and testing sets
set.seed(36)
t <- sample(1:40, 30, replace=FALSE)
train <- cbind(X1[t], X2[t])
test <- cbind(X1[-t], X2[-t])
out <- knn(train, test, clase[t], k=3, l=0, use.all=TRUE, prob=TRUE)
table(out, clase[-t])
sum(diag(table(out, clase[-t])))/10
# Results I obtained
> out <- knn(train, test, clase[t], k=3, l=0, use.all=TRUE, prob=TRUE)
> table(out, clase[-t])
out 1 2
  1 1 2
  2 0 7
> sum(diag(table(out, clase[-t])))/10
[1] 0.8
> out <- knn(train, test, clase[t], k=3, l=0, use.all=TRUE, prob=TRUE)
> table(out, clase[-t])
out 1 2
  1 1 4
  2 0 5
> sum(diag(table(out, clase[-t])))/10
[1] 0.6