[R] Using pam, agnes or clara as prediction models?
Renald Buter
buter at cwts.leidenuniv.nl
Thu Jan 15 12:22:18 CET 2004
On Thu, Jan 15, 2004 at 08:59:37AM +0000, Prof Brian Ripley wrote:
[snip]
> > > > # separate the ruspini data into train and test set
> > > > > train<-ruspini[1:50,]
> > > > > test<-ruspini[51:75,]
> > > > > pamx<-pam(train,4)
> > > > > knnx<-knn(pamx$medoids,test,factor(c("a","b","c","d")),k=3)
> > > > > knnx
> > > > [1] d d b b d c b c c d c a a d c c a a c a a d c d a
> > > > Levels: a b c d
> > > >
> > > > But the result of applying the test set to the knn should only contain 2
> > > > clusters, since the upper half of the ruspini data contains only 2
> > > > clusters.
> > > >
> > > > Could you tell me what I am missing here?
[snip]
> When you divide a dataset into `training' and `testing' sets you are
> assuming at least exchangeability, whereas this dataset is clearly ordered.
> So it is not credible that `train' and `test' are samples from the same
> population.
>
Thank you *very* much for your help. I thought I'd let the list know
what I did to get it right:
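# load the packages that provide pam() and knn()
> library(cluster)
> library(class)
# create a random permutation of the 75 row indices
> seed<-rank(runif(75))
# use 60 randomly chosen rows for training and the remaining 15 for testing
> train<-ruspini[seed[1:60],]
> test<-ruspini[seed[61:75],]
# cluster the training set, then assign each test point to its nearest medoid
> pamx<-pam(train,4)
> knnx<-knn(pamx$medoids,test,factor(c("a","b","c","d")),k=1)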
And now the result makes sense!
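For anyone who wants to repeat this, here is a minimal sketch of the same idea written as a script. The fixed seed value and the use of sample() instead of rank(runif(75)) are my own assumptions, added only so that the split is reproducible; the labels "a" to "d" are arbitrary names for the four medoids, as above.

```r
library(cluster)  # provides pam() and the ruspini data
library(class)    # provides knn()

data(ruspini)

# Assumption: any fixed seed will do; it only makes the split repeatable.
set.seed(1)

n   <- nrow(ruspini)   # 75 observations
idx <- sample(n)       # random permutation of the row indices 1..n

train <- ruspini[idx[1:60], ]
test  <- ruspini[idx[61:n], ]

# Cluster the training rows around 4 medoids.
pamx <- pam(train, 4)

# Give each medoid an arbitrary label and assign every test point
# to the label of its single nearest medoid (k = 1).
labels <- factor(c("a", "b", "c", "d"))
knnx   <- knn(pamx$medoids, test, labels, k = 1)

# Count how many test points fall into each cluster.
table(knnx)
```

With only four labelled points (the medoids), k=1 seems to be the sensible choice here: any larger k makes points from other clusters vote on the label.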
Thanks again,
Renald