[R] How to do knn regression?
Hans W. Borchers
hwborchers at gmail.com
Sun Sep 28 17:23:31 CEST 2008
> This is a summary of discussions between Shengqiao Li and me,
> entered here as a reference for future requests on knn regression
> or missing value imputation based on a nearest neighbor approach.
There are several functions that can be used for 'nearest neighbor'
classification such as knn, knn1 (in package class), knn3(caret),
kknn(kknn), ipredknn(ipred), sknn(klaR), or gknn(cba).
Using these functions for 'nearest neighbor' regression would be
difficult. There is actually just one knn-like function that can be
applied to continuous variables:
kknn(kknn)
    uses a formula and looks at the type of the target variable:
    if the target variable is continuous, it will return a regression
    result for each row in the learning set
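
As an illustration, here is a minimal sketch of such a regression call.
The data frame, the column names x1, x2, y and the train/test split are
made up for this example. As far as I can tell from the package
documentation, the predictions come back in the 'fitted.values' component,
one per row of whatever is passed as 'test', so to get a result for each
row of the learning set one would pass that set as 'test' as well. The
guard only skips the example when kknn is not installed.

```r
# Sketch only: knn regression on simulated data with kknn.
set.seed(1)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2 + rnorm(n, sd = 0.1)

train <- dat[1:150, ]    # hypothetical learning set
test  <- dat[151:200, ]  # hypothetical rows to predict

if (requireNamespace("kknn", quietly = TRUE)) {
  # y is numeric, so kknn treats this as a regression problem.
  # kernel = "rectangular" gives unweighted neighbours.
  fit <- kknn::kknn(y ~ x1 + x2, train = train, test = test,
                    k = 5, kernel = "rectangular")
  pred <- fit$fitted.values                 # one prediction per row of 'test'
  rmse <- sqrt(mean((pred - test$y)^2))     # hold-out error of the sketch
}
```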
And two implementations of functions that simply return the indices
and distances of k nearest neighbors for further processing:
ann(yaImpute)
    constructs kd- or bd-trees to find k nearest neighbors
    and returns indices and distances of those neighbors
    (it may kill the whole R process when matrices are too big)
    [Remark: Watch out, the default distance is the sum of squares,
    i.e. the squared Euclidean distance]
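
To show what 'further processing' could look like, here is a rough sketch
that turns the neighbor indices into a regression estimate by averaging
the target values of the k neighbors. The data are simulated and the
names train_x, train_y, query_x are hypothetical. I am assuming that the
result component 'knnIndexDist' holds the k neighbor indices in its first
k columns and the k distances in the next k columns; please check this
against ?ann for your version of yaImpute.

```r
# Sketch only: knn regression built on the neighbour indices from ann().
set.seed(42)
train_x <- matrix(runif(200), ncol = 2)   # 100 reference points
train_y <- train_x[, 1] + train_x[, 2]    # known target values
query_x <- matrix(runif(20), ncol = 2)    # 10 points to predict
k <- 3

if (requireNamespace("yaImpute", quietly = TRUE)) {
  res <- yaImpute::ann(ref = train_x, target = query_x, k = k,
                       verbose = FALSE)
  # Assumed layout: columns 1..k are indices, columns k+1..2k are distances.
  idx <- res$knnIndexDist[, 1:k, drop = FALSE]
  # Take the square root because the default distance is a sum of squares.
  d   <- sqrt(res$knnIndexDist[, (k + 1):(2 * k), drop = FALSE])
  # Unweighted mean of the neighbours' target values for each query row.
  pred <- apply(idx, 1, function(i) mean(train_y[i]))
}
```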
knnFinder(knnFinder)
    constructs a kd-tree to find the k nearest neighbors;
    has so many bugs and quirks that it is almost unusable;
    not maintained anymore (perhaps should be removed from CRAN)
The other approach is to use a distance function and sort 'manually'
to find the nearest neighbors and their values for the target variable.
'dist' itself is not really appropriate as it can only be applied to
_one_ matrix, whereas here we need something like dist(A, B). Combining
A and B into one matrix is often not feasible as it needs too much memory.
dists(cba)
    computes a distance matrix between the rows of two matrices;
    can be a bit slow for very big matrices (slower than 'dist')
    [Remark: the default distance is the square root of the sum of
    squares, i.e. the Euclidean distance]
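
The 'manual' approach can also be sketched in base R alone, without any
of the packages above. This is only an illustration: the helper names
cross_dist2 and knn_reg are my own, and the toy data are made up. It
computes squared Euclidean distances between the rows of two matrices
without stacking them, then averages the target values of the k nearest
rows. Note that it still allocates a full nrow(A) x nrow(B) matrix, so
for very big inputs the query rows would have to be processed in chunks.

```r
# Squared Euclidean distances between rows of A (m x p) and rows of B (n x p),
# using |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so A and B are never combined.
cross_dist2 <- function(A, B) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  pmax(d2, 0)  # clip tiny negative values caused by rounding
}

# knn regression: for every query row, average the targets of the
# k closest training rows (unweighted).
knn_reg <- function(train_x, train_y, query_x, k = 3) {
  d2 <- cross_dist2(query_x, train_x)   # n_query x n_train
  apply(d2, 1, function(row) {
    nn <- order(row)[seq_len(k)]        # indices of the k nearest rows
    mean(train_y[nn])
  })
}

# Toy data: the target equals the single predictor.
train_x <- matrix(c(0, 1, 2, 3, 10), ncol = 1)
train_y <- c(0, 1, 2, 3, 10)
query_x <- matrix(c(1.1, 9), ncol = 1)

pred <- knn_reg(train_x, train_y, query_x, k = 2)
# pred is c(1.5, 6.5): neighbours {1, 2} for 1.1 and {10, 3} for 9
```

Sorting squared distances gives the same neighbors as sorting the
distances themselves, so the square root is only needed when the
distances are used as weights.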
I would appreciate hearing from you if I have missed something.
Hans Werner Borchers
ABB Corporate Research