[R] Knn Probability Attributes:

Greg greggallen at gmail.com
Fri Jan 16 18:49:27 CET 2015


Dear R Geniuses:

I'm a C++ and Perl consultant, not an R consultant, but a client wants me
to see if R can help him predict whether daily sales for some auto
parts stores will be less than, greater than, or equal to the median daily
sales value.

("Equal to" is defined as within 2% of the median; otherwise there would
never be that category.)  He has 27 variables for predicting the 3
classes, everything from the month and the weather to the number of
clerks on duty, and so on.
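
To make the three classes concrete, here is a minimal sketch of one way to label daily sales against the median with a 2% band. The helper name label_sales, the tol argument, the class labels and the sample numbers are all made up for illustration, and how values exactly on the 2% boundary are assigned is an assumption.

```r
# Hypothetical helper (not from the client's code): label each day's sales
# relative to the median, treating values within tol (2%) of it as "equal".
# Assumes the median is positive.
label_sales <- function(sales, tol = 0.02) {
  med <- median(sales)
  cut(sales,
      breaks = c(-Inf, med * (1 - tol), med * (1 + tol), Inf),
      labels = c("less", "equal", "greater"))
}

sales <- c(80, 99, 100, 101, 130)   # made-up daily sales, median is 100
labels <- label_sales(sales)
print(data.frame(sales, labels))
```

cut() returns a factor, which is the form knn expects for its cl argument.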

I'm using this call:  P = knn(train[ , vars], test[ , vars], cl, k = 1,
l = 0, prob = TRUE)

train and test are data frames of 1200 and 200 vectors (rows),
respectively.  The cl values are stored with "test" (at this point as
variable 28).

vars = c( 5, 11, 23), for example.  If I use more than 3 variables I
get severe over-fitting.

The problem is with the printing: I want to print the results in a
table that shows, for the test data, one row per vector (cl is the
actual class from test, P is the value returned by knn):

 cl    prob   P
------------------
actual values for vector 1
.
.
.
actual values for vector 200
------------------

I'm using R from a terminal command line, not a GUI.   I've tried
numerous ways of generating the table, and none work.
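
For reference, a minimal sketch of one way such a table could be built, not a definitive answer. It assumes the class package (which provides knn) is installed, that the vote proportions are read from the "prob" attribute of the returned factor, and that train carries its class in column 28 as well, since knn takes the classes of the training rows as cl. The make_data helper and the random data are made up so the example runs on its own.

```r
library(class)   # provides knn()

set.seed(1)
# Made-up stand-in data: 27 numeric predictors plus the class in column 28.
make_data <- function(n) {
  d <- as.data.frame(matrix(rnorm(n * 27), ncol = 27))
  d$cl <- factor(sample(c("less", "equal", "greater"), n, replace = TRUE),
                 levels = c("less", "equal", "greater"))
  d
}
train <- make_data(1200)
test  <- make_data(200)
vars  <- c(5, 11, 23)

# cl must hold the true classes of the training rows.
P <- knn(train[, vars], test[, vars], cl = train$cl, k = 1, l = 0, prob = TRUE)

# With prob = TRUE the vote proportions are attached to P as an attribute.
res <- data.frame(cl = test$cl, prob = attr(P, "prob"), P = P)

# Plain print() works from a terminal session; row.names = FALSE drops the
# leading row index so only the three columns appear.
print(res, row.names = FALSE)
```

Note that with k = 1 a single neighbour casts the only vote, so prob will be 1 for every row unless there are distance ties; the column only becomes informative for larger k.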

Thanks,

Greg Allen
SLC, Utah


