[R] Histogram Ranking
John Day
jday at csi-inc.com
Fri Sep 6 20:30:40 CEST 2002
Hello,
This is not exactly an R question, but I suspect that there is an R
procedure that does what I am calling (for lack of a better name)
"histogram ranking".
I'm trying to evaluate a set of regression features by segregating by
target class and comparing the feature histograms. My idea is that if the
histograms are the same for two different classes then there is no
predictive power in those features. Conversely, if the histograms are
different then there is probably some predictive "juice" that we can
squeeze out of the features with regression.
The histograms are computed by partitioning each feature into equally
spaced bins over its span and counting the sample values that fall in each
bin. This is done for each target class, so the resulting histograms are
the feature distributions conditioned on target class.
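The binning described above could be sketched roughly as follows. This is
only an illustration in Python (not an R procedure), for a single numeric
feature. The function name class_histograms, the n_bins parameter and the
toy data are all assumptions made for the example.

```python
from collections import defaultdict


def class_histograms(values, labels, n_bins=10):
    """Count feature values per equally spaced bin, separately per class.

    The bins span the full range of the feature (all classes pooled), so
    the histograms of different classes share the same bin edges.
    """
    lo, hi = min(values), max(values)
    # A constant feature has zero span; fall back to a width of 1.0 so
    # that every value lands in the first bin instead of dividing by zero.
    width = (hi - lo) / n_bins or 1.0
    hists = defaultdict(lambda: [0] * n_bins)
    for v, c in zip(values, labels):
        # The maximum value sits on the right edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        hists[c][idx] += 1
    return dict(hists)


# Toy data (hypothetical): one feature, two target classes.
values = [1, 2, 3, 8, 9, 10]
labels = ["a", "a", "a", "b", "b", "b"]
hists = class_histograms(values, labels, n_bins=3)
```

Here hists maps each class label to its vector of bin counts. Sharing the
bin edges across classes is what makes the vectors comparable bin by bin.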
Since the histograms are numeric vectors, we can measure the "goodness" of
a feature set by evaluating the "distance" between histograms. The larger
the distance, the more predictive power the features should have.
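One possible way to make "distance" concrete is sketched below, again in
Python and only as an illustration. The post does not say which distance
is intended. The choice here (half the sum of absolute differences between
the normalised histograms, often called total variation distance) and the
name histogram_distance are assumptions for the example.

```python
def histogram_distance(h1, h2):
    """Half the L1 distance between two histograms after normalisation.

    Each histogram is first scaled to proportions, so classes with
    different sample counts can be compared. The result lies between
    0 (identical shapes) and 1 (no overlapping bins).
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must use the same bins")
    n1, n2 = sum(h1), sum(h2)
    if n1 == 0 or n2 == 0:
        raise ValueError("each histogram needs at least one sample")
    return 0.5 * sum(abs(a / n1 - b / n2) for a, b in zip(h1, h2))


# Hypothetical bin counts for two classes over the same three bins.
d_separated = histogram_distance([3, 0, 0], [0, 0, 3])
d_same_shape = histogram_distance([1, 2, 1], [2, 4, 2])
```

Normalising first matters when the classes are unbalanced: without it, two
classes with the same distribution but different sample sizes would appear
far apart.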
Now I'm no statistics expert. Have I re-invented some "wheel" here? What is
the canonical name for this kind of analysis? Is this kind of analysis
routinely done in R? [Is there a "better" way to do all this?]
Thanks,
John Day