[R] (no subject)
Richard A. O'Keefe
ok at cs.otago.ac.nz
Fri Nov 12 03:04:02 CET 2004
On 11-Nov-04 Wei Yang wrote:
> I have a list of numbers. For each of the numbers, I take
> sum of squares of the numbers centered on the number chosen.
> If it is less than a certain constant, I will take the
> average of the numbers chosen.
Assuming I've understood this correctly, one approach is
    mean(v[k > sapply(v, function (x) sum((v-x)^2))])
where v is the vector of numbers
and k is the "certain constant".
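
As a small illustration of that one-liner, here it is broken into steps. The values of v and k below are made up for the example and do not come from the original question.

```r
# Hypothetical data, chosen only for illustration
v <- c(1, 2, 3, 4, 10)
k <- 60

# For each element x of v, the sum of squared deviations of all of v from x
ss <- sapply(v, function (x) sum((v - x)^2))
ss        # 95 70 55 50 230

# Keep the elements whose sum of squares is below k, then average them
keep <- k > ss
result <- mean(v[keep])
result    # 3.5, the mean of 3 and 4
```

Only 3 and 4 pass the test here, because they are the values closest to mean(v), which is 4.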
However, this formulation requires O(length(v)^2) time,
which means that it is not a particularly efficient way to do it for long vectors.
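
The quadratic cost is not inherent in the problem. With m = mean(v) and n = length(v), the sum of squares about any point x satisfies sum((v-x)^2) = sum((v-m)^2) + n*(x-m)^2, so all the sums can be computed in one vectorised pass. The following is a sketch of that rearrangement, not part of the approach above; the function name central_mean and the test data are hypothetical.

```r
# Linear-time variant, using
#   sum((v - x)^2) == sum((v - m)^2) + n * (x - m)^2,  where m = mean(v)
central_mean <- function(v, k) {
  n  <- length(v)
  m  <- mean(v)
  ss <- sum((v - m)^2) + n * (v - m)^2   # one value per element of v
  mean(v[k > ss])
}

# Check against the sapply() version on made-up data
v <- c(1, 2, 3, 4, 10)
k <- 60
slow <- mean(v[k > sapply(v, function (x) sum((v - x)^2))])
fast <- central_mean(v, k)
all.equal(slow, fast)
```

If no element satisfies the condition, both versions return NaN, since mean() of an empty vector is NaN.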
What interests me far more is WHY this calculation is to be done.
If you think about it, if v is sorted, the "k > sapply(...)" part will be
FALSE... TRUE... FALSE...
so this is an arithmetic mean of a "central" subset of values. Why not just
use an ordinary trimmed mean (see ?mean to find out about the trim= argument)?
Or an M-estimator if some other robust estimate of location is wanted?
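
To make the comparison concrete, the sketch below shows the FALSE... TRUE... FALSE... pattern on a sorted vector and then the standard alternatives. The data and k are made up. MASS::huber is used here as one example of an M-estimator, on the assumption that the MASS package is installed; it is not something the question asked for.

```r
# Hypothetical data, sorted so the selection pattern is visible
v <- sort(c(3, 10, 1, 4, 2))
k <- 60

pattern <- k > sapply(v, function (x) sum((v - x)^2))
pattern                             # FALSE FALSE TRUE TRUE FALSE

# Ordinary trimmed mean: drop 20% of the observations from each end
trimmed <- mean(v, trim = 0.2)
trimmed                             # 3, the mean of 2, 3 and 4

# Median, for reference
med <- median(v)
med                                 # 3

# Huber M-estimator of location, only if MASS is available
if (requireNamespace("MASS", quietly = TRUE)) {
  print(MASS::huber(v)$mu)
}
```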
I ask this in all seriousness, because in the few quick experiments I tried,
this estimator was _further_ from the population mean than the classical mean.
Is that the point of it?
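
For anyone who wants to try this kind of comparison, here is one possible setup. It is not a record of the experiments mentioned above: the normal population, the sample size, the number of replicates and especially the rule for choosing k are all assumptions made for the sketch, and the outcome may well depend on them.

```r
set.seed(1)   # arbitrary seed, only so that runs are repeatable

one_trial <- function(n = 25) {
  v  <- rnorm(n)                                  # population mean is 0
  ss <- sapply(v, function (x) sum((v - x)^2))
  k  <- 1.5 * min(ss)                             # assumed rule for k; guarantees a non-empty subset
  c(classical = abs(mean(v)),                     # absolute error of the ordinary mean
    central   = abs(mean(v[k > ss])))             # absolute error of the estimator discussed here
}

errs <- replicate(1000, one_trial())              # 2 x 1000 matrix of absolute errors
rowMeans(errs)                                    # average absolute error of each estimator
```

Replacing rnorm with a skewed or heavy-tailed generator, or changing how k is chosen, would show how sensitive the comparison is to those assumptions.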