[R] working with summarized data
Rick Bischoff
rdbisch at gmail.com
Wed Aug 30 16:27:58 CEST 2006
The data sets I am working with all have a weight variable, i.e.,
each row does not correspond to a single observation.
With that in mind, nearly all of the graphs and summary statistics
are incorrect for my data, because they do not take the weights
into account.
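A minimal sketch of the problem, assuming frequency weights stored in a column w next to the values x (the column names and the data are hypothetical): mean() counts every row once, whereas weighted.mean() from the stats package does accept a weight vector.

```r
# Hypothetical summarized data: row i stands for w[i] observations of x[i]
df <- data.frame(x = c(1, 2, 3, 10), w = c(5, 3, 1, 1))

# mean() ignores the weights: each row counts once
unweighted <- mean(df$x)               # (1 + 2 + 3 + 10) / 4 = 4

# weighted.mean() computes sum(w * x) / sum(w)
weighted <- weighted.mean(df$x, df$w)  # (5 + 6 + 3 + 10) / 10 = 2.4

print(c(unweighted = unweighted, weighted = weighted))
```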
****
For example, median() is incorrect, because the quantiles are not
calculated with weights:
sum(weights[X < median(X)]) / sum(weights)
This should be roughly 0.5... of course it is not.
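One way to get a weighted median is to invert the weighted empirical CDF, i.e. take the smallest x whose cumulative share of the total weight reaches p. The sketch below assumes non-negative frequency weights; weighted_quantile() is a hypothetical helper, not a base R function, and it mirrors quantile(..., type = 1) applied to the replicated data.

```r
# Hypothetical helper (not part of base R): weighted quantile obtained by
# inverting the weighted empirical CDF, the weighted analogue of
# quantile(x, p, type = 1)
weighted_quantile <- function(x, w, p = 0.5) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x_sorted <- x[ord]
  cum_w <- cumsum(w[ord])         # cumulative weight at each sorted x
  total <- cum_w[length(cum_w)]   # total weight
  # for each p: first sorted position whose cumulative weight reaches p * total
  idx <- vapply(p, function(pp) which(cum_w >= pp * total)[1], integer(1))
  x_sorted[idx]
}

x <- c(1, 2, 3, 10)   # hypothetical values
w <- c(5, 3, 1, 1)    # hypothetical weights

m_unweighted <- median(x)                   # 2.5: every row counts once
m_weighted <- weighted_quantile(x, w, 0.5)  # 1

# The check from the post: share of the total weight strictly below the median
sum(w[x < m_unweighted]) / sum(w)  # 0.8
sum(w[x < m_weighted]) / sum(w)    # 0
sum(w[x <= m_weighted]) / sum(w)   # 0.5
```

With discrete values the share strictly below a weighted median generally cannot be exactly 0.5; the defining property is that the share below is at most 0.5 and the share at or below is at least 0.5.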
****
Unfortunately, it seems that most (all?) of R's graphics and
summary-statistic functions don't take a weight or frequency argument.
(Fortunately, the models do.)
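For graphics, one possible workaround (a sketch, with made-up column names and data) is to compute the weighted counts yourself and pass them to the existing plotting functions. xtabs() sums its left-hand-side variable within each cell, which yields weighted frequencies for categorical data; for a histogram, the breaks from hist(plot = FALSE) can be reused and the counts replaced by the summed weights per bin.

```r
df <- data.frame(
  x = c(1.2, 2.7, 3.1, 9.8),  # hypothetical values
  w = c(5, 3, 1, 1)           # hypothetical weights
)

# Categorical data: a weight column on the left of the formula is summed per cell
df$grp <- ifelse(df$x < 3, "low", "high")
tab <- xtabs(w ~ grp, data = df)   # high = 2, low = 8
# barplot(tab) would draw the weighted frequencies

# Continuous data: keep hist()'s breaks, replace the counts by summed weights
h <- hist(df$x, plot = FALSE)
bins <- cut(df$x, h$breaks, include.lowest = TRUE)  # same binning rule as hist()
h$counts <- as.vector(tapply(df$w, bins, sum, default = 0))
# plot(h) would draw bar heights equal to the summed weights
```

With the equal-width breaks that hist() chooses by default, plot(h) uses the replaced counts for the bar heights; h$density is left unweighted, so a density-scale plot would need it recomputed as well.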
Am I completely missing how to do this? One way would be to
replicate each row in proportion to its weight (e.g., if the weight
were 4, we would add 3 more copies), but this quickly becomes
prohibitive as the data set grows.
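The replication approach described above can be sketched in a couple of lines, assuming integer frequency weights in a hypothetical column w. It lets every unweighted function run unchanged, at the cost of materializing sum(w) rows.

```r
df <- data.frame(x = c(1, 2, 3, 10), w = c(5, 3, 1, 1))  # hypothetical data

# Repeat row i exactly w[i] times (requires non-negative integer weights)
expanded <- df[rep(seq_len(nrow(df)), times = df$w), , drop = FALSE]

nrow(expanded)      # 10: one row per underlying observation
median(expanded$x)  # 1.5: ordinary median of the expanded data
```

The expanded data frame grows with the total weight rather than with the number of rows, so large weights blow up memory, and non-integer weights cannot be replicated this way at all.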
Thanks in advance!