[R] working with summarized data

Rick Bischoff rdbisch at gmail.com
Wed Aug 30 16:27:58 CEST 2006


The data sets I am working with all have a weight variable, i.e.,
each row does not represent a single observation.

With that in mind, nearly all of the graphs and summary statistics  
are incorrect for my data, because they don't take into account the  
weight.

****
For example, "median" is incorrect, as the quantiles aren't calculated
with weights:

sum( weights[X < median(X)] ) / sum(weights)

For a weighted median this should be about 0.5... of course it's not.
****
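
To make the check above concrete, here is a minimal sketch of the weighted median I have in mind. The helper weighted_median() is hypothetical (it is not a base R function), the toy X and weights are made up, and it assumes non-negative weights. It returns the smallest value whose cumulative weight share reaches 0.5, so when the share lands exactly on 0.5 it gives the lower of the two middle values rather than their average.

```r
# Hypothetical helper, not part of base R: weighted median taken from
# the cumulative share of the weights.
weighted_median <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)                    # sort values, carrying weights along
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)    # cumulative share of total weight
  x[which(cum_share >= 0.5)[1]]      # first value reaching half the weight
}

# Toy data (made up): the value 10 stands for 7 observations.
X <- c(1, 2, 3, 10)
weights <- c(1, 1, 1, 7)

median(X)                    # ignores the weights
weighted_median(X, weights)  # uses the weights
```

On this toy data the unweighted median sits between 2 and 3, while the weighted one is pulled to 10, which is what median() gives on the fully expanded data.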

Unfortunately, it seems that most (all?) of R's graphics and summary
statistic functions don't take a weight or frequency argument.
(Fortunately the models do...)

Am I completely missing how to do this?  One way would be to
replicate each row in proportion to its weight (e.g. if the weight was
4, we would add 3 additional copies), but this will get prohibitive pretty
quickly as the dataset grows.
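
For reference, the replication approach looks roughly like the sketch below. The data frame df and its columns x and w are made up for illustration, and it only works when the weights are non-negative integers. The expanded data has sum(w) rows, which is where the memory cost comes from. For the mean, at least, weighted.mean() gives the same answer without expanding anything.

```r
# Toy data; the column names x and w are made up for illustration.
df <- data.frame(x = c(1, 2, 3, 10), w = c(1, 1, 1, 7))

# Expand: repeat each row index w times, so a weight of 4 yields the
# original row plus 3 additional copies.
expanded <- df[rep(seq_len(nrow(df)), times = df$w), , drop = FALSE]
nrow(expanded)             # equals sum(df$w)

# Any unweighted function now sees one row per observation.
mean(expanded$x)

# Same mean computed directly from the summarized data.
weighted.mean(df$x, df$w)
```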


Thanks in advance!


