[R] working with summarized data
Greg Snow
Greg.Snow at intermountainmail.org
Wed Aug 30 18:28:18 CEST 2006
There are functions to do weighted summary statistics in the Hmisc
package (wtd.quantile, ...).
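As a rough illustration of the Hmisc route, here is a minimal sketch. The data are made up, and it assumes the weights are frequency counts (each value of x stands for w identical observations) and that Hmisc is installed.

```r
library(Hmisc)  # assumption: the Hmisc package is installed

# Hypothetical summarized data: value x[i] stands for w[i] observations.
x <- c(1, 2, 3, 4, 10)
w <- c(5, 1, 1, 1, 1)

plain_median    <- median(x)                                # ignores w
weighted_median <- wtd.quantile(x, weights = w, probs = 0.5)
weighted_avg    <- wtd.mean(x, weights = w)
weighted_var    <- wtd.var(x, weights = w)

print(c(plain = plain_median, weighted = unname(weighted_median)))
```

The weighted median is pulled toward the heavily weighted value, which the plain median cannot see.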
For more complicated analyses (but not plots yet), the biglm package has
a bigglm function that accepts the data in chunks; you could write a
function that expands part of the dataset at a time.
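One way the chunked idea might look is sketched below. The data frame agg, its columns (x, y, n) and the helper make_chunk_reader are hypothetical names for illustration; the only thing taken from biglm is that bigglm accepts a function of a single reset argument that returns the next chunk as a data frame, or NULL when the data are exhausted.

```r
library(biglm)  # assumption: the biglm package is installed

# Hypothetical summarized data: each row stands for n identical observations.
agg <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(1.1, 1.9, 3.2, 3.8, 5.1, 6.3),
                  n = c(3, 1, 2, 4, 1, 2))

# Returns a reader that expands a few summarized rows at a time,
# so the fully expanded dataset never has to exist in memory at once.
make_chunk_reader <- function(agg, rows_per_chunk = 2) {
  pos <- 0
  function(reset = FALSE) {
    if (reset) {               # bigglm asks to start again from the top
      pos <<- 0
      return(NULL)
    }
    if (pos >= nrow(agg)) return(NULL)   # no more data
    idx <- (pos + 1):min(pos + rows_per_chunk, nrow(agg))
    pos <<- max(idx)
    chunk <- agg[idx, , drop = FALSE]
    # Repeat each summarized row n times, then drop the count column.
    chunk[rep(seq_len(nrow(chunk)), times = chunk$n), c("x", "y"), drop = FALSE]
  }
}

fit <- bigglm(y ~ x, data = make_chunk_reader(agg), family = gaussian())
print(coef(fit))
```

For a model this simple, lm(y ~ x, data = agg, weights = n) gives the same coefficients directly; the chunked reader is only worth the trouble when the function you need has no weights argument.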
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Rick Bischoff
Sent: Wednesday, August 30, 2006 8:28 AM
To: r-help at stat.math.ethz.ch
Subject: [R] working with summarized data
The data sets I am working with all have a weight variable, i.e., each
row does not represent a single observation.
With that in mind, nearly all of the graphs and summary statistics are
incorrect for my data, because they don't take the weights into account.
****
For example "median" is incorrect, as the quantiles aren't calculated
with weights:
sum( weights[X < median(X)] ) / sum(weights)
This should be 0.5... of course it's not.
****
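To make the check above concrete, here is a small base-R sketch. The function weighted_median_lower is a hypothetical helper, not part of R, and it uses just one of several possible conventions (the smallest value at which the cumulative weight share reaches 0.5).

```r
# Hypothetical helper: "lower" weighted median, one convention among several.
weighted_median_lower <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  share <- cumsum(w) / sum(w)      # cumulative share of total weight
  x[which(share >= 0.5)[1]]        # first value reaching half the weight
}

# Made-up data: the value 1 carries most of the weight.
X <- c(1, 2, 3, 4, 10)
weights <- c(5, 1, 1, 1, 1)

m_plain <- median(X)
m_wtd   <- weighted_median_lower(X, weights)

# Share of weight strictly below each candidate median.
below_plain <- sum(weights[X < m_plain]) / sum(weights)
below_wtd   <- sum(weights[X < m_wtd]) / sum(weights)
at_or_below_wtd <- sum(weights[X <= m_wtd]) / sum(weights)

print(c(below_plain = below_plain, below_wtd = below_wtd,
        at_or_below_wtd = at_or_below_wtd))
```

With discrete data the share strictly below the weighted median will rarely be exactly 0.5; under this convention it is at most 0.5, while the share at or below is at least 0.5.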
Unfortunately, it seems that most (all?) of R's graphics and summary
statistic functions don't take a weight or frequency argument.
(Fortunately the models do...)
Am I completely missing how to do this? One way would be to replicate
each row in proportion to its weight (e.g. if the weight was 4, we would
add 3 additional copies), but this will become prohibitive pretty quickly
as the dataset grows.
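For reference, the replication approach can be sketched in base R as follows. The data frame d and its columns are made up, and the sketch assumes the weights are non-negative integers.

```r
# Hypothetical summarized data: row i stands for w[i] observations.
d <- data.frame(x = c(1, 2, 3), w = c(4, 1, 2))

# Repeat each row index w times, then subset to build the expanded data.
expanded <- d[rep(seq_len(nrow(d)), times = d$w), , drop = FALSE]

n_expanded <- nrow(expanded)      # equals sum(d$w)
med_expanded <- median(expanded$x)
print(c(rows = n_expanded, median = med_expanded))
```

The expanded data frame has sum(d$w) rows, which is exactly why this stops being practical once the weights or the number of rows get large.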
Thanks in advance!