[R] working with summarized data
AnupTyagi at yahoo.com
Fri Sep 29 10:30:35 CEST 2006
Thomas Lumley <tlumley <at> u.washington.edu> writes:
> On Wed, 30 Aug 2006, Rick Bischoff wrote:
> >>> Unfortunately, it seems that most(all?) of R's graphics and summary
> >>> statistic functions don't take a weight or frequency argument.
> >>> (Fortunately the models do...)
> >> I have been meaning to add this functionality to my graphics
> >> package ggplot (http://had.co.nz/ggplot), but unfortunately haven't
> >> had time yet. I'm guessing you want something like:
> >> * scatterplot: scale size of point according to weight (can do)
> >> * bar chart: bars should have height proportional to weight (can do)
> >> * histogram: area proportional to weighting variable (have some
> >> half-finished code to do)
> >> * smoothers: should automatically use weights
> >> * boxplot: use weighted quantiles/letter statistics (is there a
> >> function for that?)
> >> What else is there?
> > densityplot is the only other one I can think of at the moment...
> > With the rest of those, I could certainly live without it though!
> Density plots, scatterplot smoothers, hexbin plots, bubble plots,
> histograms, and boxplots are available in the survey package. These are
> probability-weighted rather than frequency-weighted but it doesn't matter
> for graphics. You could use them as is (which requires setting up a
> survey design object) or rip the internals out of them.
I came across this posting, which I had replied to earlier. I had assumed from
the original question that the data had positive integer weights and came from a
certain kind of stratified sampling. For the general case, the "survey" package
and perhaps "ggplot" seem suitable for making these graphical extensions;
"survey" also takes the survey design into account. I think graphical
representation of surveys, especially large ones, is a good research issue in
statistical graphics. For example, I am not convinced that making the area of a
graphical symbol, such as a bar in a bar plot or histogram, a function of the
survey weight gives easily perceived and interpretable results. Is there an
implementation of graphical functions that are conceptually similar to graphical
representations of robust statistics (which modify the "weights" of
observations)? R seems to be suitable for doing this kind of work.
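For the positive-integer-weight case I had assumed, a minimal base-R sketch of
the two obvious routes might look like the following. The data frame `tab`, its
columns and the helper `wquantile` are made up for illustration and do not come
from any package; this is one possible approach, not a recommendation over
"survey".

```r
# Hypothetical summarized data: value x was observed `freq` times.
tab <- data.frame(x = c(1, 2, 3, 4, 10), freq = c(3L, 5L, 4L, 2L, 1L))

# Route 1: expand to one element per observation, then use any
# unweighted function (boxplot, hist, ...) unchanged.
x_full <- rep(tab$x, times = tab$freq)
boxplot(x_full, horizontal = TRUE)

# Route 2: work on the summarized data directly.
w <- tab$freq / sum(tab$freq)          # weights rescaled to sum to 1
m_w <- weighted.mean(tab$x, tab$freq)  # same value as mean(x_full)

# Weighted quantile: smallest x whose weighted empirical CDF reaches p.
# (Illustrative helper, not a function from any package.)
wquantile <- function(x, w, probs) {
  o <- order(x)
  cdf <- cumsum(w[o]) / sum(w)
  sapply(probs, function(p) x[o][which(cdf >= p)[1]])
}
q_w <- wquantile(tab$x, tab$freq, c(0.25, 0.5, 0.75))

# density() accepts weights. The bandwidth h is fixed explicitly here
# (chosen from the expanded data) so the two estimates are comparable.
h <- bw.nrd0(x_full)
d_w <- density(tab$x, weights = w, bw = h)
d_full <- density(x_full, bw = h)
plot(d_w)
```

Route 1 is only practical when the weights are integers and the expanded vector
fits in memory; with non-integer or probability weights, something along the
lines of Route 2 or the "survey" package is needed.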