[R] working with summarized data

Anupam Tyagi AnupTyagi at yahoo.com
Fri Sep 29 10:30:35 CEST 2006


Thomas Lumley <tlumley <at> u.washington.edu> writes:

> 
> On Wed, 30 Aug 2006, Rick Bischoff wrote:
> 
> >>> Unfortunately, it seems that most(all?) of R's graphics and summary
> >>> statistic functions don't take a weight or frequency argument.
> >>> (Fortunately the models do...)
> >>
> >> I have been meaning to add this functionality to my graphics
> >> package ggplot (http://had.co.nz/ggplot), but unfortunately haven't
> >> had time yet.  I'm guessing you want something like:
> >>
> >> * scatterplot: scale size of point according to weight (can do)
> >> * bar chart: bars should have height proportional to weight (can do)
> >> * histogram: area proportional to weighting variable (have some half
> >> finished code to do)
> >> * smoothers: should automatically use weights
> >> * boxplot: use weighted quantiles/letter statistics (is there a
> >> function for that?)
> >>
> >> What else is there?
> >
> > densityplot is the only other one I can think of at the moment...
> > With the rest of those, I could certainly live without it though!
> >
> 
> Density plots, scatterplot smoothers, hexbin plots, bubble plots, 
> histograms, and boxplots are available in the survey package. These are 
> probability-weighted rather than frequency-weighted but it doesn't matter 
> for graphics.  You could use them as is (which requires setting up a 
> survey design object) or rip the internals out of them.
> 
>  	-thomas
> 

I came across this posting, which I had replied to earlier. I had assumed from
the original question that the data had positive integer weights and came from
a certain kind of stratified sampling. For the general case, the "survey"
package and perhaps "ggplot" seem suitable for making these graphical
extensions; "survey" also takes the survey design into account. I think
graphical representation of survey data, especially from large surveys, is a
good research issue in statistical graphics. For example, I am not convinced
that making the area of a graphical symbol, such as a bar in a bar plot or
histogram, a function of the survey weight gives results that are easily
perceived and interpreted. Is there an implementation of graphical functions
that is conceptually similar to graphical representations of robust statistics
(which modify the "weights" of observations)? R seems to be suitable for doing
this kind of work.
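
To illustrate the positive-integer-weight case I had in mind, here is a minimal
base R sketch. The data frame "summ" and the helper "weighted_quantile" are
made up for illustration and do not come from any package. The quantile
definition used (smallest value whose cumulative weight share reaches p) is
only one of several possible choices.

```r
# Hypothetical summarized data: one row per distinct value, with a
# positive integer frequency weight (names are illustrative only).
summ <- data.frame(value = c(1, 2, 3, 4, 10),
                   freq  = c(3L, 5L, 4L, 2L, 1L))

# With integer frequency weights, the data can simply be expanded, and the
# result passed to unweighted functions such as hist() or boxplot().
expanded <- rep(summ$value, times = summ$freq)

# Illustrative helper (an assumption, not a library function): weighted
# quantile as the smallest value whose cumulative weight share reaches p.
# It does not require the weights to be integers.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  sapply(probs, function(p) x[which(cum_share >= p)[1]])
}

probs <- c(0.25, 0.5, 0.75)
wq <- weighted_quantile(summ$value, summ$freq, probs)

# For integer weights this should agree with the inverse-ECDF quantile
# (type = 1) of the expanded data.
eq <- unname(quantile(expanded, probs, type = 1))

# weighted.mean() is one of the few base summaries that takes weights.
wm <- weighted.mean(summ$value, summ$freq)
```

Expanding with rep() is only practical when the weights are integers and the
expanded data fit in memory; for probability weights, something like the
helper above or the "survey" package would be needed instead.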

Anupam.


