[R] working with summarized data

Anupam Tyagi AnupTyagi at yahoo.com
Fri Sep 29 10:06:07 CEST 2006

Hi Rick,

I came across your posting that I had replied to. I had assumed from 
your posting that you had positive integer weights, and that you had a 
certain kind of stratified sampling. For the general case, you may want to 
look at the "survey" package. Graphical representation of survey data, 
especially from large surveys, is a good research issue in statistical 
graphics. R seems to be suitable for doing this kind of work.
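
For the positive-integer-weight case, a rough sketch of the two routes 
in base R might look like the following. The toy data and the helper 
weighted_median() are made up for illustration; they are not part of 
the "survey" package, and the helper uses one simple convention (the 
smallest value whose cumulative weight share reaches one half), which 
is an assumption rather than the only possible definition.

```r
# Hypothetical toy data: x is the measured value, w an integer frequency weight.
x <- c(10, 20, 30, 40, 50)
w <- c(1, 1, 1, 1, 6)

# Route 1: simulate the population by repeating each value w times.
# Simple, but the expanded vector has sum(w) elements, so it grows quickly.
x_expanded <- rep(x, times = w)
median_expanded <- median(x_expanded)

# Route 2: compute a weighted median directly, without expanding the data.
# Illustrative helper: returns the smallest x whose cumulative weight
# share is at least 0.5 (ties at exactly 0.5 are not averaged).
weighted_median <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= 0.5)[1]]
}

median_unweighted <- median(x)          # ignores the weights
median_weighted <- weighted_median(x, w) # uses the weights
```

On this toy data the two weighted routes agree with each other and 
differ from the plain median(x), which is the discrepancy described in 
the original question.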


Anupam Tyagi wrote the following on 8/31/2006 10:40 AM:
> One solution is to simulate the population by repeating each row 
> "weight" number of times. This is inefficient. It may create a very 
> large dataset for a large sample survey. But some of the graphs and other 
> things may turn out to your liking, depending upon how the functions are 
> written.
> Anupam.
> Rick Bischoff wrote the following on 8/30/2006 7:57 PM:
>> The data sets I am working with all have a weight variable--e.g.,  
>> each row doesn't mean 1 observation.
>> With that in mind, nearly all of the graphs and summary statistics  
>> are incorrect for my data, because they don't take into account the  
>> weight.
>> ****
>> For example "median" is incorrect, as the quantiles aren't calculated  
>> with weights:
>> sum( weights[X < median(X)] ) / sum(weights)
>> This should be 0.5... of course it's not.
>> ****
>> Unfortunately, it seems that most (all?) of R's graphics and summary
>> statistic functions don't take a weight or frequency argument.
>> (Fortunately the models do...)
>> Am I completely missing how to do this?  One way would be to
>> replicate each row in proportion to the weight (e.g. if the weight was
>> 4, we would add 3 additional copies), but this will get prohibitive pretty
>> quickly as the dataset grows.
>> Thanks in advance!
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.