[R] working with summarized data

Anupam Tyagi AnupTyagi at yahoo.com
Thu Aug 31 07:12:04 CEST 2006


One solution is to simulate the population by repeating each row
"weight" times. This is inefficient and can create a very large
dataset for a large sample survey, but some of the graphs and other
output may then turn out to your liking, depending on how the
functions are written.
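
A minimal sketch of that row expansion, assuming the weights are
non-negative integers (frequency weights). The data frame d and its
columns x and w are made up for illustration; they are not from the
original question. The last part shows one possible way to get a
weighted median without expanding the data, using the "lower" median
convention, so it need not agree with median() on the expanded data.

```r
# Hypothetical toy data: x is the measurement, w an integer weight.
d <- data.frame(x = c(10, 20, 30, 40), w = c(1, 2, 2, 5))

# Repeat each row w times to simulate the population.
expanded <- d[rep(seq_len(nrow(d)), times = d$w), , drop = FALSE]

# The expanded data has one row per unit of weight.
n_expanded <- nrow(expanded)

# Unweighted functions now see every observation "weight" times.
med_plain    <- median(d$x)          # ignores the weights
med_expanded <- median(expanded$x)   # reflects the weights

# The check from the question: share of total weight below the median.
share_plain    <- sum(d$w[d$x < med_plain]) / sum(d$w)
share_expanded <- sum(d$w[d$x < med_expanded]) / sum(d$w)

# Alternative without expansion (an assumption, not part of the advice
# above): smallest x whose cumulative weight share reaches 0.5.
o <- order(d$x)
cum_share <- cumsum(d$w[o]) / sum(d$w)
med_lower <- d$x[o][which(cum_share >= 0.5)[1]]
```

With these toy values the share of weight below the expanded median is
0.5, while the plain median leaves only 0.3 of the weight below it.
Non-integer weights would need rescaling or rounding before rep() can
be used this way.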

Anupam.

Rick Bischoff wrote the following on 8/30/2006 7:57 PM:
> The data sets I am working with all have a weight variable--e.g.,  
> each row doesn't mean 1 observation.
> 
> With that in mind, nearly all of the graphs and summary statistics  
> are incorrect for my data, because they don't take into account the  
> weight.
> 
> ****
> For example "median" is incorrect, as the quantiles aren't calculated  
> with weights:
> 
> sum( weights[X < median(X)] ) / sum(weights)
> 
> This should be 0.5... of course it's not.
> ****
> 
> Unfortunately, it seems that most (all?) of R's graphics and summary
> statistic functions don't take a weight or frequency argument.
> (Fortunately the models do...)
> 
> Am I completely missing how to do this?  One way would be to
> replicate each row in proportion to its weight (e.g. if the weight
> was 4, we would add 3 additional copies), but this will get
> prohibitive pretty quickly as the dataset grows.
> 
> 
> Thanks in advance!