[R] Median on Aggregated data

William Dunlap wdunlap at tibco.com
Wed Nov 18 23:20:57 CET 2009


You could use S+.  Its median function has
a weights argument.  E.g.,
   > median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8))
   [1] 3
   > median(c(1,2,3,4e4),  weights=c(1e8,1e8,1,2e8+10))
   [1] 40000
   > median(c(1,2,3,4e4),  weights=c(1e8,1e8,1,2e8+1))
   [1] 20001.5
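
If staying in R is preferable, the same calculation can be sketched in base R without expanding the data with rep(). This is only a minimal sketch: `weighted_median` is a hypothetical helper name (not a base R function), the weights are assumed to be non-negative counts, and the small data frame is made up for illustration, borrowing the column names from the original post.

```r
# Hypothetical helper, not part of base R: median of x where value x[i]
# occurs w[i] times, computed without building rep(x, w).
weighted_median <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  keep <- w > 0                 # values with zero count cannot be the median
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  cw <- cumsum(w[ord])          # running count of observations
  half <- sum(w) / 2
  i <- which(cw >= half)[1]     # first value reaching half the total count
  # If the running count lands exactly on half, the median lies between
  # two adjacent values, so average them (as median() does for even n).
  if (cw[i] == half) (x[i] + x[i + 1]) / 2 else x[i]
}

# The three S+ examples above:
weighted_median(c(1, 2, 3, 4e4), c(1e8, 1e8, 1, 2e8))
weighted_median(c(1, 2, 3, 4e4), c(1e8, 1e8, 1, 2e8 + 10))
weighted_median(c(1, 2, 3, 4e4), c(1e8, 1e8, 1, 2e8 + 1))

# Toy data (made up for illustration) using the poster's column names.
df <- data.frame(
  PaydexNormingCategory = c(1, 1, 1, 2, 2),
  SIC    = c(7941, 7941, 7941, 7941, 7941),
  Paydex = c(50, 60, 80, 40, 70),
  Counts = c(2, 1, 4, 3, 3)
)

# One weighted median per (PaydexNormingCategory, SIC) combination.
groups <- split(df, list(df$PaydexNormingCategory, df$SIC), drop = TRUE)
group_medians <- sapply(groups, function(g) weighted_median(g$Paydex, g$Counts))
group_medians
```

If Paydex is stored as a factor, as the original loop suggests, it would need to be converted to its numeric values first, e.g. as.numeric(as.character(Paydex)).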

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Satsangi, 
> Vivek (GE Capital)
> Sent: Wednesday, November 18, 2009 1:55 PM
> To: r-help at r-project.org
> Subject: [R] Median on Aggregated data
> 
> Folks,
>  
> I have the following code, which works fine on smaller data sets. For
> larger datasets, it runs out of memory and runs far too slowly because
> we are essentially creating large vectors with rep() and then calling
> median() on them. (I learned this approach from a post on the web.)
>  
> Below that, I have written the corresponding SAS code. The SAS code
> runs fast because I can just tell proc summary (via the weight
> statement) that the Counts variable is a frequency.
>  
> So, the question is: is there a simple way to do the same thing in R? I
> have to run this on a large dataset -- for a small set it is not a
> problem.
>  
>  
> ---------------------- Begin R code ------------------------------------
> N <- 1005 * 14; 
> myNorm <- data.frame(PaydexNormingCategory = numeric(N),
>     SIC = numeric(N), CatMedian = numeric(N));
>  
> k=1;
> #j = 7941;  ## For testing only
> for (j in levels(SIC)){
>  for (i in levels(PaydexNormingCategory)){
>  myData <- dfpaydex[(PaydexNormingCategory==i) & (SIC==j),];
>  myMedian <- with(myData,
>    levels(Paydex)[median(rep(as.numeric(Paydex), Counts))]);
>  myNorm[k, ] <- c( as.numeric(i), as.numeric(j), as.numeric(myMedian) );
>  k <- k+1;
>  }
> }
>  
> ---------------------- Begin SAS code ------------------------------------
> 
> proc summary data=SASUser.PaydexNormfull nway; 
> 
>    class PaydexNormingCategory SIC ;
>    weight Counts;
>   var Paydex;
> 
>  output out=outstat (drop=_type_ _freq_)    
>         median= / autoname;                   
>  run;
> 
> ---------------------- End SAS code ------------------------------------
> 
> Thanks for your guidance!
> 
> 
> Vivek Satsangi
> GE Capital
> Americas
> 
> GE imagination at work
> 
> 



