[R] Median on Aggregated data
William Dunlap
wdunlap at tibco.com
Wed Nov 18 23:20:57 CET 2009
You could use S+. Its median function has
a weights argument. E.g.,
> median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8))
[1] 3
> median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8+10))
[1] 40000
> median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8+1))
[1] 20001.5
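For anyone who needs to stay in R, here is a minimal sketch of the same idea. It treats the weights as integer frequencies and works from cumulative counts, so the data never has to be expanded with rep(). The function name freq_median and the small dfpaydex example data are hypothetical, invented for illustration; the sketch also assumes Paydex is stored as a plain numeric column rather than a factor.

```r
# Median of x where x[i] occurs counts[i] times (counts are assumed to be
# non-negative whole numbers), computed without building rep(x, counts).
freq_median <- function(x, counts) {
  stopifnot(length(x) == length(counts), all(counts >= 0), sum(counts) > 0)
  ord <- order(x)
  x <- x[ord]
  cum <- cumsum(as.numeric(counts[ord]))  # as.numeric avoids integer overflow
  n <- cum[length(cum)]                   # total number of observations
  # 1-based positions of the middle observation(s); equal when n is odd
  lo <- floor((n + 1) / 2)
  hi <- ceiling((n + 1) / 2)
  # first sorted value whose cumulative count reaches each position
  x_lo <- x[which(cum >= lo)[1]]
  x_hi <- x[which(cum >= hi)[1]]
  (x_lo + x_hi) / 2
}

# Hypothetical aggregated data in the shape the question describes
dfpaydex <- data.frame(
  PaydexNormingCategory = c(1, 1, 1, 2, 2),
  SIC    = c(7941, 7941, 7941, 7941, 7941),
  Paydex = c(50, 60, 80, 70, 90),
  Counts = c(3, 1, 2, 5, 5)
)

# One median per (PaydexNormingCategory, SIC) cell, skipping empty cells
groups <- split(dfpaydex, dfpaydex[c("PaydexNormingCategory", "SIC")],
                drop = TRUE)
myNorm <- do.call(rbind, lapply(groups, function(d) {
  data.frame(PaydexNormingCategory = d$PaydexNormingCategory[1],
             SIC = d$SIC[1],
             CatMedian = freq_median(d$Paydex, d$Counts))
}))
```

Called on the three examples above, freq_median(c(1,2,3,4e4), c(1e8,1e8,1,2e8)) and its two variants should agree with the S+ results, since the cost depends only on the number of distinct rows and not on the size of the counts.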
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Satsangi,
> Vivek (GE Capital)
> Sent: Wednesday, November 18, 2009 1:55 PM
> To: r-help at r-project.org
> Subject: [R] Median on Aggregated data
>
> Folks,
>
> I have the following code, which works fine on smaller data sets. For
> larger data sets, it runs out of memory and runs far too slowly because
> we are essentially creating large vectors with rep() and then calling
> median() on them. (I learned this approach from a post on the web.)
>
> Below that, I have written the corresponding SAS code. The SAS code
> runs fast because I can simply tell proc summary (via the weight
> statement) that the Counts variable is a frequency.
>
> So, the question is: is there a simple way to do the same thing in R? I
> have to run this on a large data set; for a small one it is not a
> problem.
>
>
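> ---------------------- Begin R code ------------------------------------
> ## SIC, PaydexNormingCategory and Paydex are used as bare names, so
> ## dfpaydex is presumably attach()ed before this runs.
> N <- 1005 * 14
> myNorm <- data.frame(PaydexNormingCategory = numeric(N),
>                      SIC = numeric(N), CatMedian = numeric(N))
>
> k <- 1
> #j = 7941; ## For testing only
> for (j in levels(SIC)) {
>   for (i in levels(PaydexNormingCategory)) {
>     ## Subset on the norming category (the original compared Paydex == i)
>     myData <- dfpaydex[(PaydexNormingCategory == i) & (SIC == j), ]
>     ## Expand each Paydex level code by its count, then take the median
>     myMedian <- with(myData,
>                      levels(Paydex)[median(rep(as.numeric(Paydex), Counts))])
>     ## Assign to row k; myNorm[k] would address column k instead
>     myNorm[k, ] <- c(as.numeric(i), as.numeric(j), as.numeric(myMedian))
>     k <- k + 1
>   }
> }
> ---------------------- End R code --------------------------------------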
>
> ---------------------- Begin SAS code ----------------------------------
>
> proc summary data=SASUser.PaydexNormfull nway;
>
> class PaydexNormingCategory SIC ;
> weight Counts;
> var Paydex;
>
> output out=outstat (drop=_type_ _freq_)
> median= / autoname;
> run;
>
> ---------------------- End SAS code ------------------------------------
>
> Thanks for your guidance!
>
>
> Vivek Satsangi
> GE Capital
> Americas
>
> GE imagination at work
>
>
>