[R] Percentiles with R for a big data.frame

David Winsemius dwinsemius at comcast.net
Fri Jan 25 16:23:24 CET 2013


On Jan 23, 2013, at 5:45 AM, Simonas Kecorius wrote:

> I found some code:
>
> y.ts <- ts(data, frequency=12)
> aggregate(y.ts, FUN=quantile, probs=0.10)
>
> Seems it works fine even for a big data.frame.

Except that 'y.ts' is not a data frame, so you are calling a method
whose arguments differ from those of `aggregate.data.frame`. With the
`ts` call you implicitly constructed `ts(data.matrix(data),
frequency=12)`, so you will be getting quantile estimates on
consecutive groups of 12, which is not at all what you asked for in
the first place.
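
For what it's worth, here is a minimal sketch of the grouped
calculation that was originally asked for, assuming a 288-row,
71-column numeric data frame and consecutive blocks of 24 rows. The
data are simulated and the names `dat1` and `grp` are only
illustrative; `type=4` simply follows the earlier post:

```r
# Simulated stand-in for the real data: 288 rows, 71 numeric columns
set.seed(1)
dat1 <- as.data.frame(matrix(rnorm(288 * 71), nrow = 288, ncol = 71))

# Grouping vector: rows 1-24 form group 1, rows 25-48 group 2, and so on
grp <- rep(seq_len(nrow(dat1) / 24), each = 24)

# aggregate.data.frame applies quantile() to every column within each
# group; probs and type are passed through to quantile()
dat2 <- aggregate(dat1, by = list(group = grp), FUN = quantile,
                  probs = 0.10, type = 4)

dim(dat2)  # 12 groups; 1 grouping column plus 71 data columns
```

Here the grouping is explicit in `by`, so the block length is whatever
you put in `grp`. With the `ts` approach the block length comes from
`frequency`, which is why you get blocks of 12 rather than 24.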

-- 
David.
>
> Thanks for your help.
>
> 2013/1/22 David Winsemius <dwinsemius at comcast.net>
>
> On Jan 22, 2013, at 5:58 AM, Simonas Kecorius wrote:
>
> Hey Duncan,
>
> I can't imagine what formula OpenOffice uses for quantiles either. I
> checked a series of 24 values, calculating quantiles with both
> OpenOffice and R, and the results are identical. The problem arises
> when I try to implement the quantile calculation in this form:
> dat2<-with(dat1,aggregate(cbind(dat1[,1:71]),by=list(newID),quantiles,0.1,type=4))
> This code does not generate an error, but I guess it does not give
> the right result either.
>
> You guess? What result and what is "right"?
>
>
> So my question would be:
> How could I calculate quantiles for a big data.frame in R (71
> columns and 288 rows)? I need to take 24 rows, calculate quantiles,
> then take the next 24 rows, and so on, for all 71 columns.
>
>
> You have already been told that you are misspelling the name of the
> R function.
>
> The other open question in my mind is whether you were hoping for
> something other than a single quantile (in this case the 10th
> percentile), or perhaps wanted the quantiles that would divide your
> data into deciles?
>
> If you want to do the calculation within groups then the second
> argument to `aggregate` must specify the grouping. By design
> `aggregate` will apply the function to all columns.
> -- 
> David.
>
> Thanks in advance.
>
>
>
>
> 2013/1/22 Duncan Murdoch <murdoch.duncan at gmail.com>
>
> On 13-01-21 6:41 PM, Simonas Kecorius wrote:
>
> Dear R users,
>
> I ran into a problem dealing with percentiles in R.
>
> From my previous questions: I have a big data.frame, with lots of
> columns and rows. The following commands let me calculate group
> means for the whole data frame.
>
> dat1$newID<-rep(1:(nrow(dat1)/12),each=12) #if nrow(dat1)/12 is integer
>
> dat2<-with(dat1,aggregate(cbind(dat1[,1:71]),by=list(newID),mean))
>
>
> What I need is to calculate percentiles for each group (there are 12
> values in a group). I tried the following:
>
> duomenai<-with(dat1,aggregate(cbind(dat1[,1:71]),by=list(newID),quantiles,0.1,type=4))
>
>
> You didn't define quantiles, so that won't work.  Assuming that's a
> typo, and you meant quantile...
>
>
>
> First, is this syntax right?
> Secondly, I tried to calculate percentiles using OpenOffice and the
> values disagree. If I do the calculation for a single row of numbers,
> then the R and OpenOffice numbers coincide, but for a data.frame it
> seems that something goes wrong.
>
>
> There are lots of different formulas for empirical quantiles.  The
> ones available in R are described in the ?quantile help topic.  What
> formula does OpenOffice use?
>
> Duncan Murdoch
>
>
>
>
> -- 
> Simonas Kecorius

David Winsemius, MD
Alameda, CA, USA


