[R] quantile function

Frank E Harrell Jr f.harrell at vanderbilt.edu
Fri Feb 6 18:16:44 CET 2004


On Fri, 6 Feb 2004 09:30:31 -0600 (CST)
Giovanni Petris <GPetris at uark.edu> wrote:

> 
> I am trying to `cut' a continuous variable into contiguous classes
> containing approximately an equal number of observations. I thought
> quantile() was the appropriate function to use in order to find the
> breakpoints, but I end up with classes of different sizes - see
> example below. Does anybody have an explanation for that? And what is
> the `recommended' way of computing what I am looking for?
> 
> Example:
> 
> > ca$age
>  [1] 28 42 46 45 34 44 48 45 38 45 49 45 41 46 49 46 44 48 52 48 45 50 53 57 46
> [26] 52 54 57 47 52 55 59 50 54 57 60 51 55 46 63 51 59 48 35 53 59 57 37 55 32
> [51] 60 43 59 37 30 47 60 38 34 48 32 38 36 49 33 42 38 58 35 43 39 59 39 43 42
> [76] 60 40 44
> > table(cut(ca$age,breaks=c(-Inf,quantile(ca$age,seq(0,1,length=11)[-1]))))
> 
> (-Inf,35] (35,38.4] (38.4,43]   (43,45] (45,46.5] (46.5,49]   (49,52]   (52,55]
>         9         7        10         8         5        10         7         7
>   (55,59]   (59,63]
>        10         5
> 
> Thanks in advance,
> Giovanni

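The cut2 function in the Hmisc package does this as well as the data
allow, e.g. cut2(ca$age, g=10). Tied ages make exactly equal class sizes
impossible when classes are defined by intervals of the values.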
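
A minimal sketch of the above, using the ages from the original post. It
counts the tied values, then shows a base-R workaround that is an
assumption of this example rather than something proposed in the thread:
rank the data with ties broken by position and cut the ranks instead of
the values. The names age, ties, r, decile and sizes are illustrative. The
cut2 call runs only if Hmisc is installed.

```r
# Ages from the original post (78 observations).
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46, 44,
         48, 52, 48, 45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59, 50, 54,
         57, 60, 51, 55, 46, 63, 51, 59, 48, 35, 53, 59, 57, 37, 55, 32, 60,
         43, 59, 37, 30, 47, 60, 38, 34, 48, 32, 38, 36, 49, 33, 42, 38, 58,
         35, 43, 39, 59, 39, 43, 42, 60, 40, 44)

# Many ages occur more than once; all observations sharing a value must
# fall in the same interval, so interval-based classes cannot be equal.
ties <- table(age)

# Assumed workaround (not from the thread): break ties by order of
# appearance, then cut the ranks 1..n into 10 equal-width groups.
r <- rank(age, ties.method = "first")
decile <- cut(r, breaks = seq(0, length(age), length.out = 11),
              labels = FALSE)
sizes <- table(decile)
print(sizes)

# cut2 from Hmisc; g is the number of quantile groups requested.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(table(Hmisc::cut2(age, g = 10)))
}
```

The rank-based groups are nearly equal in size, but the price is that
observations with the same age can end up in different classes, which is
often unacceptable. cut2 keeps tied values together and so accepts some
imbalance.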

Frank

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University



