[R] quantile / centile
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Sep 27 16:21:47 CEST 2008
Donald Braman wrote:
> Thanks for the response!
> Unfortunately, I was unclear; my problem is not that I need to know what the
> percentile ranges are, but that I need to assign an appropriate percentile
> range to each of the records in my data frame. My data frame contains
> somewhere between 1000 and 9000 rows/records (depending on
> context), not a hundred rows. That is, I'd like to assign to each row the
> quantile value that corresponds to the quantile() result for
> that record in my 1000-9000 row data frame.
>
> Thanks again for any help!
>
>
>
You can use
cnt <- cut(x, quantile(x, seq(0,1,0.01)), include.lowest=TRUE)
levels(cnt) <- 1:100 # if you want to get rid of ugly interval labels
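
As a rough illustration of the cut() approach, here is a minimal sketch on simulated data. The vector x, the seed and the name pct are made up for the example; in practice x would be something like my.df$my.var. It relabels the intervals through levels() and then converts the factor to plain integers.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(5000)                 # hypothetical stand-in for my.df$my.var

# 101 break points (0%, 1%, ..., 100%) define 100 intervals
breaks <- quantile(x, probs = seq(0, 1, 0.01))

# include.lowest = TRUE keeps min(x) inside the first interval
cnt <- cut(x, breaks, include.lowest = TRUE)
levels(cnt) <- 1:100             # replace interval labels with 1..100

pct <- as.integer(cnt)           # integer percentile bin for each observation
range(pct)                       # smallest and largest bin in use
```

Note that this assumes the break points are distinct. With heavily tied data, quantile() can return repeated values, and cut() will then complain that the breaks are not unique.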
With Harrell's Hmisc package, there's also
cnt <- cut2(x, g=100) # after library(Hmisc)
Or you can take a more basic approach and do
N <- sum(!is.na(x))
cnt <- ceiling(rank(x)/N*100)
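
To see what the rank-based version does with missing values, here is a small sketch on a made-up vector (the data are hypothetical). It is a slight variation on the line above: na.last = "keep" is spelled out, and the multiplication is done before the division. That reordering is my own assumption about what is safer, since something like 7/100*100 evaluates to slightly more than 7 in double precision and ceiling() would then push it up to 8.

```r
# hypothetical data: 10 observed values and 2 missing ones
x <- c(2.5, NA, 7.1, 3.3, 9.8, 1.2, NA, 5.0, 6.4, 4.7, 8.9, 0.3)

N <- sum(!is.na(x))              # number of non-missing observations (10 here)

# rank() leaves NA in place with na.last = "keep";
# multiplying before dividing avoids rounding error when the result is a whole number
cnt <- ceiling(rank(x, na.last = "keep") * 100 / N)
cnt
# 30 NA 80 40 100 20 NA 60 70 50 90 10
```

Tied values get averaged ranks by default, so ties end up in the same bin; a different ties.method can be passed to rank() if another behaviour is wanted.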
> On Sat, Sep 27, 2008 at 8:54 AM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>
>
>> Try this:
>>
>> my.df$my.newvar <- quantile(my.df$my.var, probs = seq(0.01,1, 0.01))
>>
>>
>> On Sat, Sep 27, 2008 at 3:50 AM, Donald Braman <dbraman at law.gwu.edu>
>> wrote:
>>
>>> I'm wondering if there is a simple way to assign a quantile to a vector in a
>>> data frame, much like one could in Stata using centile. Let's say I want 100
>>> slices in my assignation. I can easily see what the limits of each slice are
>>> by using quantile:
>>> quantile(my.df$my.var, probs=seq(0, 1, 0.01))
>>>
>>> But how do I assign the appropriate value to each row/record in my data
>>> frame? Clearly the following won't work, but what will?
>>>
>>> my.df$my.new.var <- quantile(my.df$my.var, probs=seq(0, 1, 0.01))
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>
>>
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907