[R] cut2 once, bin twice...
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Oct 23 17:14:25 CEST 2009
On Fri, Oct 23, 2009 at 3:58 AM, Dieter Menne
<dieter.menne at menne-biomed.de> wrote:
>
>
>
> sdanzige wrote:
>>
>>
>> I'm using the Hmisc cut2 function to bin a set of data. It produces bins
>> that I like with results like this:
>>
>> [96,270]:171
>> [69, 96): 54
>> [49, 69): 40
>> [35, 49): 28
>> [28, 35): 14
>> [24, 28): 8
>> (Other) : 48
>>
>> I would like to take a second set of data and assign it to bins based on
>> the factor levels defined by my call to cut2.
>>
>
> It used to be quite tricky, but at popular request Brian Ripley has added an
> example of how to extract the intervals using regular expressions at the bottom
> of the examples for cut (note: cut in base R, not cut2 in Hmisc).
>
> If someone knows of an easier way, please correct me. How about adding this
> information as an attribute to the standard cut?
>
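The delimiter-based approach Dieter describes can be sketched in base R with no extra packages. This is a minimal sketch, not the example from ?cut itself: the patterns and the made-up sample data are assumptions for illustration. Each pattern keys on the brackets and the comma, which is why it needs more care than a content-based pattern.

```r
# Sample data, made up for illustration.
set.seed(1)
x <- sample(seq(4, 7, by = 0.05), 30)
groups <- cut(x, breaks = 10)
labs <- levels(groups)                 # labels such as "(4,4.3]"

# Delimiter-based patterns: the lower bound sits between the opening
# bracket and the comma, the upper bound between the comma and the
# closing bracket. Either bracket type is accepted, so cut2-style
# labels such as "[69, 96)" parse as well (as.numeric ignores the space).
lower_re <- "^[[(]([^,]+),.*$"
upper_re <- "^[^,]*,([^])]+)[])]$"

bounds <- cbind(lower = as.numeric(sub(lower_re, "\\1", labs)),
                upper = as.numeric(sub(upper_re, "\\1", labs)))
print(bounds)
```

Because these patterns capture whatever lies between the delimiters, they also accept negative numbers and exponent notation in the labels.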
The strapply function in the gsubfn package can do it with a simpler regular
expression, since it extracts matches based on their content rather than on
the delimiters around them, which is what you want here:
> # create sample data
> library(gsubfn)
> set.seed(1)
> dat <- seq(4, 7, by = 0.05)
> x <- sample(dat, 30)
> # use cut
> groups <- cut(x, breaks = 10)
> # extract interval boundaries using strapply
> strapply(levels(groups), "[[:digit:].]+", as.numeric, simplify = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4.0 4.3 4.6 4.9 5.2 5.5 5.8 6.1 6.4 6.7
[2,] 4.3 4.6 4.9 5.2 5.5 5.8 6.1 6.4 6.7 7.0
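For readers without gsubfn, the same content-based extraction can be sketched in base R with gregexpr() and regmatches(). This is an assumed equivalent, not part of gsubfn or its demo. Like the pattern above, "[[:digit:].]+" does not capture minus signs or exponent notation, so it suits labels of non-negative numbers in plain notation.

```r
# Same sample data as the session above.
set.seed(1)
x <- sample(seq(4, 7, by = 0.05), 30)
groups <- cut(x, breaks = 10)
labs <- levels(groups)

# Content-based extraction, as with strapply: find every run of digits
# and dots in each label, then convert each label's matches to numbers.
matches <- regmatches(labs, gregexpr("[[:digit:].]+", labs))

# sapply() simplifies the list of length-2 vectors into a 2 x 10 matrix:
# row 1 holds the lower bounds, row 2 the upper bounds.
bounds <- sapply(matches, as.numeric)
print(bounds)
```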
The above is from demo("gsubfn-cut").
For more, see the gsubfn home page at http://gsubfn.googlecode.com
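Labels only carry the bounds rounded for display (cut() uses dig.lab = 3 significant digits by default). For the original question of binning a second data set the same way, here is a sketch that avoids parsing altogether: compute the break points once and pass them to cut() for both data sets. The quartile breaks and the runif() data are assumptions for illustration. They approximate, but are not, the equal-count groups that cut2 builds with its g argument.

```r
set.seed(1)
x <- sample(seq(4, 7, by = 0.05), 30)   # first data set: defines the bins
y <- runif(20, min = 4, max = 7)        # second data set: binned the same way

# Break points: quartiles of x (an assumed stand-in for cut2's groups).
breaks <- unique(quantile(x, probs = seq(0, 1, by = 0.25), names = FALSE))

# right = FALSE gives left-closed intervals [a, b) like cut2's labels;
# include.lowest = TRUE then also closes the last interval on the right.
x_bins <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
y_bins <- cut(y, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Both factors share the same levels; y values outside range(x) become NA.
print(table(y_bins, useNA = "ifany"))
```

The design choice here is to keep the numeric breaks rather than the labels. The bins for the second data set then match the first exactly, with no loss from rounding.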