[R] Binning continuous data

David Winsemius dwinsemius at comcast.net
Thu Mar 1 01:56:04 CET 2012


On Feb 29, 2012, at 5:01 PM, Faryabi, Robert (NIH/NCI) [F] wrote:

> Hi there,
>
> Here is the scenario:
>
> I have measurements of some sort for two variables, and I would like to
> figure out a rough pattern between them. Let's say the values of
> the first variable are low, middle, high, and extremely high; then
> what would be the corresponding pattern of the second variable? The
> idea is not to find the 2d distribution, but to plot a conditional
> distribution of the second variable based on the binning of the
> first variable and then present it in a boxplot.
>
> I got the breakpoints for binning the first variable from a bi-modal
> density estimation. Now I need to bin the first variable accordingly
> and map each value to a categorical value.
>
> Is there an R command that does the binning?

It sounds as though you want `cut` and `table`. Whether that is the
best use of the data is more questionable. Generally the
categorization process removes quite a bit of the information content
and may either introduce significant biases when the cuts are chosen
after looking at the data or lower power when any inferential test is
used. You _should_ also look at 2d density estimation as a method that
is less susceptible to these distortions.

help(kde2d, package = MASS)

help(bkde2D, package = KernSmooth)

help(s.kde2d, package = ade4)
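
For illustration only, here is a minimal sketch of both routes. The data
are simulated and the breakpoints in `brks` are made up; you would
substitute your own two variables and the breakpoints from your density
estimate. The names `x`, `y`, `brks`, `xbin` and `dens` are just
placeholders.

```r
# Hypothetical data: x is the variable to bin, y is the second variable.
set.seed(42)
x <- c(rnorm(150, mean = 2), rnorm(150, mean = 7))  # roughly bimodal
y <- 0.5 * x + rnorm(300)

# Hypothetical breakpoints; replace with the ones you estimated.
brks <- c(-Inf, 2, 4.5, 7, Inf)

# cut() maps each x value to a factor level. Intervals are closed on
# the right by default (right = TRUE).
xbin <- cut(x, breaks = brks,
            labels = c("low", "middle", "high", "extremely high"))

table(xbin)                      # number of observations per bin
boxplot(y ~ xbin, xlab = "x (binned)", ylab = "y")

# The unbinned alternative: a 2d kernel density estimate of (x, y).
library(MASS)
dens <- kde2d(x, y, n = 50)      # 50 x 50 grid, default bandwidths
contour(dens, xlab = "x", ylab = "y")
```

The boxplot shows the distribution of `y` within each bin of `x`, while
the contour plot shows the same relationship without imposing any cuts.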

-- 
David Winsemius, MD
West Hartford, CT


