[R] Binning continuous data
David Winsemius
dwinsemius at comcast.net
Thu Mar 1 01:56:04 CET 2012
On Feb 29, 2012, at 5:01 PM, Faryabi, Robert (NIH/NCI) [F] wrote:
> Hi there,
>
> Here is the scenario:
>
> I have a measurement of some sort for two variables, and I would like
> to figure out a rough pattern between them. Let's say the values of
> the first variable are low, middle, high, and extremely high; what
> would be the corresponding pattern of the second variable? The
> idea is not to find the 2d distribution, but to plot a conditional
> distribution of the second variable based on the binning of the
> first variable and then present it in a boxplot.
>
> I got the breakpoints for binning the first variable by a bi-modal
> density estimation. Now I need to bin the first variable accordingly
> and map it to a categorical value.
>
> Is there an R command that does the binning?
It sounds as though you want `cut` and `table`. Whether that is the
best use of the data is more questionable. Generally the
categorization process removes quite a bit of the information content
and may either introduce significant biases when the cuts are chosen
after looking at the data or lower power when any inferential test is
used. You _should_ also look at 2d density estimation as a method
that is less susceptible to these distortions.
help( kde2d, package=MASS)
help( bkde2D , package=KernSmooth)
help( s.kde2d , package=ade4)
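
For the `cut` and `table` route, a minimal sketch might look like the
following. The data are simulated and the breakpoints in `brks` are
made-up placeholders, so substitute your own measurements and the
breakpoints from your density estimate. The names `x`, `y`, `brks`,
`labs` and `x_bin` are illustrative only.

```r
# Simulated stand-ins for the two measured variables (hypothetical data).
set.seed(1)
x <- c(rnorm(100, mean = 2), rnorm(100, mean = 8))  # bimodal first variable
y <- 0.5 * x + rnorm(200)                           # second variable

# Placeholder breakpoints; replace with those from the density estimate.
# -Inf and Inf as outer limits ensure no value falls outside the bins.
brks <- c(-Inf, 2, 5, 8, Inf)
labs <- c("low", "middle", "high", "extremely high")

# cut() assigns each value to the interval containing it and returns a factor.
x_bin <- cut(x, breaks = brks, labels = labs)

# table() counts the observations in each bin.
counts <- table(x_bin)
print(counts)

# Boxplot of y conditional on the binned x.
boxplot(y ~ x_bin, xlab = "x (binned)", ylab = "y")
```

By default `cut` builds intervals that are open on the left and closed
on the right; use `right = FALSE` if you want the opposite convention.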
--
David Winsemius, MD
West Hartford, CT