[R] how to categorize continuous variable when using regression
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Thu Sep 17 04:26:24 CEST 2009
Manli Yan wrote:
> Assume a continuous dependent variable y and a continuous independent
> variable x. I am trying to categorize x into intervals such that the
> intervals have the most significantly different effects on y.
> Does anyone know which method I should apply? I know it will cause a loss
> of information, but can I really do that? Or which method would keep
> the loss minimal? All I want is some keywords. Thanks in advance.
This is bad statistical practice and should be avoided. Use modern
methods such as regression splines, penalized splines, loess, etc.
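
As a minimal sketch of the spline alternative, assuming simulated data (the variables x and y and the square-root relationship below are made up for illustration, not taken from the original question), one could compare a fit on quartile categories of x with a natural cubic spline that uses the same number of degrees of freedom, via ns() from the splines package:

```r
library(splines)  # ships with base R; provides ns()

set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 * sqrt(x) + rnorm(n, sd = 0.5)   # hypothetical smooth relationship

# Categorized fit: x cut into quartile groups (3 df beyond the intercept)
xcat    <- cut(x, breaks = quantile(x, 0:4 / 4), include.lowest = TRUE)
fit_cat <- lm(y ~ xcat)

# Natural cubic spline fit with the same 3 df, x kept continuous
fit_ns  <- lm(y ~ ns(x, df = 3))

# Residual standard errors of the two fits, side by side
sigma_cat <- summary(fit_cat)$sigma
sigma_ns  <- summary(fit_ns)$sigma
print(c(categorized = sigma_cat, spline = sigma_ns))
```

The categorized model assumes y is flat within each interval and jumps at the cutpoints, so when the true relationship is smooth it tends to leave more residual variation than the spline does for the same degrees of freedom. loess(y ~ x) would be another option along the same lines.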
Howard Wainer provided an algorithm showing that, for any set of x-y pairs
in which there is no correlation, one can find a set of 5 intervals such
that the mean y is increasing in x and another set of 5 intervals in which
the mean y is decreasing in x.
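
The phenomenon can be illustrated with a rough simulation. This is not Wainer's algorithm, only a brute-force random search over cutpoints, and the data, the helper names bin_means() and find_trend(), and the number of tries are all assumptions made for the sketch. Here y is generated independently of x, so any trend across intervals is spurious:

```r
set.seed(2)
n <- 200
x <- runif(n)
y <- rnorm(n)   # generated independently of x (hypothetical data)

# Mean of y in each of 5 intervals defined by 4 random interior cutpoints
bin_means <- function() {
  cuts <- sort(runif(4, min(x), max(x)))
  g <- cut(x, breaks = c(-Inf, cuts, Inf))
  as.numeric(tapply(y, g, mean))   # NA for any empty interval
}

# Search random cutpoints until the 5 interval means are strictly monotone
find_trend <- function(increasing, tries = 20000) {
  for (i in seq_len(tries)) {
    m <- bin_means()
    if (anyNA(m)) next
    d <- diff(m)
    ok <- if (increasing) all(d > 0) else all(d < 0)
    if (ok) return(m)
  }
  NULL   # nothing found within the allotted tries
}

up   <- find_trend(TRUE)    # interval means rising with x
down <- find_trend(FALSE)   # interval means falling with x
print(up)
print(down)
```

If both searches succeed, the same unrelated data yield an apparently increasing and an apparently decreasing relationship, depending only on where the cutpoints are placed.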
Frank
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University