[R] how to categorize continuous variable when using regression

Frank E Harrell Jr f.harrell at vanderbilt.edu
Thu Sep 17 04:26:24 CEST 2009


Manli Yan wrote:
> Assume a continuous dependent variable y and a continuous independent
> variable x. I am trying to categorize x into intervals such that those
> intervals have the most significantly different effects on y.
> Does anyone know which method I should apply? I know it will cause a loss
> of information, but can I really do that? Or which method would keep
> the loss minimal? All I want is some keywords. Thanks in advance.

Categorizing a continuous predictor is bad statistical practice and
should be avoided.  Use modern methods such as regression splines,
penalized splines, loess, etc.
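
As a rough illustration of the point above, here is a minimal sketch
comparing a categorized fit with a regression spline that uses the same
number of parameters. The data are simulated and the curve (2*log1p(x)),
sample size, noise level and quartile cut points are all assumptions
chosen only for the example; ns() is from the splines package that ships
with R, and loess() is in stats.

```r
library(splines)  # part of the standard R distribution

set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 * log1p(x) + rnorm(n, sd = 0.3)  # assumed smooth true relationship

# Categorized predictor: x cut at its quartiles, giving a step function
# with 4 coefficients (intercept + 3 group contrasts)
xcat <- cut(x, breaks = quantile(x, 0:4 / 4), include.lowest = TRUE)
fit_cat <- lm(y ~ xcat)

# Natural cubic regression spline, also 4 coefficients (intercept + 3)
fit_ns <- lm(y ~ ns(x, df = 3))

# loess as a nonparametric alternative (default span)
fit_lo <- loess(y ~ x)

# Residual sums of squares: smaller means a closer fit to these data
rss_cat <- sum(resid(fit_cat)^2)
rss_ns  <- sum(resid(fit_ns)^2)
rss_lo  <- sum(resid(fit_lo)^2)
print(c(categorized = rss_cat, spline = rss_ns, loess = rss_lo))
```

With a smooth underlying curve like this one, the step function has to
treat everything inside an interval as identical, so it leaves much of
the within-interval variation unexplained; the spline spends the same
number of parameters on a continuous curve.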

Howard Wainer provided an algorithm by which, for any set of x-y pairs
in which there is no correlation, one can find a set of 5 intervals such
that the mean y is increasing in x and another set of intervals in which
the mean y is decreasing in x.  In other words, the choice of cut points
alone can manufacture an apparent trend.
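
The following is only a sketch of the phenomenon, not Wainer's
algorithm: a brute-force random search over cut points in simulated
data where y is generated independently of x. The function find_bins()
and its arguments (tries, min_size) are hypothetical names made up for
this example, and a random search is not guaranteed to succeed, so it
returns NULL when nothing is found.

```r
set.seed(2009)
n <- 100
x <- sort(runif(n))
y <- rnorm(n)  # generated independently of x (sample correlation near 0)

# Randomly search for 4 cut positions (5 intervals of x) whose mean y
# values are strictly increasing (or strictly decreasing) across intervals.
find_bins <- function(increasing = TRUE, tries = 20000, min_size = 5) {
  for (i in seq_len(tries)) {
    cuts <- sort(sample(n - 1, 4))             # last index of bins 1..4
    if (any(diff(c(0, cuts, n)) < min_size)) next  # skip tiny intervals
    grp <- cut(seq_len(n), breaks = c(0, cuts, n))
    m <- as.numeric(tapply(y, grp, mean))      # mean y in each interval
    d <- diff(m)
    ok <- if (increasing) all(d > 0) else all(d < 0)
    if (ok) return(list(breaks = x[cuts], means = m))
  }
  NULL  # nothing found within the allotted tries
}

up   <- find_bins(increasing = TRUE)
down <- find_bins(increasing = FALSE)
print(up)
print(down)
```

When both searches succeed, the same unrelated x and y appear to show a
rising "effect" under one set of intervals and a falling one under
another, which is why cut points chosen to maximize differences in y
cannot be trusted.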

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



