[R] How do I use the cut function to assign specific cut points?

Frank Harrell f.harrell at vanderbilt.edu
Thu Jan 26 15:02:12 CET 2012


It is not valid to categorize BMI.  Doing so results in a major loss of
information and in residual confounding.  Plus there is huge heterogeneity
within the BMI >= 30 group.  Details are at
http://biostat.mc.vanderbilt.edu/CatContinuous
and in these articles:

@Article{fil07cat,
  author =  {Filardo, Giovanni and Hamilton, Cody and Hamman, Baron and
             Ng, Hon K. T. and Grayburn, Paul},
  title =   {Categorizing {BMI} may lead to biased results in studies
             investigating in-hospital mortality after isolated {CABG}},
  journal = {J Clin Epi},
  year =    2007,
  volume =  60,
  pages =   {1132-1139},
  annote =  {BMI;CABG;surgical adverse events;hospital mortality;
             epidemiology;smoothing methods;categorization;categorizing
             continuous variables;investigators should waive
             categorization entirely and use smoothed functions for
             continuous variables;examples of non-monotonic relationships}
}
@Article{roy06dic,
  author =  {Royston, Patrick and Altman, Douglas G. and
             Sauerbrei, Willi},
  title =   {Dichotomizing continuous predictors in multiple
             regression: a bad idea},
  journal = {Stat in Med},
  year =    2006,
  volume =  25,
  pages =   {127-141},
  annote =  {continuous covariates;dichotomization;categorization;
             regression;efficiency;clinical research;residual
             confounding;destruction of statistical inference when
             cutpoints are chosen using the response variable;varying
             effect estimates when change cutpoints;difficult to
             interpret effects when dichotomize;nice plot showing effect
             of categorization;PBC data}
}
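
To make the point about heterogeneity in the BMI >= 30 group concrete, here
is a minimal sketch on simulated data.  Everything in it is an assumption
made for illustration: the U-shaped risk curve, the sample size, and the
choice of a natural cubic spline (ns() from the splines package that ships
with R) as the smoother.  It is not the analysis from either article.

```r
library(splines)  # ships with R; provides ns()

set.seed(1)
n <- 2000
d <- data.frame(bmi = runif(n, 16, 45))

# Assumed (made-up) U-shaped log odds, lowest near BMI 25
d$y <- rbinom(n, 1, plogis(-2 + 0.01 * (d$bmi - 25)^2))

# BMI kept continuous and modelled with a natural cubic spline
fit_smooth <- glm(y ~ ns(bmi, df = 4), family = binomial, data = d)

# BMI cut into the four groups from the question
brk <- c(-Inf, 20, 25, 30, Inf)
d$bmi_cat <- cut(d$bmi, breaks = brk, right = FALSE)
fit_cat <- glm(y ~ bmi_cat, family = binomial, data = d)

# Predicted risk for two people who both land in the >= 30 group
nd <- data.frame(bmi = c(31, 44))
nd$bmi_cat <- cut(nd$bmi, breaks = brk, right = FALSE)
p_smooth <- predict(fit_smooth, nd, type = "response")
p_cat    <- predict(fit_cat,    nd, type = "response")
```

The categorized model has to give BMI 31 and BMI 44 the same predicted
risk, because both fall in the same group; the spline model is free to let
risk keep changing across that range.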

If you work with colleagues who tell you "this is the way it's done", don't
go down without a fight.  In general, good statistical practice dictates
that categorization be done only for producing certain tables (in which
case you might use the cut2 function in the Hmisc package).  Even that will
change as we incorporate more micrographics (think of loess plots with BMI
on the x-axis) within table cells, as is now done in the Hmisc
summary.formula function for purely categorical variables.
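
For the table-only case, a minimal sketch of the four groups asked about
(<20, 20-25, 25-30, >= 30) using base R's cut().  The BMI values and the
label strings are made up for illustration.  Infinite outer breaks cover
the open-ended groups, and right = FALSE makes each interval closed on the
left, so a BMI of exactly 30 lands in the >= 30 group.

```r
# Made-up BMI values, including the boundary cases 20, 25 and 30
bmi <- c(17.5, 19.9, 20, 24.9, 25, 29.9, 30, 41.2)

bmi_group <- cut(bmi,
                 breaks = c(-Inf, 20, 25, 30, Inf),
                 right  = FALSE,   # intervals are [a, b)
                 labels = c("<20", "20-<25", "25-<30", ">=30"))

# Frequency table of the groups, for descriptive tables only
table(bmi_group)

# If Hmisc is installed, cut2(bmi, cuts = c(20, 25, 30)) should give
# comparable left-closed groups with automatically generated labels.
```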

Frank


citadel wrote
> 
> I am new to R, and I am trying to cut a continuous variable BMI into
> different categories and can't figure out how to use it. I would like to
> cut it into four groups: <20, 20-25, 25-30 and >= 30.  I am having
> difficulty figuring the code for <20 and >=30? Please help. Thank you.
> 

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University


