[R] Convert character string to top levels + NAN

David Winsemius dwinsemius at comcast.net
Thu Apr 22 15:21:07 CEST 2010


On Apr 22, 2010, at 5:16 AM, Michael Haenlein wrote:

> Dear all,
>
> I have several character strings with a high number of different levels.
> unique(x) gives me values in the range of 100-200.
> This creates problems as I would like to use them as predictors in a coxph
> model.
>
> I therefore would like to convert each of these strings to a new string
> (x_new).
> x_new should be equal to x for the top n categories (i.e. the top n levels
> with the highest occurrence) and NAN elsewhere.
> For example, for n=3 x_new would have three levels: The three most common
> levels of x + NAN.
>
> Is there some convenient way of doing this?

  x <- sample(c("top", "three", "levels", "other", "strings"), 30,
                  replace=TRUE, prob=c(.3,.3,.3,.1,.1))  # weights need not sum to 1
  y <- c("top", "three", "levels")   # the levels to keep
  xnew <- x
  xnew[ !xnew %in% y ] <- "NAN"  # the string "NAN", not the numeric NaN
  table(xnew)

#--------
xnew
levels    NAN  three    top
      5      5      9     11
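
The example above types the levels to keep into y by hand. A minimal sketch of deriving the top n levels from the data with table() might look like the following. The function name top_n_levels and its arguments are hypothetical, introduced only for illustration, and ties in frequency are broken by whatever order sort() happens to return.

```r
# Sketch only: keep the n most frequent values of x, recode the rest.
# 'top_n_levels', 'n' and 'other' are illustrative names, not an existing API.
top_n_levels <- function(x, n, other = "NAN") {
  counts <- sort(table(x), decreasing = TRUE)         # frequency of each value
  top <- names(counts)[seq_len(min(n, length(counts)))]  # n most common values
  ifelse(x %in% top, x, other)                        # keep top values, recode rest
}

x <- sample(c("top", "three", "levels", "other", "strings"), 30,
            replace = TRUE, prob = c(.3, .3, .3, .1, .1))
xnew <- top_n_levels(x, 3)
table(xnew)
```

Wrapping the result in factor() before passing it to a model is probably what you want; other = NA_character_ would give real missing values instead of the string "NAN".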

-- 
David.

>
> Thanks in advance,
>
> Michael
>
>
> Michael Haenlein
> Associate Professor of Marketing
> ESCP Europe
> Paris, France

David Winsemius, MD
West Hartford, CT


