[R] Create a categorical variable from numeric column
Bert Gunter
gunter.berton at gene.com
Sun Oct 6 17:54:19 CEST 2013
I think this is unwise. It depends on there being exactly 2 categories
in the desired result and silent coercion from logical to numeric, and
so does not generalize. Sometimes brevity is **not** the soul of wit
(google if necessary).
I would suggest instead that cut() specify three intervals and that the final
condensation to 2 categories be explicit. This can be done in many
ways, but ifelse() is convenient here; e.g.:
> x <- sample(1:24,10)
> x
[1] 10 2 13 1 23 22 3 18 20 4
> y <- cut(x, breaks = c(0, 7, 17, 24), labels = FALSE)
## Note that the "include.lowest" and "right" arguments of cut() can
## be invoked to handle endpoints as desired
> y
[1] 2 1 2 1 3 3 1 3 3 1
> factor(ifelse(y == 2, 2, 1))
[1] 2 1 2 1 1 1 1 1 1 1
Levels: 1 2
## This could all be condensed into a one-liner of course, but at the
## cost of clarity.
Cheers,
Bert
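A minimal sketch of the explicit condensation suggested above, assuming the same ten values as in the session and integer data on 1 to 24. Instead of ifelse(), it names the three cut() intervals and merges them with the list form of levels<-, so a fourth group would only need another list element. The level names (low, mid, high, outer, middle) are illustrative and do not come from the thread.

```r
# Same ten values as in the console session above
x <- c(10, 2, 13, 1, 23, 22, 3, 18, 20, 4)

# Three right-closed intervals (default right = TRUE): (0,7], (7,17], (17,24]
# The labels are hypothetical names chosen for readability
y <- cut(x, breaks = c(0, 7, 17, 24), labels = c("low", "mid", "high"))

# Explicit condensation: each list element names a new level and lists
# the old levels that map onto it; new levels follow the list order
grp <- y
levels(grp) <- list(outer = c("low", "high"), middle = "mid")

table(grp)
```

With these breaks, 18 lands in (17,24] and is therefore grouped with 18 to 24, as the original question asks.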
On Sun, Oct 6, 2013 at 7:47 AM, arun <smartpink111 at yahoo.com> wrote:
>
> Thanks, cut() could be used to do it in one line:
> Categ2<-(!is.na(cut(dat1[,1],breaks=c(7,17))))+1
>
> identical(Categ,Categ2)
> #[1] TRUE
> A.K.
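To show what this one-liner relies on, here it is unpacked on a small hypothetical vector x (not the dat1 data from the thread), next to an explicit version that does not depend on NA handling or on logical-to-numeric coercion.

```r
# Hypothetical values, including the boundary cases 8, 17 and 18
x <- c(3, 6, 8, 15, 17, 18, 24)

# cut() with a single interval (7,17]: values outside it become NA
inside <- cut(x, breaks = c(7, 17))

# !is.na() is TRUE for values in (7,17]; adding 1 silently coerces the
# logical to numeric, so FALSE -> 1 and TRUE -> 2
categ2 <- (!is.na(inside)) + 1

# The same two-group result, stated explicitly
categ_explicit <- ifelse(x >= 8 & x <= 17, 2, 1)

identical(categ2, categ_explicit)
# [1] TRUE
```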
>
> ----- Original Message -----
> From: Bert Gunter <gunter.berton at gene.com>
> To: arun <smartpink111 at yahoo.com>
> Cc: R help <r-help at r-project.org>
> Sent: Sunday, October 6, 2013 10:18 AM
> Subject: Re: [R] Create a categorical variable from numeric column
>
> No.
>
> Use ?cut instead.
>
> -- Bert
>
>
> On Sun, Oct 6, 2013 at 6:29 AM, arun <smartpink111 at yahoo.com> wrote:
>>
>> Hi,
>>
>> I created 3 categories. If 1-7 and 18-24 should come under the same category, then:
>> Categ<- findInterval(dat1$Col1,c(8,18))+1
>> Categ[Categ>2]<- 1
>> dat1$Categ<- Categ
>> tail(dat1)
>> # Col1 Col2 Categ
>> #45 2 -0.5419758 1
>> #46 21 1.1042719 1
>> #47 24 -1.0787079 1
>> #48 18 0.6253085 1
>> #49 15 -1.6822411 2
>> #50 16 -0.5966446 2
>>
>> A.K.
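A sketch of an alternative to recoding Categ[Categ>2] after the fact, assuming the same breaks c(8, 18): index a lookup vector by the findInterval() bin so that the whole bin-to-group mapping sits in one place. The vector x and the name lookup are illustrative, not from the thread.

```r
# Hypothetical values, including the boundary cases 7, 8, 17 and 18
x <- c(2, 21, 24, 18, 15, 16, 7, 8, 17)

# findInterval() returns 0 for x < 8, 1 for 8 <= x < 18, 2 for x >= 18
bin <- findInterval(x, c(8, 18))

# Element k + 1 of the lookup vector is the group for bin k:
# bins 0 and 2 -> group 1, bin 1 -> group 2
lookup <- c(1L, 2L, 1L)
categ <- lookup[bin + 1L]
```

Regrouping then means editing only lookup, which keeps the condensation step explicit.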
>>
>> ----- Original Message -----
>> From: arun <smartpink111 at yahoo.com>
>> To: R help <r-help at r-project.org>
>> Cc:
>> Sent: Saturday, October 5, 2013 8:30 PM
>> Subject: Re: Create a categorical variable from numeric column
>>
>> Hi,
>> Try:
>> set.seed(29)
>> dat1<- data.frame(Col1=sample(1:24,50,replace=TRUE),Col2=rnorm(50))
>> dat1$Categ <- findInterval(dat1$Col1,c(8,18))+1
>> head(dat1)
>> # Col1 Col2 Categ
>> #1 3 -0.09381378 1
>> #2 6 -0.83640257 1
>> #3 3 0.00307641 1
>> #4 8 0.04197496 2
>> #5 15 0.15433872 2
>> #6 3 -0.21301893 1
>>
>> split(dat1,dat1$Categ)
>>
>>
>> A.K.
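findInterval() and cut() close their intervals on opposite sides: findInterval() uses left-closed intervals such as [8,18), while cut() is right-closed by default (right = TRUE). That is why the findInterval() breaks here are c(8, 18) while cut() needs 7 and 17 to produce the same groups. A quick check, assuming the values are the integers 1 to 24:

```r
x <- 1:24

# findInterval(): left-closed, so [8,18) is bin 1; +1L gives codes 1, 2, 3
fi <- findInterval(x, c(8, 18)) + 1L

# cut(): right-closed by default, so (7,17] is interval 2
ct <- cut(x, breaks = c(0, 7, 17, 24), labels = FALSE)

# For integer data the two codings agree
all(fi == ct)
# [1] TRUE
```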
>>
>>
>> I have a data frame that contains a numerical variable ranging from 1 to 24. I would like to create a new categorical variable with two levels: 1 to 7
>> and 18 to 24 will form one category, and 8 to 17 will form another. How
>> can I create this variable?
>>
>> Thanks
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> (650) 467-7374
>