[R] Re-grouping data in R

Rui Barradas ruipbarradas at sapo.pt
Wed Aug 8 00:29:45 CEST 2012


Hello,

Inline.

On 07-08-2012 19:56, Abraham Mathew wrote:
> I have a data frame with a column of values that I want to bucket (group)
> into specific levels.
>
> > str(dat)
> 'data.frame':	3678 obs. of  39 variables:
>   $ id                          : int  23 76 129 156 166 180 200 214 296 344 ...
>   $ final_purchase_amount       : Factor w/ 32 levels "\\N","1082","1109",..: 1 1 1 1 1 1 1 1 1 1 ...
>
>
> So I ran the following to produce new levels, one for values from 100
> to 400, 401 to 1000, and 1001+.
>
>
> dat$final_purchase_amount<- NA
> dat$final_purchase_amount[dat$final_purchase_amount %in%
> levels(dat$final_purchase_amount)[c(8,9,11,12,13,15,16,17,18,19,20,21)]]
> <- "100 to 400"
> dat$final_purchase_amount[dat$final_purchase_amount %in%
> levels(dat$final_purchase_amount)[c(22,23,24,25,26,27,28,29,30,31,32)]]
> <- "401 to 1000"
> dat$final_purchase_amount[dat$final_purchase_amount %in%
> levels(dat$final_purchase_amount)[c(2,3,4,5,6,7,10,14)]] <- "1001 +"
> dat$final_purchase_amount <- factor(dat$final_purchase_amount)
> levels(dat$final_purchase_amount)
> table(dat$final_purchase_amount)
>
>
>
> However, this doesn't seem to produce any levels
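Fortunately not! Your first instruction above sets the entire column to
NA. Each of the following instructions then tests that vector of NAs
with %in% against levels(...)[c(8,9, ...etc...)] or [c(22,23, ...etc..)],
but the column no longer has any levels, so nothing matches and nothing
is ever assigned. The final factor() call is applied to a column of NAs,
which is why levels() returns character(0).
Your first line of code makes everything else relative to
dat$final_purchase_amount useless. (I believe that line should be
deleted.)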
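
A minimal sketch of one way to do the bucketing, offered as an
illustration and not as a fix tested on your data. It swaps in cut()
in place of picking levels by index, and writes to a new column so the
original is never overwritten. The toy values and the column name
purchase_group are made up; the "\\N" label comes from your str()
output, and the sketch assumes it marks a missing amount and that
amounts are whole numbers of at least 100. Note also that assigning a
string such as "100 to 400" into an existing factor whose levels do
not include it gives NA with a warning, which is one more reason to
build a new column.

```r
# Toy data standing in for dat (hypothetical values, not the real data).
# "\\N" is assumed to mark a missing purchase amount.
dat <- data.frame(
  final_purchase_amount = factor(c("\\N", "150", "400", "401",
                                   "1000", "1082", "1109"))
)

# Convert the factor to numbers through its labels, not its integer codes.
lab <- as.character(dat$final_purchase_amount)
lab[lab == "\\N"] <- NA
amount <- as.numeric(lab)

# Bucket into the three ranges. Intervals are closed on the right, so
# 400 falls in "100 to 400"; amounts below 100 would come out as NA.
dat$purchase_group <- cut(amount,
                          breaks = c(99, 400, 1000, Inf),
                          labels = c("100 to 400", "401 to 1000", "1001 +"))

levels(dat$purchase_group)
table(dat$purchase_group, useNA = "ifany")
```

With real data, only the first statement (the toy data frame) would be
dropped; the breaks would need adjusting if amounts can be fractional.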

Hope this helps,

Rui Barradas

>   and returns the following.
>
>
> > levels(dat$final_purchase_amount)
> character(0)
>
>
> Can anyone point to what I'm doing wrong.
>
>
>
> Thanks!
>
>


