[R] allocating factor levels

Eric Lecoutre ericlecoutre at gmail.com
Tue Mar 8 09:59:16 CET 2011


Here is a version that should work for any number of distinct values of
Start.action. The only requirement is that your data frame is sorted
correctly, i.e. that the rows of each subgroup are adjacent.
It is quite a bit longer, but I used it as an exercise in trying a
"think generic" approach.
I guess there are a lot of better ways...

Kind regards,

Eric


x <- data.frame(Start.action = c(rep('Start.setting', 3),
                                 rep('Start.hauling', 4),
                                 rep('Start.setting', 4),
                                 rep('Start.hauling', 6),
                                 rep('Start.setting', 4),
                                 rep('Start.hauling', 4)))
act <- as.character(x$Start.action)
# TRUE where a row has the same value as the next row
same <- act == c(act[-1], "")
# TRUE at the first row of each group
first <- !c(FALSE, same)[seq_along(same)]
cbind(x, first)
spos <- which(first) # find position of first element of each group
# build indices start-end of groups
ind <- cbind(start = spos, end = c(spos[-1] - 1, nrow(x)))
# build whole indices (fill gaps using seq)
e <- lapply(as.data.frame(t(ind)), FUN = function(a) seq(a[1], a[2]))
for (i in seq_along(e)) {
  e[[i]] <- data.frame(ind = e[[i]], gr = names(e)[[i]])
}
e <- do.call("rbind", e) # prepare a column with unique group names
# add this column to df, keeping the levels in order of appearance
x <- cbind(x, gr1 = factor(e[, "gr"], levels = unique(e[, "gr"])))
# associate with high-level group names (Start.action)
gr2pos <- table(x$Start.action, x$gr1)
a <- apply(gr2pos, 2, FUN = function(vec) which(vec != 0)) # use associations
levels(x$gr1) <- make.unique(rownames(gr2pos)[a]) # assign new names
print(x)
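
For comparison, the same generic labelling can probably be done more
compactly with rle(), as in Dennis's reply below. This is only a sketch:
label_runs() is a name I made up for illustration, and like the code
above it assumes the rows of each subgroup are adjacent. It numbers each
run by how many runs of the same value have been seen so far, so it is
not limited to two values of Start.action.

```r
# Sketch only: label_runs() is a hypothetical helper, not part of base R.
label_runs <- function(v) {
  v <- as.character(v)
  r <- rle(v)  # runs of identical consecutive values
  # for each run, count the runs of the same value seen so far
  k <- ave(seq_along(r$values), r$values, FUN = seq_along)
  # expand the per-run labels back to one label per row
  rep(paste0(r$values, ".", k), r$lengths)
}

x <- data.frame(Start.action = rep(
  c("Start.setting", "Start.hauling", "Start.setting",
    "Start.hauling", "Start.setting", "Start.hauling"),
  times = c(3, 4, 4, 6, 4, 4)))
x$action <- label_runs(x$Start.action)
print(x)
```

If a factor is wanted rather than a character column, wrapping the
result in factor() should be enough.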



On 08/03/2011, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> Here's one way to piece it together. All we need is the first variable, so
> I'll manufacture a vector of Start.action's and go from there.
>
> w <- data.frame(Start.action = c(rep('Start.setting', 3),
>                                  rep('Start.hauling', 4),
>                                  rep('Start.setting', 4),
>                                  rep('Start.hauling', 6),
>                                  rep('Start.setting', 4),
>                                  rep('Start.hauling', 4)))
> wr <- rle(w$Start.action == 'Start.setting')
> > wr
> Run Length Encoding
>   lengths: int [1:6] 3 4 4 6 4 4
>   values : logi [1:6] TRUE FALSE TRUE FALSE TRUE FALSE
>
> w$cycle <- rep(cumsum(wr$values), wr$lengths)
> w$act <- ifelse(w$Start.action == 'Start.setting', 'set', 'haul')
> w$action <- with(w, paste(act, cycle, sep = ''))
> w$cycle <- w$act <- NULL
> > w
>     Start.action action
> 1  Start.setting   set1
> 2  Start.setting   set1
> 3  Start.setting   set1
> 4  Start.hauling  haul1
> 5  Start.hauling  haul1
> <snip>
> 20 Start.setting   set3
> 21 Start.setting   set3
> 22 Start.hauling  haul3
> 23 Start.hauling  haul3
> 24 Start.hauling  haul3
> 25 Start.hauling  haul3
>
> The rle() function is the key to this; its argument is a logical vector
> that is TRUE for Start.setting and FALSE for Start.hauling, so each run it
> finds is one block of setting or hauling. Because every cycle starts with
> a TRUE run, cumsum() on the $values component of the result from rle()
> gives the cycle number of each run, and we replicate those numbers
> according to the vector of $lengths given by rle(). Once that is done, we
> just use a vectorized ifelse() function to yield 'set' or 'haul' in a new
> variable and then paste that together with the numeric vector...and we're
> done. Run the code one line at a time to understand what each instruction
> is doing.
>
> HTH,
> Dennis
>
> On Mon, Mar 7, 2011 at 7:13 PM, Darcy Webber <darcy.webber at gmail.com> wrote:
>
>> Dear R users,
>>
>> I am working on allocating the rows within a dataframe into some
>> factor levels. Consider the following dataframe:
>>
>>               Start.action                  Start.time
>> 1            Start.setting    2010-12-30 17:58:00
>> 2            Start.setting    2010-12-30 18:40:00
>> 3            Start.setting    2010-12-31 22:39:00
>> 4            Start.setting    2010-12-31 23:24:00
>> 5            Start.setting    2011-01-01 00:30:00
>> 6            Start.setting    2011-01-01 01:10:00
>> 7            Start.hauling    2011-01-01 07:07:00
>> 8            Start.hauling    2011-01-01 14:25:00
>> 9            Start.hauling    2011-01-01 21:28:00
>> 10          Start.hauling    2011-01-02 03:38:00
>> 11          Start.hauling    2011-01-02 09:28:00
>> 12          Start.hauling    2011-01-02 14:22:00
>> 13          Start.setting    2011-01-02 20:51:00
>> 14          Start.setting    2011-01-02 21:33:00
>> 15          Start.setting    2011-01-02 22:47:00
>> 16          Start.setting    2011-01-02 23:27:00
>> 17          Start.setting    2011-01-03 00:35:00
>> 18          Start.setting    2011-01-03 01:16:00
>> 19          Start.hauling    2011-01-03 04:31:00
>> 20          Start.hauling    2011-01-03 08:57:00
>>
>> I am trying to assign a factor level like the one below (named
>> "action") according to the sequence of setting and hauling occurring in
>> the "Start.action" column. In fact, it wouldn't even need to be a
>> factor or character, it could simply be numbered (i.e., the set/haul
>> prefix is useless as I could simply split it afterwards).
>>
>>              Start.action                   Start.time   action
>> 1            Start.setting    2010-12-30 17:58:00    set1
>> 2            Start.setting    2010-12-30 18:40:00    set1
>> 3            Start.setting    2010-12-31 22:39:00    set1
>> 4            Start.setting    2010-12-31 23:24:00    set1
>> 5            Start.setting    2011-01-01 00:30:00    set1
>> 6            Start.setting    2011-01-01 01:10:00    set1
>> 7            Start.hauling    2011-01-01 07:07:00   haul1
>> 8            Start.hauling    2011-01-01 14:25:00   haul1
>> 9            Start.hauling    2011-01-01 21:28:00   haul1
>> 10          Start.hauling    2011-01-02 03:38:00   haul1
>> 11          Start.hauling    2011-01-02 09:28:00   haul1
>> 12          Start.hauling    2011-01-02 14:22:00   haul1
>> 13          Start.setting    2011-01-02 20:51:00    set2
>> 14          Start.setting    2011-01-02 21:33:00    set2
>> 15          Start.setting    2011-01-02 22:47:00    set2
>> 16          Start.setting    2011-01-02 23:27:00    set2
>> 17          Start.setting    2011-01-03 00:35:00    set2
>> 18          Start.setting    2011-01-03 01:16:00    set2
>> 19          Start.hauling    2011-01-03 04:31:00   haul2
>> 20          Start.hauling    2011-01-03 08:57:00   haul2
>>
>> It seems like such a simple question, yet I just can't think of how to
>> implement this. Any hints or ideas on how I might achieve this would
>> be much appreciated.
>>
>> Regards,
>> Darcy
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


-- 
Eric Lecoutre
Consultant - Business & Decision
Business Intelligence & Customer Intelligence


