[R] allocating factor levels
Eric Lecoutre
ericlecoutre at gmail.com
Tue Mar 8 09:59:16 CET 2011
Here is a version that should work for any number of distinct values of Start.action.
The only requirement is that your data frame is sorted correctly, i.e.
that the subgroups are well defined (rows of the same group are contiguous).
It is quite a bit longer, but I used it as an exercise in taking a 'think generic' approach.
I guess there are a lot of better ways...
Kind regards,
Eric
x <- data.frame(Start.action = c(rep('Start.setting', 3),
                                 rep('Start.hauling', 4),
                                 rep('Start.setting', 4),
                                 rep('Start.hauling', 6),
                                 rep('Start.setting', 4),
                                 rep('Start.hauling', 4)))
act <- as.character(x$Start.action)
# TRUE where a row starts a new group, i.e. differs from the previous row
# (named is.start rather than switch, to avoid masking base::switch)
is.start <- c(TRUE, act[-1] != act[-length(act)])
cbind(x, is.start)
spos <- which(is.start)                       # find position of first element of each group
ind <- cbind(spos, c(spos[-1] - 1, nrow(x)))  # build indices start-end of groups
# build whole indices (fill gaps using seq)
e <- lapply(seq_len(nrow(ind)), function(i) seq(ind[i, 1], ind[i, 2]))
# prepare a column with a unique group id for each run
e <- do.call("rbind", lapply(seq_along(e), function(i) data.frame(ind = e[[i]], gr = i)))
x <- cbind(x, gr1 = factor(e[, "gr"]))        # add this column to df
# associate with the higher-level group names (Start.action)
gr2pos <- table(x$Start.action, x$gr1)
a <- apply(gr2pos, 2, FUN = function(vec) which(vec != 0))  # use associations
x$gr <- x$gr1                                  # copy, then rename the levels
levels(x$gr) <- make.unique(rownames(gr2pos)[a])  # assign new names
print(x)
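For comparison, the same run labelling can probably be written more compactly with rle(), still for any number of distinct values in Start.action. This is only a sketch under assumptions: the helper name label_runs and the small example data are made up for illustration, and the label format (value, dot, occurrence number) is an arbitrary choice, not the set1/haul1 format asked for in the original question.

```r
# Hypothetical helper (the name is an assumption, not from any package):
# label each run of identical consecutive values with the value itself
# plus the number of times a run of that value has occurred so far.
label_runs <- function(v) {
  v <- as.character(v)
  r <- rle(v)  # runs of identical consecutive values
  # occurrence number of each run among the runs sharing the same value
  occ <- ave(seq_along(r$values), r$values, FUN = seq_along)
  # expand one label per run back to one label per row
  rep(paste(r$values, occ, sep = "."), r$lengths)
}

# Small made-up example: 2 setting rows, 3 hauling rows, 1 setting row
x2 <- data.frame(Start.action = rep(c("Start.setting", "Start.hauling",
                                      "Start.setting"),
                                    times = c(2, 3, 1)))
x2$gr <- label_runs(x2$Start.action)
print(x2)
```

As with the longer version, this relies on the rows already being in the right order, since rle() only looks at consecutive values.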
On 08/03/2011, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> Here's one way to piece it together. All we need is the first variable, so
> I'll manufacture a vector of Start.action's and go from there.
>
> w <- data.frame(Start.action = c(rep('Start.setting', 3),
> rep('Start.hauling', 4),
> rep('Start.setting', 4),
> rep('Start.hauling', 6),
> rep('Start.setting', 4),
> rep('Start.hauling', 4)))
> wr <- rle(w$Start.action == 'Start.setting')
> > wr
> Run Length Encoding
> lengths: int [1:6] 3 4 4 6 4 4
> values : logi [1:6] TRUE FALSE TRUE FALSE TRUE FALSE
>
> w$cycle <- rep(cumsum(wr$values), wr$lengths)
> w$act <- ifelse(w$Start.action == 'Start.setting', 'set', 'haul')
> w$action <- with(w, paste(act, cycle, sep = ''))
> w$cycle <- w$act <- NULL
> > w
> Start.action action
> 1 Start.setting set1
> 2 Start.setting set1
> 3 Start.setting set1
> 4 Start.hauling haul1
> 5 Start.hauling haul1
> <snip>
> 20 Start.setting set3
> 21 Start.setting set3
> 22 Start.hauling haul3
> 23 Start.hauling haul3
> 24 Start.hauling haul3
> 25 Start.hauling haul3
>
> The rle() function is the key to this; given the logical comparison as its
> argument, its values are TRUE for runs of Start.setting and FALSE for runs
> of Start.hauling. The cumsum() function on the $values component of the
> result from rle() gives the cycle numbers we want, and we replicate them
> according to the vector of $lengths given by rle(). Once that is done, we
> just use a vectorized ifelse() function to yield 'set' or 'haul' in a new
> variable and then paste that together with the numeric vector...and we're
> done. Run the code one line at a time to understand what each instruction
> is doing.
>
> HTH,
> Dennis
>
> On Mon, Mar 7, 2011 at 7:13 PM, Darcy Webber <darcy.webber at gmail.com> wrote:
>
>> Dear R users,
>>
>> I am working on allocating the rows within a dataframe into some
>> factor levels. Consider the following dataframe:
>>
>> Start.action Start.time
>> 1 Start.setting 2010-12-30 17:58:00
>> 2 Start.setting 2010-12-30 18:40:00
>> 3 Start.setting 2010-12-31 22:39:00
>> 4 Start.setting 2010-12-31 23:24:00
>> 5 Start.setting 2011-01-01 00:30:00
>> 6 Start.setting 2011-01-01 01:10:00
>> 7 Start.hauling 2011-01-01 07:07:00
>> 8 Start.hauling 2011-01-01 14:25:00
>> 9 Start.hauling 2011-01-01 21:28:00
>> 10 Start.hauling 2011-01-02 03:38:00
>> 11 Start.hauling 2011-01-02 09:28:00
>> 12 Start.hauling 2011-01-02 14:22:00
>> 13 Start.setting 2011-01-02 20:51:00
>> 14 Start.setting 2011-01-02 21:33:00
>> 15 Start.setting 2011-01-02 22:47:00
>> 16 Start.setting 2011-01-02 23:27:00
>> 17 Start.setting 2011-01-03 00:35:00
>> 18 Start.setting 2011-01-03 01:16:00
>> 19 Start.hauling 2011-01-03 04:31:00
>> 20 Start.hauling 2011-01-03 08:57:00
>>
>> I am trying to assign a factor level like the one below (named
>> "action") according to the sequence of setting and hauling occurring in
>> the "Start.action" column. In fact, it wouldn't even need to be a
>> factor or character, it could simply be numbered (i.e., the set/haul
>> prefix is useless as I could simply split it afterwards).
>>
>> Start.action Start.time action
>> 1 Start.setting 2010-12-30 17:58:00 set1
>> 2 Start.setting 2010-12-30 18:40:00 set1
>> 3 Start.setting 2010-12-31 22:39:00 set1
>> 4 Start.setting 2010-12-31 23:24:00 set1
>> 5 Start.setting 2011-01-01 00:30:00 set1
>> 6 Start.setting 2011-01-01 01:10:00 set1
>> 7 Start.hauling 2011-01-01 07:07:00 haul1
>> 8 Start.hauling 2011-01-01 14:25:00 haul1
>> 9 Start.hauling 2011-01-01 21:28:00 haul1
>> 10 Start.hauling 2011-01-02 03:38:00 haul1
>> 11 Start.hauling 2011-01-02 09:28:00 haul1
>> 12 Start.hauling 2011-01-02 14:22:00 haul1
>> 13 Start.setting 2011-01-02 20:51:00 set2
>> 14 Start.setting 2011-01-02 21:33:00 set2
>> 15 Start.setting 2011-01-02 22:47:00 set2
>> 16 Start.setting 2011-01-02 23:27:00 set2
>> 17 Start.setting 2011-01-03 00:35:00 set2
>> 18 Start.setting 2011-01-03 01:16:00 set2
>> 19 Start.hauling 2011-01-03 04:31:00 haul2
>> 20 Start.hauling 2011-01-03 08:57:00 haul2
>>
>> It seems like such a simple question, yet I just can't think of how to
>> implement this. Any hints or ideas on how I might achieve this would
>> be much appreciated.
>>
>> Regards,
>> Darcy
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
--
Eric Lecoutre
Consultant - Business & Decision
Business Intelligence & Customer Intelligence