[R] Creating a new by variable in a dataframe

arun smartpink111 at yahoo.com
Sat Oct 20 04:49:26 CEST 2012



Hi,
Here is a version without "ifelse()", using the same example dataset.
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))

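# Convert the date column and reduce the time to its hour
d$date <- as.Date(d$date, format = "%Y-%m-%d")
d$time <- strptime(d$time, format = "%H:%M")$hour
# Within each date, flag the row with the largest hour (assumes rows are sorted by date)
d$flag <- unlist(lapply(split(d$time, d$date), function(x) x == max(x)), use.names = FALSE)
d$datetime <- as.POSIXct(paste(d$date, d$time), format = "%Y-%m-%d %H")
d1 <- d[, c(1, 5, 4)]
d1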
#   transaction            datetime  flag
#1          T01 2012-10-19 08:00:00 FALSE
#2          T02 2012-10-19 09:00:00 FALSE
#3          T03 2012-10-19 10:00:00 FALSE
#4          T04 2012-10-19 11:00:00  TRUE
#5          T05 2012-10-22 12:00:00  TRUE
#6          T06 2012-10-23 13:00:00 FALSE
#7          T07 2012-10-23 14:00:00 FALSE
#8          T08 2012-10-23 15:00:00 FALSE
#9          T09 2012-10-23 16:00:00 FALSE
#10         T10 2012-10-23 17:00:00  TRUE

str(d1)
#'data.frame':    10 obs. of  3 variables:
# $ transaction: chr  "T01" "T02" "T03" "T04" ...
# $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
# $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
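
A possible variant of the same idea, as a sketch only: ave() can return the per-date maximum aligned with the original rows, so the comparison does not rely on column position or on the rows being sorted by date. The shuffled row order and the helper name `hour` are assumptions added for illustration and are not part of the code above.

```r
# Same example data as above, rebuilt compactly
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = sprintf("T%02d", 1:10),
  date = rep(c("2012-10-19", "2012-10-22", "2012-10-23"), times = c(4, 1, 5)),
  time = sprintf("%02d:00", 8:17)
)

# Shuffle the rows (illustrative assumption) to show row order does not matter
d <- d[c(10, 3, 5, 1, 7, 4, 9, 2, 6, 8), ]

# Hour of day as an integer (hypothetical helper variable)
hour <- as.integer(substr(d$time, 1, 2))

# ave() gives each row the maximum hour of its own date
d$flag <- hour == ave(hour, d$date, FUN = max)

d$transaction[d$flag]
```

If two transactions on one day share the same latest hour, both are flagged, as in the split()-based version.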

A.K.


----- Original Message -----
From: Flavio Barros <flaviomargarito at gmail.com>
To: William Dunlap <wdunlap at tibco.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>; ramoss <ramine.mossadegh at finra.org>
Sent: Friday, October 19, 2012 4:24 PM
Subject: Re: [R] Creating a new by variable in a dataframe

I think i have a better solution

## Example data.frame
d <- data.frame(stringsAsFactors = FALSE, transaction = c("T01", "T02",
"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),date =
c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
"2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),time
= c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
"16:00", "17:00"))

## Date transformation
d$date <- as.Date(d$date)
d$time <- strptime(d$time, format='%H:%M')

library(reshape)

## Create factor to split the data
fdate <- factor(format(d$date, '%D'))

## Create a list of logicals, TRUE where the row is the last transaction of its day
ex <- sapply(split(d, fdate), function(x)
ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))

## Coerce to logical vector
flag <- unlist(rbind(ex))

## transform() (available in base R, so reshape is not strictly needed) adds the flag column
d <- transform(d, flag = flag)
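
For comparison with the SAS steps in the original question (quoted below), a minimal base-R sketch: sort by date and time, then mark the last row of each date with duplicated(..., fromLast = TRUE). The column name `last_trans` is taken from the question; the shuffled starting order is an assumption added to show that the sort matters. Date and time are left as zero-padded strings here, which order correctly as text.

```r
# Same example data, rebuilt compactly
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = sprintf("T%02d", 1:10),
  date = rep(c("2012-10-19", "2012-10-22", "2012-10-23"), times = c(4, 1, 5)),
  time = sprintf("%02d:00", 8:17)
)

# Start from an arbitrary row order (illustrative assumption)
d <- d[c(10, 3, 5, 1, 7, 4, 9, 2, 6, 8), ]

# Counterpart of: proc sort; by tdate event_tim;
d <- d[order(d$date, d$time), ]

# Counterpart of: last_trans = last.tdate;
# a row is the last of its date if no later row has the same date
d$last_trans <- !duplicated(d$date, fromLast = TRUE)

d[d$last_trans, ]
```

Unlike a comparison against the maximum time, this flags exactly one row per date even when times are tied.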

On Fri, Oct 19, 2012 at 3:51 PM, William Dunlap <wdunlap at tibco.com> wrote:

> Suppose your data frame is
> d <- data.frame(
>      stringsAsFactors = FALSE,
>      transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
>         "T07", "T08", "T09", "T10"),
>      date = c("2012-10-19", "2012-10-19", "2012-10-19",
>         "2012-10-19", "2012-10-22", "2012-10-23",
>         "2012-10-23", "2012-10-23", "2012-10-23",
>         "2012-10-23"),
>      time = c("08:00", "09:00", "10:00", "11:00", "12:00",
>         "13:00", "14:00", "15:00", "16:00", "17:00"
>         ))
> (Convert the date and time to your favorite classes, it doesn't matter
> here.)
>
> A general way to say if an item is the last of its group is:
>   isLastInGroup <- function(...)  ave(logical(length(..1)), ...,
> FUN=function(x)seq_along(x)==length(x))
>   is_last_of_dayA <- with(d, isLastInGroup(date))
> If you know your data is sorted by date you could save a little time for
> large datasets by using
>   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
>   is_last_of_dayB <- isLastInRun(d$date)
> The above d is sorted by date so you get the same results for both:
>   > cbind(d, is_last_of_dayA, is_last_of_dayB)
>      transaction       date  time is_last_of_dayA is_last_of_dayB
>   1          T01 2012-10-19 08:00           FALSE           FALSE
>   2          T02 2012-10-19 09:00           FALSE           FALSE
>   3          T03 2012-10-19 10:00           FALSE           FALSE
>   4          T04 2012-10-19 11:00            TRUE            TRUE
>   5          T05 2012-10-22 12:00            TRUE            TRUE
>   6          T06 2012-10-23 13:00           FALSE           FALSE
>   7          T07 2012-10-23 14:00           FALSE           FALSE
>   8          T08 2012-10-23 15:00           FALSE           FALSE
>   9          T09 2012-10-23 16:00           FALSE           FALSE
>   10         T10 2012-10-23 17:00            TRUE            TRUE
>
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf
> > Of ramoss
> > Sent: Friday, October 19, 2012 10:52 AM
> > To: r-help at r-project.org
> > Subject: [R] Creating a new by variable in a dataframe
> >
> > Hello,
> >
> > I have a dataframe w/ 3 variables of interest: transaction,date(tdate) &
> > time(event_tim).
> > How could I create a 4th variable (last_trans) that would flag the last
> > transaction of the day for each day?
> > In SAS I use:
> > proc sort data=all6;
> > by tdate event_tim;
> > run;
> >          /*Create last transaction flag per day*/
> > data all6;
> >   set all6;
> >   by tdate event_tim;
> >   last_trans=last.tdate;
> >
> > Thanks ahead for any suggestions.
> >
> >
> >
> > --
> > View this message in context:
> http://r.789695.n4.nabble.com/Creating-a-new-by-
> > variable-in-a-dataframe-tp4646782.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

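A small check of the two helpers from Bill Dunlap's quoted message, as a sketch: they agree only when equal dates are adjacent. The three-element vector `dates` is a made-up example, not data from the thread.

```r
# Helpers as given in the quoted message
isLastInGroup <- function(...) ave(logical(length(..1)), ...,
                                   FUN = function(x) seq_along(x) == length(x))
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical dates that are NOT sorted: the first date reappears at the end
dates <- c("2012-10-19", "2012-10-22", "2012-10-19")

a <- isLastInGroup(dates)  # last occurrence of each date overall
b <- isLastInRun(dates)    # last element of each run of equal neighbours

a
b
```

Here isLastInGroup() flags only the final occurrence of 2012-10-19, while isLastInRun() also flags the first row because its neighbour differs, so sorting first is what makes the faster version safe.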


-- 
Att,

Flávio Barros
