[R] move date-values from one line to several lines
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Tue Dec 2 19:51:55 CET 2014
On Tue, 2 Dec 2014, Marc Schwartz wrote:
>
>> On Dec 2, 2014, at 9:29 AM, Matthias Weber <Matthias.Weber at fntsoftware.com> wrote:
>>
>> Hello together,
>>
>> i have a data.frame with date-values. What I want is a data.frame with a several lines for each date.
>>
>> My current data.frame looks like this one:
>>
>> ID FROM TO REASON
>> 1 2015-02-27 2015-02-28 Holiday
>> 1 2015-03-15 2015-03-20 Illness
>> 2 2015-05-20 2015-02-23 Holiday
>> 2 2015-06-01 2015-06-03 Holiday
>> 2 2015-07-01 2015-07-01 Illness
>>
>> The result looks like this one:
>>
>> ID DATE REASON
>> 1 2015-02-27 Holiday
>> 1 2015-02-28 Holiday
>> 1 2015-03-15 Illness
>> 1 2015-03-16 Illness
>> 1 2015-03-17 Illness
>> 1 2015-03-18 Illness
>> 1 2015-03-19 Illness
>> 1 2015-03-20 Illness
>> 2 2015-05-20 Holiday
>> 2 2015-05-21 Holiday
>> 2 2015-05-22 Holiday
>> 2 2015-05-23 Holiday
>> 2 2015-06-01 Holiday
>> 2 2015-06-02 Holiday
>> 2 2015-06-02 Holiday
>> 2 2015-07-01 Illness
>>
>> Maybe anyone can help me, how I can do this.
>>
>> Thank you.
>>
>> Best regards.
>>
>> Mat
>
>
> A quick and dirty approach.
>
> First, note that in your source data frame, the TO value in the third row is incorrect. I changed it here:
>
>> DF
> ID FROM TO REASON
> 1 1 2015-02-27 2015-02-28 Holiday
> 2 1 2015-03-15 2015-03-20 Illness
> 3 2 2015-05-20 2015-05-23 Holiday
> 4 2 2015-06-01 2015-06-03 Holiday
> 5 2 2015-07-01 2015-07-01 Illness
>
> With that in place, you can use R's recycling of values to create multiple data frame rows from the date sequences and the single ID and REASON entries:
>
> i <- 1
>
>> data.frame(ID = DF$ID[i], DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), REASON = DF$REASON[i])
> ID DATE REASON
> 1 1 2015-02-27 Holiday
> 2 1 2015-02-28 Holiday
>
>
> So just put that into an lapply() based loop, which returns a list:
>
>> DF.TMP <- lapply(seq(nrow(DF)),
> function(i) data.frame(ID = DF$ID[i],
> DATE = seq(DF$FROM[i], DF$TO[i], by = "day"),
> REASON = DF$REASON[i]))
>
>> DF.TMP
> [[1]]
> ID DATE REASON
> 1 1 2015-02-27 Holiday
> 2 1 2015-02-28 Holiday
>
> [[2]]
> ID DATE REASON
> 1 1 2015-03-15 Illness
> 2 1 2015-03-16 Illness
> 3 1 2015-03-17 Illness
> 4 1 2015-03-18 Illness
> 5 1 2015-03-19 Illness
> 6 1 2015-03-20 Illness
>
> [[3]]
> ID DATE REASON
> 1 2 2015-05-20 Holiday
> 2 2 2015-05-21 Holiday
> 3 2 2015-05-22 Holiday
> 4 2 2015-05-23 Holiday
>
> [[4]]
> ID DATE REASON
> 1 2 2015-06-01 Holiday
> 2 2 2015-06-02 Holiday
> 3 2 2015-06-03 Holiday
>
> [[5]]
> ID DATE REASON
> 1 2 2015-07-01 Illness
>
>
> Then use do.call() on the result:
>
>> do.call(rbind, DF.TMP)
> ID DATE REASON
> 1 1 2015-02-27 Holiday
> 2 1 2015-02-28 Holiday
> 3 1 2015-03-15 Illness
> 4 1 2015-03-16 Illness
> 5 1 2015-03-17 Illness
> 6 1 2015-03-18 Illness
> 7 1 2015-03-19 Illness
> 8 1 2015-03-20 Illness
> 9 2 2015-05-20 Holiday
> 10 2 2015-05-21 Holiday
> 11 2 2015-05-22 Holiday
> 12 2 2015-05-23 Holiday
> 13 2 2015-06-01 Holiday
> 14 2 2015-06-02 Holiday
> 15 2 2015-06-03 Holiday
> 16 2 2015-07-01 Illness
>
>
> See ?seq.Date for the critical step.
>
> Regards,
>
> Marc Schwartz
Same thing, with some optional syntactic sugar:
library(dplyr)
dta <- read.table( text=
"ID FROM TO REASON
1 2015-02-27 2015-02-28 Holiday
1 2015-03-15 2015-03-20 Illness
2 2015-05-20 2015-05-23 Holiday
2 2015-06-01 2015-06-03 Holiday
2 2015-07-01 2015-07-01 Illness
", header=TRUE, as.is=TRUE )
# Wrap function sequence in parentheses so pipes can be at beginning
# of line
( dta
# data not provided using dput, so date columns are character
%>% mutate( FROM = as.Date(FROM)
, TO = as.Date(TO)
)
# process data frame one row at a time
%>% rowwise
# form a new data frame using each row, results automatically
# rbind()ed
%>% do( data.frame( ID=.$ID
, DATE=seq.Date( .$FROM, .$TO, by="day" )
, REASON=.$REASON
, stringsAsFactors=FALSE
)
)
# optionally drop "data frame features" provided by dplyr to get
# comparable result as above
%>% as.data.frame
)
Read the dplyr and magrittr package help files to learn more about this
method of handling data. I think Marc's solution is worth understanding
because that is really what dplyr is doing for you, but it can get tedious
to do that whole process yourself day-in and day-out.
Dplyr can also be used in conjunction with data.tables package or SQL,
which can be good if you have a lot of data to work with... again, just
syntactic sugar, but convenient.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
More information about the R-help
mailing list