[R] New var
David L Carlson
dcarlson at tamu.edu
Sun Jun 4 19:36:01 CEST 2017
Since the number of choices is small (6), how about this?
Starting with Jeff's initial DFM:
DFM <- structure(list(obs = 1:6, start = structure(c(16467, 14710, 13152,
13787, 15126, 12696), class = "Date"), end = structure(c(17167,
14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700,
265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L,
1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)",
"[400,500)", "[500,Inf)"), class = c("ordered", "factor"))), .Names = c("obs",
"start", "end", "D", "bin"), row.names = c(NA, -6L), class = "data.frame")
Construct a matrix of the six alternatives:
tvals <- c(1, -1, -1, -1, -1, 0, 1, -1, -1, -1, 0, 0, 1, -1, -1, 0, 0,
0, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
tmat <- matrix(tvals, 6, 5, byrow=TRUE)
colnames(tmat) <- paste0("t", 1:5)
tmat
# t1 t2 t3 t4 t5
# [1,] 1 -1 -1 -1 -1
# [2,] 0 1 -1 -1 -1
# [3,] 0 0 1 -1 -1
# [4,] 0 0 0 1 -1
# [5,] 0 0 0 0 1
# [6,] 0 0 0 0 0
# A factor stores its levels as integer codes, so the codes can index rows of tmat
idx <- as.numeric(DFM$bin)
(DFM <- data.frame(DFM, tmat[idx, ]))
# obs start end D bin t1 t2 t3 t4 t5
# 1 1 2015-02-01 2017-01-01 700 [500,Inf) 0 0 0 0 0
# 2 2 2010-04-11 2011-01-01 265 [200,300) 0 0 1 -1 -1
# 3 3 2006-01-04 2007-05-03 484 [400,500) 0 0 0 0 1
# 4 4 2007-10-01 2008-01-01 92 [0,100) 1 -1 -1 -1 -1
# 5 5 2011-06-01 2012-01-01 214 [200,300) 0 0 1 -1 -1
# 6 6 2004-10-05 2004-12-01 57 [0,100) 1 -1 -1 -1 -1
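
If the number of bins grows, typing the values by hand becomes error-prone. The following is only a sketch of one way to build the same lookup matrix from its pattern (1 on the diagonal, -1 to the right of it, 0 elsewhere). The names n_bins, n_vars, tmat_typed and tmat_built are made up for this illustration and do not appear elsewhere in the thread.

```r
# Hand-typed matrix from the message above, repeated so this block runs on its own
tvals <- c(1, -1, -1, -1, -1, 0, 1, -1, -1, -1, 0, 0, 1, -1, -1, 0, 0,
           0, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
tmat_typed <- matrix(tvals, 6, 5, byrow = TRUE)
colnames(tmat_typed) <- paste0("t", 1:5)

# Assumed sizes: 6 bins (rows) and 5 indicator variables (columns)
n_bins <- 6
n_vars <- 5

# Start from all zeros, then fill by position
tmat_built <- matrix(0, n_bins, n_vars)
tmat_built[row(tmat_built) == col(tmat_built)] <- 1   # own interval
tmat_built[col(tmat_built) > row(tmat_built)] <- -1   # later intervals
colnames(tmat_built) <- paste0("t", seq_len(n_vars))

# Compare the generated matrix with the hand-typed one
all.equal(tmat_typed, tmat_built)
```

The last row stays all zeros because no column index exceeds or equals 6, which matches the "[500,Inf)" case.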
David L. Carlson
Department of Anthropology
Texas A&M University
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Val
Sent: Sunday, June 4, 2017 11:31 AM
To: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Cc: r-help at R-project.org
Subject: Re: [R] New var
Thank you Jeff and All,
Within a given time period (say 700 days from the start day), I am
expecting measurements to be taken at each time interval. In this case "0"
means measurement taken, "1" means not taken (stopped or opted out), and
"-1" means don't consider that time period for that individual. This will
be compared with the actual measurements taken (observed - expected) within
each time interval.
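
One possible reading of the comparison described above, offered only as a sketch: count, for each time period, how many individuals carry each code, and treat the number of 0 codes as the expected number of measurements. That interpretation is an assumption, and the observed vector below is invented purely to show the arithmetic; it is not data from this thread.

```r
# t1..t5 codes for the six observations, copied from the desired output in the thread
tcodes <- rbind(
  c( 0,  0,  0,  0,  0),
  c( 0,  0,  1, -1, -1),
  c( 0,  0,  0,  0,  1),
  c( 1, -1, -1, -1, -1),
  c( 0,  0,  1, -1, -1),
  c( 1, -1, -1, -1, -1)
)
colnames(tcodes) <- paste0("t", 1:5)

# Per-period counts of each code
expected <- colSums(tcodes == 0)    # measurement taken:     4 4 2 2 1
stopped  <- colSums(tcodes == 1)    # stopped or opted out:  2 0 2 0 1
excluded <- colSums(tcodes == -1)   # period not considered: 0 2 2 4 4

# HYPOTHETICAL observed counts, made up for illustration only
observed <- c(t1 = 4, t2 = 3, t3 = 2, t4 = 1, t5 = 1)

observed - expected
```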
On Sat, Jun 3, 2017 at 9:50 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote:
> # read.table is NOT part of the data.table package
> #library(data.table)
> DFM <- read.table( text=
> 'obs start end
> 1 2/1/2015 1/1/2017
> 2 4/11/2010 1/1/2011
> 3 1/4/2006 5/3/2007
> 4 10/1/2007 1/1/2008
> 5 6/1/2011 1/1/2012
> 6 10/5/2004 12/1/2004
> ',header = TRUE, stringsAsFactors = FALSE)
> # cleaner way to compute D
> DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" )
> DFM$end <- as.Date( DFM$end, format="%m/%d/%Y" )
> DFM$D <- as.numeric( DFM$end - DFM$start, units="days" )
> # categorize your data into groups
> DFM$bin <- cut( DFM$D
> , breaks=c( seq( 0, 500, 100 ), Inf )
> , right=FALSE # do not include the right edge
> , ordered_result = TRUE
> )
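
To see what right=FALSE changes at the bin edges, here is a small sketch using the same breaks. The vector d holds made-up durations chosen to sit on or near the boundaries; it is not part of the thread's data.

```r
breaks <- c(seq(0, 500, 100), Inf)

# Hypothetical durations on and around the bin edges
d <- c(0, 99, 100, 499, 500, 700)

# Left-closed, right-open intervals: 100 falls in [100,200), not [0,100)
b <- cut(d, breaks = breaks, right = FALSE, ordered_result = TRUE)
as.character(b)
# "[0,100)" "[0,100)" "[100,200)" "[400,500)" "[500,Inf)" "[500,Inf)"
as.integer(b)
# 1 1 2 5 6 6

# With the default right=TRUE the intervals are (0,100], (100,200], ...
# so a duration of exactly 0 falls outside every interval and becomes NA
is.na(cut(0, breaks = breaks))
# TRUE
```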
> # brute force method you should have been able to figure out to show us
> # some work
> DFM$t1 <- ifelse( DFM$D < 100, 1, 0 )
> DFM$t2 <- ifelse( 100 <= DFM$D & DFM$D < 200, 1, ifelse( DFM$D < 100, -1,
> 0 ) )
> DFM$t3 <- ifelse( 200 <= DFM$D & DFM$D < 300, 1, ifelse( DFM$D < 200, -1,
> 0 ) )
> DFM$t4 <- ifelse( 300 <= DFM$D & DFM$D < 400, 1, ifelse( DFM$D < 300, -1,
> 0 ) )
> DFM$t5 <- ifelse( 400 <= DFM$D & DFM$D < 500, 1, ifelse( DFM$D < 400, -1,
> 0 ) )
> # brute force method with ordered factor
> DFM$tf1 <- ifelse( "[0,100)" == DFM$bin, 1, 0 )
> DFM$tf2 <- ifelse( "[100,200)" == DFM$bin, 1, ifelse( "[100,200)" <
> DFM$bin, 0, -1 ) )
> DFM$tf3 <- ifelse( "[200,300)" == DFM$bin, 1, ifelse( "[200,300)" <
> DFM$bin, 0, -1 ) )
> DFM$tf4 <- ifelse( "[300,400)" == DFM$bin, 1, ifelse( "[300,400)" <
> DFM$bin, 0, -1 ) )
> DFM$tf5 <- ifelse( "[400,500)" == DFM$bin, 1, ifelse( "[400,500)" <
> DFM$bin, 0, -1 ) )
> # less obvious approach using the fact that factors are integers
> # and using the outer function to find all combinations of elements of two
> # vectors
> # and the sign function
> DFM[ , paste0( "tm", 1:5 )] <- outer( as.integer( DFM$bin )
> , 1:5
> , FUN = function(x,y) {
> z <- sign(y-x)+1L
> ifelse( 2 == z, -1L, z )
> }
> )
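
The outer() call above packs several steps into one function. The sketch below unrolls them so each intermediate matrix can be inspected. The vector bin_index is what as.integer(DFM$bin) gives for the six rows in this thread; the names d, s, z and res are made up for the illustration.

```r
# Integer codes of the ordered factor for the six observations
bin_index <- c(6L, 3L, 5L, 1L, 3L, 1L)

# Step 1: column number minus bin number, for every row/column pair
d <- outer(bin_index, 1:5, FUN = function(x, y) y - x)

# Step 2: sign() gives -1 for columns before the row's bin,
#          0 for the row's own bin, and 1 for columns after it
s <- sign(d)

# Step 3: shift to 0 / 1 / 2 so the row's own bin is already coded 1
z <- s + 1L

# Step 4: recode 2 (columns after the row's bin) to -1
res <- ifelse(z == 2, -1L, z)
colnames(res) <- paste0("tm", 1:5)
res
```

Rows in the last bin (code 6) never reach steps that produce 1 or -1, because every column number is smaller than 6, so they come out as all zeros.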
>
> # my result, provided using dput for precise representation
> DFMresult <- structure(list(obs = 1:6, start = structure(c(16467, 14710,
> 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167,
> 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700,
> 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L,
> 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)",
> "[400,500)", "[500,Inf)"), class = c("ordered", "factor")), t1 = c(0,
> 0, 0, 1, 0, 1), t2 = c(0, 0, 0, -1, 0, -1), t3 = c(0, 1, 0, -1,
> 1, -1), t4 = c(0, -1, 0, -1, -1, -1), t5 = c(0, -1, 1, -1, -1,
> -1), tf1 = c(0, 0, 0, 1, 0, 1), tf2 = c(0, 0, 0, -1, 0, -1),
> tf3 = c(0, 1, 0, -1, 1, -1), tf4 = c(0, -1, 0, -1, -1, -1
> ), tf5 = c(0, -1, 1, -1, -1, -1), tm1 = c(0, 0, 0, 1, 0,
> 1), tm2 = c(0, 0, 0, -1, 0, -1), tm3 = c(0, 1, 0, -1, 1,
> -1), tm4 = c(0, -1, 0, -1, -1, -1), tm5 = c(0, -1, 1, -1,
> -1, -1)), row.names = c(NA, -6L), .Names = c("obs", "start",
> "end", "D", "bin", "t1", "t2", "t3", "t4", "t5", "tf1", "tf2",
> "tf3", "tf4", "tf5", "tm1", "tm2", "tm3", "tm4", "tm5"), class =
> "data.frame")
>
> You did not address Bert's request for some context, but I am curious how
> he or Peter would have approached this problem, so I encourage you to
> provide some insight on the list as to why you are doing this.
>
>
> On Sat, 3 Jun 2017, Val wrote:
>
>> Thank you all for the useful suggestions. I did some of my homework.
>>
>> library(data.table)
>> DFM <- read.table(header=TRUE, text='obs start end
>> 1 2/1/2015 1/1/2017
>> 2 4/11/2010 1/1/2011
>> 3 1/4/2006 5/3/2007
>> 4 10/1/2007 1/1/2008
>> 5 6/1/2011 1/1/2012
>> 6 10/5/2004 12/1/2004',stringsAsFactors = FALSE)
>> DFM
>>
>> DFM$D =as.numeric(difftime(as.Date(DFM$end,format="%m/%d/%Y"),
>> as.Date(DFM$start,format="%m/%d/%Y"), units = "days"))
>> DFM
>>
>> output.
>> obs start end D
>> 1 1 2/1/2015 1/1/2017 700
>> 2 2 4/11/2010 1/1/2011 265
>> 3 3 1/4/2006 5/3/2007 484
>> 4 4 10/1/2007 1/1/2008 92
>> 5 5 6/1/2011 1/1/2012 214
>> 6 6 10/5/2004 12/1/2004 57
>>
>> My problem is how do I get the other new variables
>>
>> obs start end D t1,t2,t3,t4, t5
>> 1, 2/1/2015, 1/1/2017, 700,0,0,0,0,0
>> 2, 4/11/2010, 1/1/2011, 265,0,0,1,-1,-1
>> 3, 1/4/2006, 5/3/2007, 484,0,0,0,0,1
>> 4, 10/1/2007, 1/1/2008, 92,1,-1,-1,-1,-1
>> 5, 6/1/2011, 1/1/2012, 214,0,0,1,-1,-1
>> 6, 10/15/2004,12/1/2004,47,1,-1,-1,-1,-1
>>
>> Thank you again.
>>
>>
>>
>> On Sat, Jun 3, 2017 at 12:13 AM, Bert Gunter <bgunter.4567 at gmail.com>
>> wrote:
>>
>>> It is difficult to provide useful help, because you have failed to
>>> read and follow the posting guide. In particular:
>>>
>>> 1. Plain text, not HTML.
>>> 2. Use dput() or provide code to create your example. Text printouts
>>> such as that which you gave require some work to wrangle into an
>>> example that we can test.
>>>
>>> Specifically:
>>>
>>> 3. Have you gone through any R tutorials?-- it sure doesn't look like
>>> it. We do expect some effort to learn R before posting.
>>>
>>> 4. What is the format of your date columns? character, factors,
>>> POSIX,...? See ?date-time for details. Note particularly the
>>> "difftime" link to obtain intervals.
>>>
>>> 5. ?ifelse for vectorized conditionals.
>>>
>>> Also, you might want to explain the context of what you are trying to
>>> do. I strongly suspect you shouldn't be doing it at all, but that is
>>> just a guess.
>>>
>>> Be sure to cc your reply to the list, not just to me.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Fri, Jun 2, 2017 at 8:49 PM, Val <valkremk at gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a data set with time intervals and, depending on the interval, I
>>>> want to create 5 more variables. Sample data below.
>>>>
>>>> obs, Start, End
>>>> 1,2/1/2015, 1/1/2017
>>>> 2,4/11/2010, 1/1/2011
>>>> 3,1/4/2006, 5/3/2007
>>>> 4,10/1/2007, 1/1/2008
>>>> 5,6/1/2011, 1/1/2012
>>>> 6,10/15/2004,12/1/2004
>>>>
>>>> First, I want to get the interval between the start and end dates
>>>> (End - Start).
>>>>
>>>> obs, Start , end, datediff
>>>> 1,2/1/2015, 1/1/2017, 700
>>>> 2,4/11/2010, 1/1/2011, 265
>>>> 3,1/4/2006, 5/3/2007, 484
>>>> 4,10/1/2007, 1/1/2008, 92
>>>> 5,6/1/2011, 1/1/2012, 214
>>>> 6,10/15/2004,12/1/2004,47
>>>>
>>>> Second, I want to create 5 more variables: t1, t2, t3, t4 and t5.
>>>> The value of each variable is defined as follows:
>>>> if datediff < 100 then t1=1, t2=t3=t4=t5=-1.
>>>> if datediff >= 100 and < 200 then t1=0, t2=1,t3=t4=t5=-1,
>>>> if datediff >= 200 and < 300 then t1=0, t2=0,t3=1,t4=t5=-1,
>>>> if datediff >= 300 and < 400 then t1=0, t2=0,t3=0,t4=1,t5=-1,
>>>> if datediff >= 400 and < 500 then t1=0, t2=0,t3=0,t4=0,t5=1,
>>>> if datediff >= 500 then t1=0, t2=0,t3=0,t4=0,t5=0
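
The six rules above can be written as one small function. This is only a sketch under stated assumptions: the function name code_one is made up, it handles one datediff at a time, and it rejects negative values because the rules do not say what to do with them. It uses findInterval() from base R, whose intervals are closed on the left by default, matching the ">= ... and <" wording.

```r
code_one <- function(datediff) {
  stopifnot(length(datediff) == 1, datediff >= 0)
  # k is the number of the 100-day interval: 1 for [0,100), ..., 6 for >= 500
  k <- findInterval(datediff, c(0, 100, 200, 300, 400, 500))
  j <- 1:5
  # 0 for intervals before the k-th, 1 for the k-th itself, -1 for later ones
  out <- ifelse(j < k, 0, ifelse(j == k, 1, -1))
  names(out) <- paste0("t", j)
  out
}

# datediff values from the sample data above
datediff <- c(700, 265, 484, 92, 214, 47)
tvars <- t(sapply(datediff, code_one))
cbind(datediff, tvars)
```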
>>>>
>>>> The complete output looks as follows.
>>>> obs, start, end, datediff, t1, t2, t3, t4, t5
>>>> 1, 2/1/2015, 1/1/2017, 700, 0, 0, 0, 0, 0
>>>> 2, 4/11/2010, 1/1/2011, 265, 0, 0, 1, -1, -1
>>>> 3, 1/4/2006, 5/3/2007, 484, 0, 0, 0, 0, 1
>>>> 4, 10/1/2007, 1/1/2008, 92, 1, -1, -1,-1, -1
>>>> 5 , 6/1/2011, 1/1/2012, 214, 0, 0, 1,-1, -1
>>>> 6, 10/15/2004, 12/1/2004, 47, 1, -1, -1, -1, -1
>>>>
>>>> Thank you.
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
>