[R] New var

David R Forrest drf at vims.edu
Sun Jun 4 20:14:03 CEST 2017



Sent from my iPhone

> On Jun 4, 2017, at 1:36 PM, David L Carlson <dcarlson at tamu.edu> wrote:
> 
> Since the number of choices is small (6), how about this?
> 
> Starting with Jeff's initial DFM:
> 
> DFM <- structure(list(obs = 1:6, start = structure(c(16467, 14710, 13152, 
> 13787, 15126, 12696), class = "Date"), end = structure(c(17167, 
> 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700, 
> 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L, 
> 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)", 
> "[400,500)", "[500,Inf)"), class = c("ordered", "factor"))), .Names = c("obs", 
> "start", "end", "D", "bin"), row.names = c(NA, -6L), class = "data.frame")
> 
> Construct a matrix of the six alternatives:
> 
> tvals <- c(1, -1, -1, -1, -1, 0, 1, -1, -1, -1, 0, 0, 1, -1, -1, 0, 0, 
>    0, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
> tmat <- matrix(tvals, 6, 5, byrow=TRUE)
> colnames(tmat) <- paste0("t", 1:5)
> tmat
> #      t1 t2 t3 t4 t5
> # [1,]  1 -1 -1 -1 -1
> # [2,]  0  1 -1 -1 -1
> # [3,]  0  0  1 -1 -1
> # [4,]  0  0  0  1 -1
> # [5,]  0  0  0  0  1
> # [6,]  0  0  0  0  0
> 
> idx <- as.numeric(DFM$bin)
> (DFM <- data.frame(DFM, tmat[idx, ]))
> #    obs      start        end   D       bin t1 t2 t3 t4 t5
> # 1   1 2015-02-01 2017-01-01 700 [500,Inf)  0  0  0  0  0
> # 2   2 2010-04-11 2011-01-01 265 [200,300)  0  0  1 -1 -1
> # 3   3 2006-01-04 2007-05-03 484 [400,500)  0  0  0  0  1
> # 4   4 2007-10-01 2008-01-01  92   [0,100)  1 -1 -1 -1 -1
> # 5   5 2011-06-01 2012-01-01 214 [200,300)  0  0  1 -1 -1
> # 6   6 2004-10-05 2004-12-01  57   [0,100)  1 -1 -1 -1 -1
> 
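
For what it's worth, the lookup matrix above does not have to be typed in by hand, since it follows a simple pattern. A minimal sketch, assuming the same six bins and five indicator columns; make_tmat is a hypothetical helper name, not something from the thread:

```r
# Hypothetical helper: rebuild the 6 x 5 lookup matrix from its pattern.
# Row i is the bin number, column j is the indicator t_j:
#   j <  i ->  0,   j == i ->  1,   j > i -> -1
make_tmat <- function(n_bins = 6, n_vars = n_bins - 1) {
  m <- matrix(0, n_bins, n_vars)
  i <- row(m)                      # row index of every cell
  j <- col(m)                      # column index of every cell
  tmat <- (j == i) - (j > i)       # logical arithmetic gives 1, 0 or -1
  colnames(tmat) <- paste0("t", seq_len(n_vars))
  tmat
}

tmat <- make_tmat()

# Look up rows by bin number, as in the data.frame step above
bins <- c(6L, 3L, 5L, 1L, 3L, 1L)  # as.integer(DFM$bin) for the six obs
tmat[bins, ]
```

This only pays off if the number of bins might change; for a fixed 6 x 5 table the hand-typed version is just as good.
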
> 
> David L. Carlson
> Department of Anthropology
> Texas A&M University
> 
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Val
> Sent: Sunday, June 4, 2017 11:31 AM
> To: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> Cc: r-help at R-project.org
> Subject: Re: [R] New var
> 
> Thank you Jeff and All,
> 
> Within a given time period (say 700 days from the start day), I am
> expecting measurements taken at each time interval. In this case "0" means
> measurement taken, "1" not taken (stopped or opted out), and "-1" don't
> consider that time period for that individual. This will be compared with
> the actual measurements taken (observed - expected) within each time
> interval.
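
If I am reading that coding correctly, the number of measurements expected in each interval is just the number of individuals coded 0 in that column. A rough sketch under that assumption, using the t1..t5 values for the six example observations from this thread; the names tt, expected, stopped and excluded are mine:

```r
# t1..t5 for the six example observations (values taken from the thread)
tt <- rbind(c(0,  0,  0,  0,  0),
            c(0,  0,  1, -1, -1),
            c(0,  0,  0,  0,  1),
            c(1, -1, -1, -1, -1),
            c(0,  0,  1, -1, -1),
            c(1, -1, -1, -1, -1))
colnames(tt) <- paste0("t", 1:5)

# Count how many individuals carry each code in each interval
expected <- colSums(tt == 0)    # measurement expected (assumed meaning of 0)
stopped  <- colSums(tt == 1)    # stopped or opted out in this interval
excluded <- colSums(tt == -1)   # interval not considered for the individual
rbind(expected, stopped, excluded)
```

The observed counts would still have to come from the actual measurement records, which are not shown here.
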
> 
> 
> 
> 
> On Sat, Jun 3, 2017 at 9:50 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> wrote:
> 
>> # read.table is NOT part of the data.table package
>> #library(data.table)
>> DFM <- read.table( text=
>> 'obs start end
>> 1 2/1/2015   1/1/2017
>> 2 4/11/2010  1/1/2011
>> 3 1/4/2006   5/3/2007
>> 4 10/1/2007  1/1/2008
>> 5 6/1/2011   1/1/2012
>> 6 10/5/2004 12/1/2004
>> ',header = TRUE, stringsAsFactors = FALSE)
>> # cleaner way to compute D
>> DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" )
>> DFM$end <- as.Date( DFM$end, format="%m/%d/%Y" )
>> DFM$D <- as.numeric( DFM$end - DFM$start, units="days" )
>> # categorize your data into groups
>> DFM$bin <- cut( DFM$D
>>              , breaks=c( seq( 0, 500, 100 ), Inf )
>>              , right=FALSE # do not include the right edge
>>              , ordered_result = TRUE
>>              )
>> # brute force method you should have been able to figure out
>> # to show us some work
>> DFM$t1 <- ifelse( DFM$D < 100, 1, 0 )
>> DFM$t2 <- ifelse( 100 <= DFM$D & DFM$D < 200, 1, ifelse( DFM$D < 100, -1,
>> 0 ) )
>> DFM$t3 <- ifelse( 200 <= DFM$D & DFM$D < 300, 1, ifelse( DFM$D < 200, -1,
>> 0 ) )
>> DFM$t4 <- ifelse( 300 <= DFM$D & DFM$D < 400, 1, ifelse( DFM$D < 300, -1,
>> 0 ) )
>> DFM$t5 <- ifelse( 400 <= DFM$D & DFM$D < 500, 1, ifelse( DFM$D < 400, -1,
>> 0 ) )
>> # brute force method with ordered factor
>> DFM$tf1 <- ifelse( "[0,100)" == DFM$bin, 1, 0 )
>> DFM$tf2 <- ifelse( "[100,200)" == DFM$bin, 1, ifelse( "[100,200)" <
>> DFM$bin, 0, -1 ) )
>> DFM$tf3 <- ifelse( "[200,300)" == DFM$bin, 1, ifelse( "[200,300)" <
>> DFM$bin, 0, -1 ) )
>> DFM$tf4 <- ifelse( "[300,400)" == DFM$bin, 1, ifelse( "[300,400)" <
>> DFM$bin, 0, -1 ) )
>> DFM$tf5 <- ifelse( "[400,500)" == DFM$bin, 1, ifelse( "[400,500)" <
>> DFM$bin, 0, -1 ) )
>> # less obvious approach using the fact that factors are integers,
>> # the outer function to find all combinations of elements of two vectors,
>> # and the sign function
>> DFM[ , paste0( "tm", 1:5 )] <- outer( as.integer( DFM$bin )
>>                                    , 1:5
>>                                    , FUN = function(x,y) {
>>                                          z <- sign(y-x)+1L
>>                                          ifelse( 2 == z, -1L, z )
>>                                      }
>>                                    )
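
The sign() arithmetic inside the outer() call can take a moment to decode. A sketch of what I believe is an equivalent formulation, assuming the same integer bin codes 1 to 6 and column numbers 1 to 5; jeff_fun and code_fun are just labels used here:

```r
bin_code <- c(6L, 3L, 5L, 1L, 3L, 1L)   # as.integer(DFM$bin) in the thread

# The function from the outer() call: x is the bin code, y the column number
jeff_fun <- function(x, y) {
  z <- sign(y - x) + 1L
  ifelse(2 == z, -1L, z)
}

# Same mapping written as a difference of two comparisons:
#   y == x ->  1  (the interval in which D falls)
#   y >  x -> -1  (later intervals)
#   y <  x ->  0  (earlier intervals)
code_fun <- function(x, y) (y == x) - (y > x)

tm  <- outer(bin_code, 1:5, FUN = jeff_fun)
tm2 <- outer(bin_code, 1:5, FUN = code_fun)
all(tm == tm2)   # checks that the two versions agree on the example data
```

Both functions are vectorized, which outer() requires, since it calls FUN once with all row/column combinations expanded into two long vectors.
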
>> 
>> # my result, provided using dput for precise representation
>> DFMresult <- structure(list(obs = 1:6, start = structure(c(16467, 14710,
>> 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167,
>> 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700,
>> 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L,
>> 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)",
>> "[400,500)", "[500,Inf)"), class = c("ordered", "factor")), t1 = c(0,
>> 0, 0, 1, 0, 1), t2 = c(0, 0, 0, -1, 0, -1), t3 = c(0, 1, 0, -1,
>> 1, -1), t4 = c(0, -1, 0, -1, -1, -1), t5 = c(0, -1, 1, -1, -1,
>> -1), tf1 = c(0, 0, 0, 1, 0, 1), tf2 = c(0, 0, 0, -1, 0, -1),
>>    tf3 = c(0, 1, 0, -1, 1, -1), tf4 = c(0, -1, 0, -1, -1, -1
>>    ), tf5 = c(0, -1, 1, -1, -1, -1), tm1 = c(0, 0, 0, 1, 0,
>>    1), tm2 = c(0, 0, 0, -1, 0, -1), tm3 = c(0, 1, 0, -1, 1,
>>    -1), tm4 = c(0, -1, 0, -1, -1, -1), tm5 = c(0, -1, 1, -1,
>>    -1, -1)), row.names = c(NA, -6L), .Names = c("obs", "start",
>> "end", "D", "bin", "t1", "t2", "t3", "t4", "t5", "tf1", "tf2",
>> "tf3", "tf4", "tf5", "tm1", "tm2", "tm3", "tm4", "tm5"), class =
>> "data.frame")
>> 
>> You did not address Bert's request for some context, but I am curious how
>> he or Peter would have approached this problem, so I encourage you to
>> provide some insight on the list as to why you are doing this.
>> 
>> 
>> On Sat, 3 Jun 2017, Val wrote:
>> 
>>> Thank you all for the useful suggestions. I did some of my homework.
>>> 
>>> library(data.table)
>>> DFM <- read.table(header=TRUE, text='obs start end
>>> 1 2/1/2015   1/1/2017
>>> 2 4/11/2010  1/1/2011
>>> 3 1/4/2006   5/3/2007
>>> 4 10/1/2007  1/1/2008
>>> 5 6/1/2011   1/1/2012
>>> 6 10/5/2004 12/1/2004',stringsAsFactors = FALSE)
>>> DFM
>>> 
>>> DFM$D =as.numeric(difftime(as.Date(DFM$end,format="%m/%d/%Y"),
>>> as.Date(DFM$start,format="%m/%d/%Y"), units = "days"))
>>> DFM
>>> 
>>> output.
>>>    obs     start       end   D
>>> 1   1  2/1/2015  1/1/2017 700
>>> 2   2 4/11/2010  1/1/2011 265
>>> 3   3  1/4/2006  5/3/2007 484
>>> 4   4 10/1/2007  1/1/2008  92
>>> 5   5  6/1/2011  1/1/2012 214
>>> 6   6 10/5/2004 12/1/2004  57
>>> 
>>> My problem is how do I get the other new variables
>>> 
>>> obs, start,      end,       D,   t1, t2, t3, t4, t5
>>> 1,   2/1/2015,   1/1/2017,  700,  0,  0,  0,  0,  0
>>> 2,   4/11/2010,  1/1/2011,  265,  0,  0,  1, -1, -1
>>> 3,   1/4/2006,   5/3/2007,  484,  0,  0,  0,  0,  1
>>> 4,   10/1/2007,  1/1/2008,   92,  1, -1, -1, -1, -1
>>> 5,   6/1/2011,   1/1/2012,  214,  0,  0,  1, -1, -1
>>> 6,   10/15/2004, 12/1/2004,  47,  1, -1, -1, -1, -1
>>> 
>>> Thank you again.
>>> 
>>> 
>>> 
>>> On Sat, Jun 3, 2017 at 12:13 AM, Bert Gunter <bgunter.4567 at gmail.com>
>>> wrote:
>>> 
>>>> It is difficult to provide useful help, because you have failed to
>>>> read and follow the posting guide. In particular:
>>>> 
>>>> 1. Plain text, not HTML.
>>>> 2. Use dput() or provide code to create your example. Text printouts
>>>> such as that which you gave require some work to wrangle into an
>>>> example that we can test.
>>>> 
>>>> Specifically:
>>>> 
>>>> 3. Have you gone through any R tutorials?-- it sure doesn't look like
>>>> it. We do expect some effort to learn R before posting.
>>>> 
>>>> 4. What is the format of your date columns? character, factors,
>>>> POSIX,...? See ?date-time for details. Note particularly the
>>>> "difftime" link to obtain intervals.
>>>> 
>>>> 5. ?ifelse  for vectorized conditionals.
>>>> 
>>>> Also, you might want to explain the context of what you are trying to
>>>> do. I strongly suspect you shouldn't be doing it at all, but that is
>>>> just a guess.
>>>> 
>>>> Be sure to cc your reply to the list, not just to me.
>>>> 
>>>> Cheers,
>>>> Bert
>>>> 
>>>> 
>>>> Bert Gunter
>>>> 
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>> 
>>>> 
>>>>> On Fri, Jun 2, 2017 at 8:49 PM, Val <valkremk at gmail.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have a data set with time intervals and, depending on the interval,
>>>>> I want to create 5 more variables. Sample data below.
>>>>> 
>>>>> obs,   Start,   End
>>>>> 1,2/1/2015,  1/1/2017
>>>>> 2,4/11/2010, 1/1/2011
>>>>> 3,1/4/2006,  5/3/2007
>>>>> 4,10/1/2007, 1/1/2008
>>>>> 5,6/1/2011,  1/1/2012
>>>>> 6,10/15/2004,12/1/2004
>>>>> 
>>>>> First, I want to get the interval between the start and end dates
>>>>> (End - Start).
>>>>> 
>>>>> obs,  Start , end, datediff
>>>>> 1,2/1/2015,  1/1/2017, 700
>>>>> 2,4/11/2010, 1/1/2011, 265
>>>>> 3,1/4/2006,  5/3/2007, 484
>>>>> 4,10/1/2007, 1/1/2008, 92
>>>>> 5,6/1/2011,  1/1/2012, 214
>>>>> 6,10/15/2004,12/1/2004,47
>>>>> 
>>>>> Second, I want to create 5 more variables: t1, t2, t3, t4 and t5.
>>>>> The value of each variable is defined as follows:
>>>>> if datediff <  100 then            t1=1, t2=t3=t4=t5=-1,
>>>>> if datediff >= 100 and < 200 then  t1=0, t2=1, t3=t4=t5=-1,
>>>>> if datediff >= 200 and < 300 then  t1=0, t2=0, t3=1, t4=t5=-1,
>>>>> if datediff >= 300 and < 400 then  t1=0, t2=0, t3=0, t4=1, t5=-1,
>>>>> if datediff >= 400 and < 500 then  t1=0, t2=0, t3=0, t4=0, t5=1,
>>>>> if datediff >= 500 then            t1=0, t2=0, t3=0, t4=0, t5=0.
>>>>> 
>>>>> The complete output looks as follows.
>>>>> obs, start,      end,       datediff, t1, t2, t3, t4, t5
>>>>> 1,   2/1/2015,   1/1/2017,  700,       0,  0,  0,  0,  0
>>>>> 2,   4/11/2010,  1/1/2011,  265,       0,  0,  1, -1, -1
>>>>> 3,   1/4/2006,   5/3/2007,  484,       0,  0,  0,  0,  1
>>>>> 4,   10/1/2007,  1/1/2008,   92,       1, -1, -1, -1, -1
>>>>> 5,   6/1/2011,   1/1/2012,  214,       0,  0,  1, -1, -1
>>>>> 6,   10/15/2004, 12/1/2004,  47,       1, -1, -1, -1, -1
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>>        [[alternative HTML version deleted]]
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>> 
>>>> 
>>> 
>>> 
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                       Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------
>> 
> 
>    [[alternative HTML version deleted]]
> 


