[R] rle with data.table - is it possible?

Kate Ignatius kate.ignatius at gmail.com
Fri Jan 2 09:07:28 CET 2015


Ah, crap.  Yep you're right.  This is not going too well. Okay - let
me try that again:

x$childseg<-0
x<-x$sumchild !=0
span<-rle(x)$lengths[rle(x)$values==TRUE]
x$childseg[x]<-rep(seq_along(span), times = span)

Does this one have any errors?
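
A minimal base-R sketch of what the four lines above seem to be aiming at, offered as a guess rather than a confirmed answer. The data frame below is made up for illustration (the real sumchild values are not shown in this thread), and the names nonzero, runs and label_runs are hypothetical. The main differences from the code above are that the logical mask gets its own name instead of overwriting x, and that rle() is called only once. The last step repeats the labelling separately within each Group using split() and unsplit(); it does not use data.table.

```r
# Hypothetical stand-in for the real data; the sumchild values are invented.
x <- data.frame(Group    = c("A", "A", "B", "B", "B", "C"),
                sumchild = c(0, 1, 5, 5, 0, 2))

# Keep the logical mask in its own variable so the data frame x survives.
nonzero <- x$sumchild != 0

# Call rle() once and keep only the lengths of the TRUE runs.
runs <- rle(nonzero)
span <- runs$lengths[runs$values]

# Number each run of non-zero rows 1, 2, ...; rows outside a run stay 0.
x$childseg <- 0L
x$childseg[nonzero] <- rep(seq_along(span), times = span)
x$childseg
# [1] 0 1 1 1 0 2   (the first run crosses the A/B group boundary)

# Same labelling wrapped in a helper (hypothetical name) for per-group use.
label_runs <- function(flag) {
  runs <- rle(flag)
  span <- runs$lengths[runs$values]
  out <- integer(length(flag))
  out[flag] <- rep(seq_along(span), times = span)
  out
}

# Restart the run numbering inside every Group.
x$childseg_by_group <- unsplit(lapply(split(nonzero, x$Group), label_runs),
                               x$Group)
x$childseg_by_group
# [1] 0 1 1 1 0 1
```

Whether the run numbering should restart in each group, as in the last column, or run across the whole file depends on what the segments are meant to represent.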


On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>>
>> Apologies - mix up of syntax all over the place, a habit of mine.  The
>> last line was in there because of code beforehand so it really doesn't
>> need to be there.  Here is the proper code I hope:
>>
>> childseg<-0
>> x<-sumchild ==0
>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>> childseg[x]<-rep(seq_along(span), times = span)
>>
>
> This remains not reproducible. We have no idea what sumchild might be and the code throws an error. My guess is that you are trying to get a result such as would be delivered by:
>
> childseg <- sumchild[ sumchild != 0 ]
>
> David.
>
>>
>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>> <jdnewmil at dcn.davis.ca.us> wrote:
>>> Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand generated result would help? A new input data frame might or might not be needed to illustrate desired results.
>>>
>>> Your second and third lines are  syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.
>>> ---------------------------------------------------------------------------
>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>                                      Live:   OO#.. Dead: OO#..  Playing
>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>> ---------------------------------------------------------------------------
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>>>> Is it possible to add the following code or similar in data.table:
>>>>
>>>> childseg<-0
>>>> x:=sumchild <-0
>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>> childseg[childseg == 0]<-''
>>>>
>>>> I was hoping to do this code by Group for mum, dad and
>>>> child.  The problem I'm having is with the
>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
>>>> be added to data.table.
>>>>
>>>> [Previous email had incorrect code]
>>>>
>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>> <jdnewmil at dcn.davis.ca.us> wrote:
>>>>> I do not understand the value of using the rle function in your description,
>>>>> but the code below appears to produce the table you want.
>>>>>
>>>>> Note that better support for the data.table package might be found at
>>>>> stackexchange as the documentation specifies.
>>>>>
>>>>> x <- read.table( text=
>>>>> "Dad Mum Child Group
>>>>> AA RR RA A
>>>>> AA RR RR A
>>>>> AA AA AA B
>>>>> AA AA AA B
>>>>> RA AA RR B
>>>>> RR AA RR B
>>>>> AA AA AA B
>>>>> AA AA RA C
>>>>> AA AA RA C
>>>>> AA RR RA C
>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>
>>>>> library(data.table)
>>>>> DT <- data.table( x )
>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>> DT[ , sumdad := 0L ]
>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>> DT[ , cdad := NULL ]
>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>> DT[ , summum := 0L ]
>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>> DT[ , cmum := NULL ]
>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>> DT[ , sumchild := 0L ]
>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>> DT[ , cchild := NULL ]
>>>>>
>>>>>> DT
>>>>>
>>>>>     Dad Mum Child Group sumdad summum sumchild
>>>>>  1:  AA  RR    RA     A      2      2        0
>>>>>  2:  AA  RR    RR     A      2      2        1
>>>>>  3:  AA  AA    AA     B      4      5        5
>>>>>  4:  AA  AA    AA     B      4      5        5
>>>>>  5:  RA  AA    RR     B      0      5        5
>>>>>  6:  RR  AA    RR     B      4      5        5
>>>>>  7:  AA  AA    AA     B      4      5        5
>>>>>  8:  AA  AA    RA     C      3      3        0
>>>>>  9:  AA  AA    RA     C      3      3        0
>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>
>>>>>
>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>
>>>>>> I'm trying to use rle together with data.table and wondering whether
>>>>>> that is possible...
>>>>>>
>>>>>> To make this simple, my ultimate goal is to determine long stretches of
>>>>>> 1s, but I want to do this within groups (hence using data.table, as
>>>>>> I use the "set key" option).  However, I'm not having much luck
>>>>>> making this work.
>>>>>>
>>>>>> For example, for simplicity's sake, I have the following data:
>>>>>>
>>>>>> Dad Mum Child Group
>>>>>> AA RR RA A
>>>>>> AA RR RR A
>>>>>> AA AA AA B
>>>>>> AA AA AA B
>>>>>> RA AA RR B
>>>>>> RR AA RR B
>>>>>> AA AA AA B
>>>>>> AA AA RA C
>>>>>> AA AA RA C
>>>>>> AA RR RA  C
>>>>>>
>>>>>> And the following code which I know works
>>>>>>
>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>
>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>
>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>
>>>>>> However, I wish to run the above code by Group (the real file is
>>>>>> millions of rows long and the groups will be larger; I just wanted to
>>>>>> simplify the example).
>>>>>>
>>>>>> I did something like this but of course I got an error:
>>>>>>
>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>
>>>>>> The reason is that I eventually want to have something like this:
>>>>>>
>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>> AA RR RA A 2 2 0
>>>>>> AA RR RR A 2 2 1
>>>>>> AA AA AA B 4 5 5
>>>>>> AA AA AA B 4 5 5
>>>>>> RA AA RR B 0 5 5
>>>>>> RR AA RR B 4 5 5
>>>>>> AA AA AA B 4 5 5
>>>>>> AA AA RA C 3 3 0
>>>>>> AA AA RA C 3 3 0
>>>>>> AA RR RA  C 3 3 0
>>>>>>
>>>>>> That is, I would like to have the specific counts next to what I'm
>>>>>> consecutively counting per group.  So for Group A for dad there are 2
>>>>>> AAs, there are two RRs for mum but only 1 AA or RR for the child, and
>>>>>> that is RR (so the 1 is next to the RR and not the RA).
>>>>>>
>>>>>> Can this be done?
>>>>>>
>>>>>> K.
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>
>>
>


