[R] rle with data.table - is it possible?

David Winsemius dwinsemius at comcast.net
Fri Jan 2 18:28:59 CET 2015


On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:

> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
> me try that again:
> 
> x$childseg<-0
> x<-x$sumchild !=0

That previous line would overwrite the entire data frame with a single logical vector.

> span<-rle(x)$lengths[rle(x)$values==TRUE]
> x$childseg[x]<-rep(seq_along(span), times = span)
> 
> Does this one have any errors?
Even assuming that Jeff Newmiller's code has created those objects, I get:

> x$childseg[x]<-rep(seq_along(span), times = span)
Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors

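In the last line `x` is no longer a data frame but a logical vector, so `x$childseg` fails. Had `x` still been the data frame, you would instead be indexing a vector with a data frame (or perhaps a data.table).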
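To make the failure concrete, here is a minimal sketch. It assumes a toy data frame holding only the sumchild values shown in this thread, and the names `df` and `flag` are illustrative rather than anything posted. It shows that the reassignment turns `x` into a logical vector, that `$` then errors, and that keeping the flag under a separate name lets the original four lines run.

```r
# Toy data (assumption): only the sumchild values shown in this thread.
x <- data.frame(sumchild = c(0, 1, 5, 5, 5, 5, 5, 0, 0, 0))

# Reusing the name x replaces the data frame with a logical vector.
x <- x$sumchild != 0
print(class(x))

# `$` is not defined for atomic vectors, so any later x$... call errors.
msg <- tryCatch(
  { x$childseg; "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# Keeping the data and the flag under separate (illustrative) names works.
df <- data.frame(sumchild = c(0, 1, 5, 5, 5, 5, 5, 0, 0, 0))
flag <- df$sumchild != 0
df$childseg <- 0
span <- rle(flag)$lengths[rle(flag)$values]
df$childseg[flag] <- rep(seq_along(span), times = span)
print(df$childseg)
```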

If we use Newmiller's object and change some of the instances of "x" in your code to DT, we get:

> DT$childseg<-0
> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
> span<-rle(x)$lengths[rle(x)$values==TRUE]
> DT$childseg[x]<-rep(seq_along(span), times = span)
> DT
    Dad Mum Child Group sumdad summum sumchild childseg
 1:  AA  RR    RA     A      2      2        0        0
 2:  AA  RR    RR     A      2      2        1        1
 3:  AA  AA    AA     B      4      5        5        1
 4:  AA  AA    AA     B      4      5        5        1
 5:  RA  AA    RR     B      0      5        5        1
 6:  RR  AA    RR     B      4      5        5        1
 7:  AA  AA    AA     B      4      5        5        1
 8:  AA  AA    RA     C      3      3        0        0
 9:  AA  AA    RA     C      3      3        0        0
10:  AA  RR    RA     C      3      3        0        0
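The `rle()`/`rep()` pair above simply numbers the runs of non-zero sumchild: 1 for the first run, 2 for the second, and so on. As a sketch of an alternative (not something posted in this thread), the same numbering can be built from the start of each run with `cumsum()`, which restarts cleanly inside groups. `run_index` is a hypothetical helper name, and base R's `ave()` stands in for data.table's `by=` so the example runs without data.table.

```r
# Hypothetical helper: number the runs of TRUE in a logical vector (first run
# = 1, second = 2, ...); FALSE positions get 0. Gives the same result as
# rep(seq_along(span), times = span) assigned into the TRUE positions.
run_index <- function(flag) {
  flag <- as.logical(flag)
  # TRUE where a run starts: current element TRUE and previous one FALSE
  # (the first element has no predecessor, so it is compared with FALSE).
  starts <- flag & !c(FALSE, head(flag, -1L))
  out <- cumsum(starts)
  out[!flag] <- 0L
  as.integer(out)
}

# Same sumchild values and Group labels as in the thread.
sumchild <- c(0, 1, 5, 5, 5, 5, 5, 0, 0, 0)
group    <- c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")

# Whole column at once
print(run_index(sumchild != 0))

# Numbering restarted inside each group (ave() stands in for by = Group)
print(ave(as.integer(sumchild != 0), group, FUN = run_index))
```

With data.table loaded, the grouped call would presumably read `DT[, childseg := run_index(sumchild != 0), by = Group]`. Later data.table releases also provide `rleid()`, but it numbers every run, FALSE runs included, so it is not a drop-in replacement for this column.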

You persist in posting code without explaining what you are trying to do with it. You have already been told that your earlier efforts using `rle` did not make sense. Post a complete example and then explain what you want the resulting object to look like. It often helps to provide the scientific background for what the data represent.

-- 
David.

> 
> 
> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>>> 
>>> Apologies - a mix-up of syntax all over the place, a habit of mine.  The
>>> last line was in there because of earlier code, so it really doesn't
>>> need to be there.  Here is the proper code, I hope:
>>> 
>>> childseg<-0
>>> x<-sumchild ==0
>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>> childseg[x]<-rep(seq_along(span), times = span)
>>> 
>> 
>> This is still not reproducible. We have no idea what sumchild is, and the code throws an error. My guess is that you are trying to get a result such as would be delivered by:
>> 
>> childseg <- sumchild[ sumchild != 0 ]
>> 
>> David.
>> 
>>> 
>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>> <jdnewmil at dcn.davis.ca.us> wrote:
>>>> Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand-generated result would help? A new input data frame might or might not be needed to illustrate the desired results.
>>>> 
>>>> Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to elements of a numeric vector in your last line.
>>>> ---------------------------------------------------------------------------
>>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>>                                     Live:   OO#.. Dead: OO#..  Playing
>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>>> ---------------------------------------------------------------------------
>>>> Sent from my phone. Please excuse my brevity.
>>>> 
>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>>>>> Is it possible to add the following code or similar in data.table:
>>>>> 
>>>>> childseg<-0
>>>>> x:=sumchild <-0
>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>> childseg[childseg == 0]<-''
>>>>> 
>>>>> I was hoping to run this code by Group for mum, dad and
>>>>> child.  The problem I'm having is with the
>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can
>>>>> be added to data.table.
>>>>> 
>>>>> [Previous email had incorrect code]
>>>>> 
>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>> <jdnewmil at dcn.davis.ca.us> wrote:
>>>>>> I do not understand the value of using the rle function in your
>>>>>> description, but the code below appears to produce the table you want.
>>>>>> 
>>>>>> Note that better support for the data.table package might be found at
>>>>>> StackExchange, as the documentation specifies.
>>>>>> 
>>>>>> x <- read.table( text=
>>>>>> "Dad Mum Child Group
>>>>>> AA RR RA A
>>>>>> AA RR RR A
>>>>>> AA AA AA B
>>>>>> AA AA AA B
>>>>>> RA AA RR B
>>>>>> RR AA RR B
>>>>>> AA AA AA B
>>>>>> AA AA RA C
>>>>>> AA AA RA C
>>>>>> AA RR RA C
>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>> 
>>>>>> library(data.table)
>>>>>> DT <- data.table( x )
>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>> DT[ , sumdad := 0L ]
>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>> DT[ , cdad := NULL ]
>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>> DT[ , summum := 0L ]
>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>> DT[ , cmum := NULL ]
>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>> DT[ , sumchild := 0L ]
>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>> DT[ , cchild := NULL ]
>>>>>> 
>>>>>>> DT
>>>>>> 
>>>>>>     Dad Mum Child Group sumdad summum sumchild
>>>>>>  1:  AA  RR    RA     A      2      2        0
>>>>>>  2:  AA  RR    RR     A      2      2        1
>>>>>>  3:  AA  AA    AA     B      4      5        5
>>>>>>  4:  AA  AA    AA     B      4      5        5
>>>>>>  5:  RA  AA    RR     B      0      5        5
>>>>>>  6:  RR  AA    RR     B      4      5        5
>>>>>>  7:  AA  AA    AA     B      4      5        5
>>>>>>  8:  AA  AA    RA     C      3      3        0
>>>>>>  9:  AA  AA    RA     C      3      3        0
>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>> 
>>>>>> 
>>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>> 
>>>>>>> I'm trying to use rle() together with data.table and wondering whether
>>>>>>> that is possible...
>>>>>>> 
>>>>>>> To make this simple, my ultimate goal is to determine long stretches of
>>>>>>> 1s, but I want to do this within groups (hence using data.table, as
>>>>>>> I use the setkey option).  However, I'm not having much luck
>>>>>>> making this possible.
>>>>>>> 
>>>>>>> For example, for simplicity's sake, I have the following data:
>>>>>>> 
>>>>>>> Dad Mum Child Group
>>>>>>> AA RR RA A
>>>>>>> AA RR RR A
>>>>>>> AA AA AA B
>>>>>>> AA AA AA B
>>>>>>> RA AA RR B
>>>>>>> RR AA RR B
>>>>>>> AA AA AA B
>>>>>>> AA AA RA C
>>>>>>> AA AA RA C
>>>>>>> AA RR RA C
>>>>>>> 
>>>>>>> And the following code, which I know works:
>>>>>>> 
>>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>> 
>>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>> 
>>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>> 
>>>>>>> However, I wish to run the above code by Group (the real file is
>>>>>>> millions of rows long and the groups will be larger, but I just wanted
>>>>>>> to simplify the example).
>>>>>>> 
>>>>>>> I did something like this but of course I got an error:
>>>>>>> 
>>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>> 
>>>>>>> The reason is that I eventually want to have something like this:
>>>>>>> 
>>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>>> AA RR RA A 2 2 0
>>>>>>> AA RR RR A 2 2 1
>>>>>>> AA AA AA B 4 5 5
>>>>>>> AA AA AA B 4 5 5
>>>>>>> RA AA RR B 0 5 5
>>>>>>> RR AA RR B 4 5 5
>>>>>>> AA AA AA B 4 5 5
>>>>>>> AA AA RA C 3 3 0
>>>>>>> AA AA RA C 3 3 0
>>>>>>> AA RR RA C 3 3 0
>>>>>>> 
>>>>>>> That is, I would like to have the specific counts next to what I'm
>>>>>>> consecutively counting per group.  So for Group A there are 2 AAs for
>>>>>>> dad and two RRs for mum, but only 1 AA or RR for the child, and
>>>>>>> that is RR (so the 1 is next to the RR and not the RA).
>>>>>>> 
>>>>>>> Can this be done?
>>>>>>> 
>>>>>>> K.
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 

David Winsemius
Alameda, CA, USA
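The original question (the deepest quote in this thread) asks for "long stretches of 1s" within each Group, while the desired table actually shows per-group counts. A minimal sketch, assuming a "stretch" means consecutive rows within a Group and that AA/RR are the calls being counted, computes both so they can be compared. `run_length_per_row` and `aa_or_rr` are hypothetical helper names, and base R's `ave()` stands in for data.table's `by=`.

```r
# Hypothetical helper: for every row, the length of the consecutive run of
# TRUE it belongs to (0 for FALSE rows), built from rle().
run_length_per_row <- function(flag) {
  flag <- as.logical(flag)
  r <- rle(flag)
  len <- rep(r$lengths, r$lengths)   # expand each run's length to its rows
  len[!flag] <- 0L
  as.integer(len)
}

# Example data from the thread.
x <- read.table(text = "
Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: 1 for an AA or RR call, 0 otherwise.
aa_or_rr <- function(g) as.integer(g %in% c("AA", "RR"))

for (who in c("Dad", "Mum", "Child")) {
  h <- aa_or_rr(x[[who]])
  # per-group count, placed only on AA/RR rows (what the desired table shows)
  x[[paste0("sum", tolower(who))]] <- ave(h, x$Group, FUN = sum) * h
  # per-row length of the consecutive AA/RR stretch within the group
  x[[paste0("run", tolower(who))]] <- ave(h, x$Group, FUN = run_length_per_row)
}
print(x)
```

The sum* columns reproduce the desired table and Jeff Newmiller's output. The run* columns differ only for Dad in Group B, giving 2, 2, 0, 2, 2, because the RA row splits the AA/RR calls into two stretches of two. In data.table the run lengths could presumably be attached with `DT[, rundad := run_length_per_row(Dad %in% c("AA", "RR")), by = Group]`.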


