[R] rle with data.table - is it possible?

Beejai kate.ignatius at gmail.com
Sat Jan 3 06:35:44 CET 2015


What are you having trouble with exactly?  Do you need a bigger
example?  The code works perfectly well with your code, so I'm not sure
how you are finding trouble with it (apart from the fact that I had put
in a few errors myself at the beginning, for which I apologize).
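
For reference, a minimal sketch of the task described in the quoted
message below: number each run of consecutive rows where sumchild is not
0, restarting within every Group. The toy values and the helper name
run_ids are hypothetical (they are not the real data), and the sketch
assumes sumchild contains no NAs. It follows the prose description, not
the hand-made childseg column in the sample table.

```r
library(data.table)

# Toy data: hypothetical values chosen only to illustrate the idea
DT <- data.table(
  Group    = c("A", "A", "B", "B", "B", "B", "C", "C", "C"),
  sumchild = c(0L, 1L, 5L, 0L, 5L, 5L, 0L, 0L, 3L)
)

# Hypothetical helper: label each run of non-zero values 1, 2, 3, ...
# and give rows inside zero runs the label 0. Assumes no NAs in v.
run_ids <- function(v) {
  r <- rle(v != 0)                               # runs of TRUE (non-zero) / FALSE (zero)
  ids <- ifelse(r$values, cumsum(r$values), 0L)  # count only the TRUE runs
  rep(as.integer(ids), times = r$lengths)        # expand back to one label per row
}

# by = Group restarts the numbering inside every group
DT[, childseg := run_ids(sumchild), by = Group]
print(DT)
```

Because rle() is called once per group on an ordinary vector, nothing
special is needed to use it inside data.table; the only requirement is
that the result has one value per row of the group.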

On Fri, Jan 2, 2015 at 9:05 PM, Jeff Newmiller [via R]
<ml-node+s789695n4701333h78 at n4.nabble.com> wrote:
> The problem is that I cannot see how your use of rle and/or seq_along could
> possibly lead to the sample result you are giving us. That is why I asked
> for a new example.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On January 2, 2015 5:11:09 PM PST, Beejai <[hidden email]> wrote:
>
>>Obviously this is why I need help...
>>
>>This is a larger data frame.  I'm only posting something small here to
>>make it simple.  There are many Groups which are larger, and I want to
>>assign a sequence value to consecutive rows where sumchild is not
>>equal to 0.  As the data frame I'm working with is much larger, this
>>goes up to 100 maybe even 200 and I have many different groups 20K+.
>>I would like to do this for every group, not for the whole data frame.
>>
>>There is no particular science behind this, only data organizing.
>>
>>So just say we had data like so:
>>
>>     Dad Mum Child Group sumdad summum sumchild childseg
>>  1:  AA  RR    RA     A      2      2        0        0
>>  2:  AA  RR    RR     A      2      2        1        1
>>  3:  AA  AA    AA     B      4      5        5        1
>>  4:  AA  AA    RA     B      4      5        5        0
>>  5:  RA  AA    RR     B      0      5        5        2
>>  6:  RR  AA    RR     B      4      5        5        2
>>  7:  AA  AA    AA     B      4      5        5        2
>>  8:  AA  AA    AA     C      3      3        0        1
>>  9:  AA  AA    RA     C      3      3        0        0
>> 10:  AA  RR    RR     C      3      3        0        2
>> 11:  AA  RR    RA     C      2      2        0        0
>> 12:  AA  RR    RR     C      2      2        1        3
>> 13:  AA  AA    AA     C      4      5        5        3
>> 14:  AA  AA    RA     C      4      5        5        0
>> 15:  RA  AA    RR     C      0      5        5        4
>>
>>On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R]
>><[hidden email]> wrote:
>>>
>>> On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
>>>
>>>> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
>>>> me try that again:
>>>>
>>>> x$childseg<-0
>>>> x<-x$sumchild !=0
>>>
>>> That previous line would appear to overwrite the entire dataframe with the
>>> value of one vector
>>>
>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>>>
>>>> Does this one have any errors?
>>> Even assuming that the code from Jeff Newmiller is creating those objects I
>>> get
>>>
>>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>> Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors
>>>
>>> In the last line you are indexing a vector with a dataframe (or perhaps a
>>> data.table).
>>>
>>> If we use Newmiller's object and then change some of the instances of "x" in
>>> your code to DT we get:
>>>
>>>> DT$childseg<-0
>>>> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>> DT$childseg[x]<-rep(seq_along(span), times = span)
>>>> DT
>>>     Dad Mum Child Group sumdad summum sumchild childseg
>>>  1:  AA  RR    RA     A      2      2        0        0
>>>  2:  AA  RR    RR     A      2      2        1        1
>>>  3:  AA  AA    AA     B      4      5        5        1
>>>  4:  AA  AA    AA     B      4      5        5        1
>>>  5:  RA  AA    RR     B      0      5        5        1
>>>  6:  RR  AA    RR     B      4      5        5        1
>>>  7:  AA  AA    AA     B      4      5        5        1
>>>  8:  AA  AA    RA     C      3      3        0        0
>>>  9:  AA  AA    RA     C      3      3        0        0
>>> 10:  AA  RR    RA     C      3      3        0        0
>>>
>>> You persist in posting code where you do not explain what you are trying to
>>> do with it. You have already been told that your earlier efforts using `rle`
>>> did not make any sense. Post a complete example and then explain what you
>>> desire as an object. It's often helpful to provide a scientific background
>>> for what the data represents.
>>>
>>> --
>>> David.
>>>
>>>>
>>>>
>>>> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]> wrote:
>>>>>
>>>>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>>>>>
>>>>>> Apologies - mix up of syntax all over the place, a habit of mine. The
>>>>>> last line was in there because of code beforehand so it really doesn't
>>>>>> need to be there.  Here is the proper code I hope:
>>>>>>
>>>>>> childseg<-0
>>>>>> x<-sumchild ==0
>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>
>>>>>
>>>>> This remains not reproducible. We have no idea what sumchild might be and
>>>>> the code throws an error. My guess is that you are trying to get a result
>>>>> such as would be delivered by:
>>>>>
>>>>> childseg <- sumchild[ sumchild != 0 ]
>>>>>
>>>>> David.
>>>>>
>>>>>>
>>>>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>>>>> <[hidden email]> wrote:
>>>>>>> Thank you for attempting to encode what you want using R syntax, but
>>>>>>> you are not really succeeding yet (too many errors). Perhaps another hand
>>>>>>> generated result would help? A new input data frame might or might not be
>>>>>>> needed to illustrate desired results.
>>>>>>>
>>>>>>> Your second and third lines are syntactically incorrect, and I don't
>>>>>>> understand what you hope to accomplish by assigning an empty string to a
>>>>>>> numeric in your last line.
>>>>>>>
>>>>>>>
>>>>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]>
>>>>>>> wrote:
>>>>>>>> Is it possible to add the following code or similar in data.table:
>>>>>>>>
>>>>>>>> childseg<-0
>>>>>>>> x:=sumchild <-0
>>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>>> childseg[childseg == 0]<-''
>>>>>>>>
>>>>>>>> I was hoping to do this code by Group for mum, dad and
>>>>>>>> child.  The problem I'm having is with the
>>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
>>>>>>>> be added to data.table.
>>>>>>>>
>>>>>>>> [Previous email had incorrect code]
>>>>>>>>
>>>>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>>>>> <[hidden email]> wrote:
>>>>>>>>> I do not understand the value of using the rle function in your description,
>>>>>>>>> but the code below appears to produce the table you want.
>>>>>>>>>
>>>>>>>>> Note that better support for the data.table package might be found at
>>>>>>>>> stackexchange as the documentation specifies.
>>>>>>>>>
>>>>>>>>> x <- read.table( text=
>>>>>>>>> "Dad Mum Child Group
>>>>>>>>> AA RR RA A
>>>>>>>>> AA RR RR A
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA AA B
>>>>>>>>> RA AA RR B
>>>>>>>>> RR AA RR B
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA RA C
>>>>>>>>> AA AA RA C
>>>>>>>>> AA RR RA C
>>>>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>>>>>
>>>>>>>>> library(data.table)
>>>>>>>>> DT <- data.table( x )
>>>>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>>>>> DT[ , sumdad := 0L ]
>>>>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>>>>> DT[ , cdad := NULL ]
>>>>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>>>>> DT[ , summum := 0L ]
>>>>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>>>>> DT[ , cmum := NULL ]
>>>>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>>>>> DT[ , sumchild := 0L ]
>>>>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>>>>> DT[ , cchild := NULL ]
>>>>>>>>>
>>>>>>>>>> DT
>>>>>>>>>
>>>>>>>>>     Dad Mum Child Group sumdad summum sumchild
>>>>>>>>>  1:  AA  RR    RA     A      2      2        0
>>>>>>>>>  2:  AA  RR    RR     A      2      2        1
>>>>>>>>>  3:  AA  AA    AA     B      4      5        5
>>>>>>>>>  4:  AA  AA    AA     B      4      5        5
>>>>>>>>>  5:  RA  AA    RR     B      0      5        5
>>>>>>>>>  6:  RR  AA    RR     B      4      5        5
>>>>>>>>>  7:  AA  AA    AA     B      4      5        5
>>>>>>>>>  8:  AA  AA    RA     C      3      3        0
>>>>>>>>>  9:  AA  AA    RA     C      3      3        0
>>>>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>>>>>
>>>>>>>>>> I'm trying to use both these packages and wondering whether they are
>>>>>>>>>> possible...
>>>>>>>>>>
>>>>>>>>>> To make this simple, my ultimate goal is to determine long stretches of
>>>>>>>>>> 1s, but I want to do this within groups (hence using data.table, as
>>>>>>>>>> I use the "set key" option).  However, I'm not having much luck
>>>>>>>>>> making this possible.
>>>>>>>>>>
>>>>>>>>>> For example, for simplicity's sake, I have the following data:
>>>>>>>>>>
>>>>>>>>>> Dad Mum Child Group
>>>>>>>>>> AA RR RA A
>>>>>>>>>> AA RR RR A
>>>>>>>>>> AA AA AA B
>>>>>>>>>> AA AA AA B
>>>>>>>>>> RA AA RR B
>>>>>>>>>> RR AA RR B
>>>>>>>>>> AA AA AA B
>>>>>>>>>> AA AA RA C
>>>>>>>>>> AA AA RA C
>>>>>>>>>> AA RR RA C
>>>>>>>>>>
>>>>>>>>>> And the following code which I know works
>>>>>>>>>>
>>>>>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>>>>>
>>>>>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>>>>>
>>>>>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>>>>>
>>>>>>>>>> However, I wish to do the above code by Group (though this file is
>>>>>>>>>> millions of rows long and groups will be larger but just wanted to
>>>>>>>>>> simplify the example).
>>>>>>>>>>
>>>>>>>>>> I did something like this but of course I got an error:
>>>>>>>>>>
>>>>>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>>>>>
>>>>>>>>>> The reason being as I want to eventually have something like this:
>>>>>>>>>>
>>>>>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>>>>>> AA RR RA A 2 2 0
>>>>>>>>>> AA RR RR A 2 2 1
>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>> RA AA RR B 0 5 5
>>>>>>>>>> RR AA RR B 4 5 5
>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>>> AA RR RA C 3 3 0
>>>>>>>>>>
>>>>>>>>>> That is, I would like to have the specific counts next to what I'm
>>>>>>>>>> consecutively counting per group.  So for Group A for dad there are 2
>>>>>>>>>> AAs,  there are two RRs for mum but only 1 AA or RR for the child and
>>>>>>>>>> that is RR (so the 1 is next to the RR and not the RA).
>>>>>>>>>>
>>>>>>>>>> Can this be done?
>>>>>>>>>>
>>>>>>>>>> K.
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>>>
>>>>>
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>>
>




--
View this message in context: http://r.789695.n4.nabble.com/rle-with-data-table-is-it-possible-tp4701211p4701335.html
Sent from the R help mailing list archive at Nabble.com.