[R] rle with data.table - is it possible?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Sat Jan 3 03:04:40 CET 2015


The problem is that I cannot see how your use of rle and/or seq_along could possibly lead to the sample result you are giving us. That is why I asked for a new example.
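
One possible reading of the request is that, within each Group, every run of consecutive rows with non-zero sumchild gets its own counter while zero rows stay 0. The following is only a minimal sketch under that assumption; the helper name run_id and the toy data are made up for illustration, and it is not meant to reproduce the childseg column in the sample quoted below.

```r
library(data.table)

# Toy data (hypothetical): one grouping column and one count column.
DT <- data.table(
  Group    = c("A", "A", "B", "B", "B", "C", "C", "C", "C"),
  sumchild = c(0L, 1L, 5L, 0L, 5L, 0L, 2L, 2L, 0L)
)

# Hypothetical helper: number each run of non-zero values in v,
# leaving zero entries at 0.
run_id <- function(v) {
  r <- rle(v != 0)                      # runs of TRUE (non-zero) / FALSE (zero)
  ids <- cumsum(r$values) * r$values    # 1, 2, ... for TRUE runs, 0 for FALSE runs
  rep(ids, times = r$lengths)           # expand back to one value per row
}

# Apply the helper separately within each group.
DT[, childseg := run_id(sumchild), by = Group]
print(DT)
# childseg is 0 1 | 1 0 2 | 0 1 1 0 for groups A | B | C
```

Calling rle once and reusing its result avoids the repeated rle(x) calls, and by=Group restarts the counter for every group.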
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On January 2, 2015 5:11:09 PM PST, Beejai <kate.ignatius at gmail.com> wrote:
>Obviously this is why I need help...
>
>This is a larger data frame.  I'm only posting something small here to
>make it simple.  There are many Groups which are larger, and I want to
>assign a sequence value to consecutive rows where sumchild is not
>equal to 0.  As the data frame I'm working with is much larger, this
>goes up to 100 maybe even 200 and I have many different groups 20K+.
>I would like to do this for every group, not for the whole data frame.
>
>There is no particular science behind this, only data organizing.
>
>So just say we had data like so:
>
>     Dad Mum Child Group sumdad summum sumchild childseg
>  1:  AA  RR    RA     A      2      2        0        0
>  2:  AA  RR    RR     A      2      2        1        1
>  3:  AA  AA    AA     B      4      5        5        1
>  4:  AA  AA    RA     B      4      5        5        0
>  5:  RA  AA    RR     B      0      5        5        2
>  6:  RR  AA    RR     B      4      5        5        2
>  7:  AA  AA    AA     B      4      5        5        2
>  8:  AA  AA    AA     C      3      3        0        1
>  9:  AA  AA    RA     C      3      3        0        0
> 10:  AA  RR    RR     C      3      3        0        2
> 11:  AA  RR    RA     C      2      2        0        0
> 12:  AA  RR    RR     C      2      2        1        3
> 13:  AA  AA    AA     C      4      5        5        3
> 14:  AA  AA    RA     C      4      5        5        0
> 15:  RA  AA    RR     C      0      5        5        4
>
>On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R]
><ml-node+s789695n4701316h51 at n4.nabble.com> wrote:
>>
>> On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
>>
>>> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
>>> me try that again:
>>>
>>> x$childseg<-0
>>> x<-x$sumchild !=0
>>
>> That previous line would appear to overwrite the entire dataframe with
>> the value of one vector.
>>
>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>>
>>> Does this one have any errors?
>> Even assuming that the code from Jeff Newmiller is creating those
>> objects I get
>>
>>> x$childseg[x]<-rep(seq_along(span), times = span)
>> Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors
>>
>> In the last line you are indexing a vector with a dataframe (or
>> perhaps a data.table).
>>
>> If we use Newmiller's object and then change some of the instances of
>> "x" in your code to DT we get:
>>
>>> DT$childseg<-0
>>> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>> DT$childseg[x]<-rep(seq_along(span), times = span)
>>> DT
>>     Dad Mum Child Group sumdad summum sumchild childseg
>>  1:  AA  RR    RA     A      2      2        0        0
>>  2:  AA  RR    RR     A      2      2        1        1
>>  3:  AA  AA    AA     B      4      5        5        1
>>  4:  AA  AA    AA     B      4      5        5        1
>>  5:  RA  AA    RR     B      0      5        5        1
>>  6:  RR  AA    RR     B      4      5        5        1
>>  7:  AA  AA    AA     B      4      5        5        1
>>  8:  AA  AA    RA     C      3      3        0        0
>>  9:  AA  AA    RA     C      3      3        0        0
>> 10:  AA  RR    RA     C      3      3        0        0
>>
>> You persist in posting code where you do not explain what you are
>> trying to do with it. You have already been told that your earlier
>> efforts using `rle` did not make any sense. Post a complete example and
>> then explain what you desire as an object. It's often helpful to
>> provide a scientific background for what the data represents.
>>
>> --
>> David.
>>
>>>
>>>
>>> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]> wrote:
>>>>
>>>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>>>>
>>>>> Apologies - mix up of syntax all over the place, a habit of mine.
>>>>> The last line was in there because of code beforehand so it really
>>>>> doesn't need to be there.  Here is the proper code I hope:
>>>>>
>>>>> childseg<-0
>>>>> x<-sumchild ==0
>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>
>>>>
>>>> This remains not reproducible. We have no idea what sumchild might
>>>> be and the code throws an error. My guess is that you are trying to
>>>> get a result such as would be delivered by:
>>>>
>>>> childseg <- sumchild[ sumchild != 0 ]
>>>>
>>>> David.
>>>>
>>>>>
>>>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>>>> <[hidden email]> wrote:
>>>>>> Thank you for attempting to encode what you want using R syntax,
>>>>>> but you are not really succeeding yet (too many errors). Perhaps
>>>>>> another hand generated result would help? A new input data frame
>>>>>> might or might not be needed to illustrate desired results.
>>>>>>
>>>>>> Your second and third lines are syntactically incorrect, and I
>>>>>> don't understand what you hope to accomplish by assigning an empty
>>>>>> string to a numeric in your last line.
>>>>>>
>>>>>>
>>>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]>
>>>>>> wrote:
>>>>>>> Is it possible to add the following code or similar in data.table:
>>>>>>>
>>>>>>> childseg<-0
>>>>>>> x:=sumchild <-0
>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>> childseg[childseg == 0]<-''
>>>>>>>
>>>>>>> I was hoping to do this code by Group for mum, dad and
>>>>>>> child.  The problem I'm having is with the
>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure
>>>>>>> can be added to data.table.
>>>>>>>
>>>>>>> [Previous email had incorrect code]
>>>>>>>
>>>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>>>> <[hidden email]> wrote:
>>>>>>>> I do not understand the value of using the rle function in your
>>>>>>>> description, but the code below appears to produce the table you
>>>>>>>> want.
>>>>>>>>
>>>>>>>> Note that better support for the data.table package might be
>>>>>>>> found at stackexchange as the documentation specifies.
>>>>>>>>
>>>>>>>> x <- read.table( text=
>>>>>>>> "Dad Mum Child Group
>>>>>>>> AA RR RA A
>>>>>>>> AA RR RR A
>>>>>>>> AA AA AA B
>>>>>>>> AA AA AA B
>>>>>>>> RA AA RR B
>>>>>>>> RR AA RR B
>>>>>>>> AA AA AA B
>>>>>>>> AA AA RA C
>>>>>>>> AA AA RA C
>>>>>>>> AA RR RA C
>>>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>>>>
>>>>>>>> library(data.table)
>>>>>>>> DT <- data.table( x )
>>>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , sumdad := 0L ]
>>>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>>>> DT[ , cdad := NULL ]
>>>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , summum := 0L ]
>>>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>>>> DT[ , cmum := NULL ]
>>>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , sumchild := 0L ]
>>>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>>>> DT[ , cchild := NULL ]
>>>>>>>>
>>>>>>>>> DT
>>>>>>>>
>>>>>>>>     Dad Mum Child Group sumdad summum sumchild
>>>>>>>>  1:  AA  RR    RA     A      2      2        0
>>>>>>>>  2:  AA  RR    RR     A      2      2        1
>>>>>>>>  3:  AA  AA    AA     B      4      5        5
>>>>>>>>  4:  AA  AA    AA     B      4      5        5
>>>>>>>>  5:  RA  AA    RR     B      0      5        5
>>>>>>>>  6:  RR  AA    RR     B      4      5        5
>>>>>>>>  7:  AA  AA    AA     B      4      5        5
>>>>>>>>  8:  AA  AA    RA     C      3      3        0
>>>>>>>>  9:  AA  AA    RA     C      3      3        0
>>>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>>>>
>>>>>>>>> I'm trying to use both these packages and wondering whether
>>>>>>>>> they are possible...
>>>>>>>>>
>>>>>>>>> To make this simple, my ultimate goal is to determine long
>>>>>>>>> stretches of 1s, but I want to do this within groups (hence
>>>>>>>>> using the data.table as I use the "set key" option).  However,
>>>>>>>>> I'm not having much luck making this possible.
>>>>>>>>>
>>>>>>>>> For example, for simplicity's sake, I have the following data:
>>>>>>>>>
>>>>>>>>> Dad Mum Child Group
>>>>>>>>> AA RR RA A
>>>>>>>>> AA RR RR A
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA AA B
>>>>>>>>> RA AA RR B
>>>>>>>>> RR AA RR B
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA RA C
>>>>>>>>> AA AA RA C
>>>>>>>>> AA RR RA  C
>>>>>>>>>
>>>>>>>>> And the following code which I know works
>>>>>>>>>
>>>>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>>>>
>>>>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>>>>
>>>>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>>>>
>>>>>>>>> However, I wish to do the above code by Group (though this
>>>>>>>>> file is millions of rows long and groups will be larger but
>>>>>>>>> just wanted to simplify the example).
>>>>>>>>>
>>>>>>>>> I did something like this but of course I got an error:
>>>>>>>>>
>>>>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>>>>
>>>>>>>>> The reason being that I want to eventually have something like
>>>>>>>>> this:
>>>>>>>>>
>>>>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>>>>> AA RR RA A 2 2 0
>>>>>>>>> AA RR RR A 2 2 1
>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>> RA AA RR B 0 5 5
>>>>>>>>> RR AA RR B 4 5 5
>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>> AA RR RA  C 3 3 0
>>>>>>>>>
>>>>>>>>> That is, I would like to have the specific counts next to what
>>>>>>>>> I'm consecutively counting per group.  So for Group A for dad
>>>>>>>>> there are 2 AAs,  there are two RRs for mum but only 1 AA or RR
>>>>>>>>> for the child and that is RR (so the 1 is next to the RR and not
>>>>>>>>> the RA).
>>>>>>>>>
>>>>>>>>> Can this be done?
>>>>>>>>>
>>>>>>>>> K.
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>code.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>> David Winsemius
>> Alameda, CA, USA