[R] rle with data.table - is it possible?
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Sat Jan 3 03:04:40 CET 2015
The problem is that I cannot see how your use of rle and/or seq_along could possibly lead to the sample result you are giving us. That is why I asked for a new example.
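
For reference, one reading of the request quoted below (number each run of consecutive rows where sumchild is not 0, restarting the count within every Group) could be sketched with rle inside a data.table grouping. This is only a sketch under that assumption: the helper name run_id and the toy values are hypothetical, and it is not claimed to reproduce the childseg column in the quoted sample.

```r
library(data.table)

# Hypothetical toy data: only the two columns the sketch needs.
DT <- data.table(
  Group    = c("A", "A", "B", "B", "B", "B", "C", "C"),
  sumchild = c(0L, 1L, 5L, 0L, 5L, 5L, 0L, 3L)
)

# Hypothetical helper: label each run of consecutive non-zero values
# with 1, 2, 3, ... and give every zero a label of 0.
run_id <- function(v) {
  r   <- rle(v != 0)                    # runs of TRUE (non-zero) / FALSE (zero)
  ids <- cumsum(r$values) * r$values    # k-th TRUE run gets k, FALSE runs get 0
  rep(ids, times = r$lengths)           # expand back to one label per row
}

# Apply the helper separately within each Group.
DT[, childseg := run_id(sumchild), by = Group]
print(DT$childseg)
# 0 1 1 0 2 2 0 1
```

Because of by = Group, the numbering restarts at 1 in each group; dropping the by argument would instead number the runs across the whole table.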
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On January 2, 2015 5:11:09 PM PST, Beejai <kate.ignatius at gmail.com> wrote:
>Obviously this is why I need help...
>
>This is a larger data frame. I'm only posting something small here to
>make it simple. There are many Groups which are larger, and I want to
>assign a sequence value to consecutive rows where sumchild is not
>equal to 0. As the data frame I'm working with is much larger, this
>goes up to 100 maybe even 200 and I have many different groups 20K+.
>I would like to do this for every group, not for the whole data frame.
>
>There is no particular science behind this, only data organizing.
>
>So just say we had data like so:
>
>     Dad Mum Child Group sumdad summum sumchild childseg
>  1:  AA  RR    RA     A      2      2        0        0
>  2:  AA  RR    RR     A      2      2        1        1
>  3:  AA  AA    AA     B      4      5        5        1
>  4:  AA  AA    RA     B      4      5        5        0
>  5:  RA  AA    RR     B      0      5        5        2
>  6:  RR  AA    RR     B      4      5        5        2
>  7:  AA  AA    AA     B      4      5        5        2
>  8:  AA  AA    AA     C      3      3        0        1
>  9:  AA  AA    RA     C      3      3        0        0
> 10:  AA  RR    RR     C      3      3        0        2
> 11:  AA  RR    RA     C      2      2        0        0
> 12:  AA  RR    RR     C      2      2        1        3
> 13:  AA  AA    AA     C      4      5        5        3
> 14:  AA  AA    RA     C      4      5        5        0
> 15:  RA  AA    RR     C      0      5        5        4
>
>On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R]
><ml-node+s789695n4701316h51 at n4.nabble.com> wrote:
>>
>> On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
>>
>>> Ah, crap. Yep you're right. This is not going too well. Okay - let
>>> me try that again:
>>>
>>> x$childseg<-0
>>> x<-x$sumchild !=0
>>
>> That previous line would appear to overwrite the entire dataframe with the
>> value of one vector
>>
>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>>
>>> Does this one have any errors?
>> Even assuming that the code from Jeff Newmiller is creating those objects I
>> get
>>
>>> x$childseg[x]<-rep(seq_along(span), times = span)
>> Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors
>>
>> In the last line you are indexing a vector with a dataframe (or perhaps a
>> data.table).
>>
>> If we use Newmiller's object and then change some of the instances of "x" in
>> your code to DT we get:
>>
>>> DT$childseg<-0
>>> x<-DT$sumchild !=0 # Try not to overwrite your data-objects
>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>> DT$childseg[x]<-rep(seq_along(span), times = span)
>>> DT
>>     Dad Mum Child Group sumdad summum sumchild childseg
>>  1:  AA  RR    RA     A      2      2        0        0
>>  2:  AA  RR    RR     A      2      2        1        1
>>  3:  AA  AA    AA     B      4      5        5        1
>>  4:  AA  AA    AA     B      4      5        5        1
>>  5:  RA  AA    RR     B      0      5        5        1
>>  6:  RR  AA    RR     B      4      5        5        1
>>  7:  AA  AA    AA     B      4      5        5        1
>>  8:  AA  AA    RA     C      3      3        0        0
>>  9:  AA  AA    RA     C      3      3        0        0
>> 10:  AA  RR    RA     C      3      3        0        0
>>
>> You persist in posting code where you do not explain what you are trying to
>> do with it. You have already been told that your earlier efforts using `rle`
>> did not make any sense. Post a complete example and then explain what you
>> desire as an object. It's often helpful to provide a scientific background
>> for what the data represents.
>>
>> --
>> David.
>>
>>>
>>>
>>> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]> wrote:
>>>>
>>>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>>>>
>>>>> Apologies - mix up of syntax all over the place, a habit of mine. The
>>>>> last line was in there because of code beforehand so it really doesn't
>>>>> need to be there. Here is the proper code I hope:
>>>>>
>>>>> childseg<-0
>>>>> x<-sumchild ==0
>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>
>>>>
>>>> This remains not reproducible. We have no idea what sumchild might be and
>>>> the code throws an error. My guess is that you are trying to get a result
>>>> such as would be delivered by:
>>>>
>>>> childseg <- sumchild[ sumchild != 0 ]
>>>>
>>>> --
>>>> David.
>>>>
>>>>>
>>>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>>>> <[hidden email]> wrote:
>>>>>> Thank you for attempting to encode what you want using R syntax, but
>>>>>> you are not really succeeding yet (too many errors). Perhaps another hand
>>>>>> generated result would help? A new input data frame might or might not be
>>>>>> needed to illustrate desired results.
>>>>>>
>>>>>> Your second and third lines are syntactically incorrect, and I don't
>>>>>> understand what you hope to accomplish by assigning an empty string to a
>>>>>> numeric in your last line.
>>>>>>
>>>>>>
>>>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]>
>>>>>> wrote:
>>>>>>> Is it possible to add the following code or similar in data.table:
>>>>>>>
>>>>>>> childseg<-0
>>>>>>> x:=sumchild <-0
>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>> childseg[childseg == 0]<-''
>>>>>>>
>>>>>>> I was hoping to do this code by Group for mum, dad and
>>>>>>> child. The problem I'm having is with the
>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
>>>>>>> be added to data.table.
>>>>>>>
>>>>>>> [Previous email had incorrect code]
>>>>>>>
>>>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>>>> <[hidden email]> wrote:
>>>>>>>> I do not understand the value of using the rle function in your
>>>>>>>> description, but the code below appears to produce the table you want.
>>>>>>>>
>>>>>>>> Note that better support for the data.table package might be found at
>>>>>>>> stackexchange as the documentation specifies.
>>>>>>>>
>>>>>>>> x <- read.table( text=
>>>>>>>> "Dad Mum Child Group
>>>>>>>> AA RR RA A
>>>>>>>> AA RR RR A
>>>>>>>> AA AA AA B
>>>>>>>> AA AA AA B
>>>>>>>> RA AA RR B
>>>>>>>> RR AA RR B
>>>>>>>> AA AA AA B
>>>>>>>> AA AA RA C
>>>>>>>> AA AA RA C
>>>>>>>> AA RR RA C
>>>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>>>>
>>>>>>>> library(data.table)
>>>>>>>> DT <- data.table( x )
>>>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , sumdad := 0L ]
>>>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>>>> DT[ , cdad := NULL ]
>>>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , summum := 0L ]
>>>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>>>> DT[ , cmum := NULL ]
>>>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , sumchild := 0L ]
>>>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>>>> DT[ , cchild := NULL ]
>>>>>>>>
>>>>>>>>> DT
>>>>>>>>
>>>>>>>>     Dad Mum Child Group sumdad summum sumchild
>>>>>>>>  1:  AA  RR    RA     A      2      2        0
>>>>>>>>  2:  AA  RR    RR     A      2      2        1
>>>>>>>>  3:  AA  AA    AA     B      4      5        5
>>>>>>>>  4:  AA  AA    AA     B      4      5        5
>>>>>>>>  5:  RA  AA    RR     B      0      5        5
>>>>>>>>  6:  RR  AA    RR     B      4      5        5
>>>>>>>>  7:  AA  AA    AA     B      4      5        5
>>>>>>>>  8:  AA  AA    RA     C      3      3        0
>>>>>>>>  9:  AA  AA    RA     C      3      3        0
>>>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>>>>
>>>>>>>>> I'm trying to use both of these together and wondering whether that
>>>>>>>>> is possible...
>>>>>>>>>
>>>>>>>>> To make this simple, my ultimate goal is to determine long stretches
>>>>>>>>> of 1s, but I want to do this within groups (hence using data.table,
>>>>>>>>> as I use the "set key" option). However, I'm not having much luck
>>>>>>>>> making this possible.
>>>>>>>>>
>>>>>>>>> For example, for simplicity's sake, I have the following data:
>>>>>>>>>
>>>>>>>>> Dad Mum Child Group
>>>>>>>>> AA RR RA A
>>>>>>>>> AA RR RR A
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA AA B
>>>>>>>>> RA AA RR B
>>>>>>>>> RR AA RR B
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA RA C
>>>>>>>>> AA AA RA C
>>>>>>>>> AA RR RA C
>>>>>>>>>
>>>>>>>>> And the following code which I know works
>>>>>>>>>
>>>>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>>>>
>>>>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>>>>
>>>>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>>>>
>>>>>>>>> However, I wish to run the above code by Group (this file is
>>>>>>>>> millions of rows long and the groups will be larger, but I just
>>>>>>>>> wanted to simplify the example).
>>>>>>>>>
>>>>>>>>> I did something like this but of course I got an error:
>>>>>>>>>
>>>>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>>>>
>>>>>>>>> The reason being as I want to eventually have something like this:
>>>>>>>>>
>>>>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>>>>>  AA  RR    RA     A      2      2        0
>>>>>>>>>  AA  RR    RR     A      2      2        1
>>>>>>>>>  AA  AA    AA     B      4      5        5
>>>>>>>>>  AA  AA    AA     B      4      5        5
>>>>>>>>>  RA  AA    RR     B      0      5        5
>>>>>>>>>  RR  AA    RR     B      4      5        5
>>>>>>>>>  AA  AA    AA     B      4      5        5
>>>>>>>>>  AA  AA    RA     C      3      3        0
>>>>>>>>>  AA  AA    RA     C      3      3        0
>>>>>>>>>  AA  RR    RA     C      3      3        0
>>>>>>>>>
>>>>>>>>> That is, I would like to have the specific counts next to what I'm
>>>>>>>>> consecutively counting per group. So for Group A for dad there are 2
>>>>>>>>> AAs, there are two RRs for mum but only 1 AA or RR for the child and
>>>>>>>>> that is RR (so the 1 is next to the RR and not the RA).
>>>>>>>>>
>>>>>>>>> Can this be done?
>>>>>>>>>
>>>>>>>>> K.
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>code.
>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>>
>>