[R] rle with data.table - is it possible?
David Winsemius
dwinsemius at comcast.net
Fri Jan 2 08:32:44 CET 2015
> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>
> Apologies - mix up of syntax all over the place, a habit of mine. The
> last line was in there because of code beforehand so it really doesn't
> need to be there. Here is the proper code I hope:
>
> childseg<-0
> x<-sumchild ==0
> span<-rle(x)$lengths[rle(x)$values==TRUE]
> childseg[x]<-rep(seq_along(span), times = span)
>
This is still not reproducible. We have no idea what sumchild might be, and the code throws an error. My guess is that you are trying to get a result such as would be delivered by:
childseg <- sumchild[ sumchild != 0 ]
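For comparison, here is a minimal sketch of what the four quoted lines seem to be aiming for, namely numbering each run of zeros in sumchild. The input vector is an assumption (copied from the sumchild column of Jeff's earlier output, since the post never shows it), and childseg is initialised to full length so that positions outside a zero run stay 0 rather than becoming NA.

```r
# Assumed input: the sumchild column from Jeff's earlier output.
sumchild <- c(0, 1, 5, 5, 5, 5, 5, 0, 0, 0)

x <- sumchild == 0                  # TRUE where sumchild is zero
r <- rle(x)                         # call rle() once and reuse the result
span <- r$lengths[r$values]         # lengths of the TRUE runs only: 1 and 3

childseg <- integer(length(x))      # all zeros, same length as x
childseg[x] <- rep(seq_along(span), times = span)
childseg
# [1] 1 0 0 0 0 0 0 2 2 2
```

Whether this is the intended result is a guess; it only shows that the rle() idiom runs once sumchild is defined.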
--
David.
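
On the question in the subject line: rle() can be called inside data.table's j with by=, as long as the result is expanded back to one value per row. One likely obstacle with rle(...)$lengths[...] on its own is that it returns one number per run, so its length need not match the number of rows in the group. The sketch below is only an illustration under assumptions: run_len is a hypothetical helper (not part of the thread or of data.table), and the Dad and Group values are copied from the example data.

```r
library(data.table)

# Hypothetical helper: for each element, the length of the run of TRUEs it
# belongs to, or 0 where the flag is FALSE.
run_len <- function(flag) {
  r <- rle(flag)
  rep(r$lengths * r$values, times = r$lengths)
}

DT <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

# One value per row within each group, so := can assign it.
DT[, rundad := run_len(Dad %in% c("AA", "RR")), by = Group]
DT$rundad
# [1] 2 2 2 2 0 2 2 3 3 3
```

Note that this gives the length of each consecutive run (2 2 0 2 2 for Group B), not the per-group total of 4 shown in the desired table, which is what Jeff's sum()-based code produces.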
>
> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
> <jdnewmil at dcn.davis.ca.us> wrote:
>> Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand-generated result would help? A new input data frame might or might not be needed to illustrate the desired results.
>>
>> Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.
>> ---------------------------------------------------------------------------
>> Jeff Newmiller The ..... ..... Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
>> Live: OO#.. Dead: OO#.. Playing
>> Research Engineer (Solar/Batteries O.O#. #.O#. with
>> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
>> ---------------------------------------------------------------------------
>> Sent from my phone. Please excuse my brevity.
>>
>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>>> Is it possible to add the following code or similar in data.table:
>>>
>>> childseg<-0
>>> x:=sumchild <-0
>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>> childseg[x]<-rep(seq_along(span), times = span)
>>> childseg[childseg == 0]<-''
>>>
>>> I was hoping to do this code by Group for mum, dad and
>>> child. The problem I'm having is with the
>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
>>> be added to data.table.
>>>
>>> [Previous email had incorrect code]
>>>
>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>> <jdnewmil at dcn.davis.ca.us> wrote:
>>>> I do not understand the value of using the rle function in your description,
>>>> but the code below appears to produce the table you want.
>>>>
>>>> Note that better support for the data.table package might be found at
>>>> stackexchange as the documentation specifies.
>>>>
>>>> x <- read.table( text=
>>>> "Dad Mum Child Group
>>>> AA RR RA A
>>>> AA RR RR A
>>>> AA AA AA B
>>>> AA AA AA B
>>>> RA AA RR B
>>>> RR AA RR B
>>>> AA AA AA B
>>>> AA AA RA C
>>>> AA AA RA C
>>>> AA RR RA C
>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>
>>>> library(data.table)
>>>> DT <- data.table( x )
>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>> DT[ , sumdad := 0L ]
>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>> DT[ , cdad := NULL ]
>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>> DT[ , summum := 0L ]
>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>> DT[ , cmum := NULL ]
>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>> DT[ , sumchild := 0L ]
>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>> DT[ , cchild := NULL ]
>>>>
>>>> > DT
>>>>     Dad Mum Child Group sumdad summum sumchild
>>>>  1:  AA  RR    RA     A      2      2        0
>>>>  2:  AA  RR    RR     A      2      2        1
>>>>  3:  AA  AA    AA     B      4      5        5
>>>>  4:  AA  AA    AA     B      4      5        5
>>>>  5:  RA  AA    RR     B      0      5        5
>>>>  6:  RR  AA    RR     B      4      5        5
>>>>  7:  AA  AA    AA     B      4      5        5
>>>>  8:  AA  AA    RA     C      3      3        0
>>>>  9:  AA  AA    RA     C      3      3        0
>>>> 10:  AA  RR    RA     C      3      3        0
>>>>
>>>>
>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>
>>>>> I'm trying to use rle and data.table together and wondering whether that is
>>>>> possible...
>>>>>
>>>>> To make this simple, my ultimate goal is to determine long stretches of
>>>>> 1s, but I want to do this within groups (hence using data.table, as
>>>>> I use the "set key" option). However, I'm not having much luck
>>>>> making this possible.
>>>>>
>>>>> For example, for simplicity's sake, I have the following data:
>>>>>
>>>>> Dad Mum Child Group
>>>>> AA RR RA A
>>>>> AA RR RR A
>>>>> AA AA AA B
>>>>> AA AA AA B
>>>>> RA AA RR B
>>>>> RR AA RR B
>>>>> AA AA AA B
>>>>> AA AA RA C
>>>>> AA AA RA C
>>>>> AA RR RA C
>>>>>
>>>>> And the following code which I know works
>>>>>
>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>
>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>
>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>
>>>>> However, I wish to run the above code by Group (the real file is
>>>>> millions of rows long and the groups will be larger; I just wanted to
>>>>> simplify the example).
>>>>>
>>>>> I did something like this but of course I got an error:
>>>>>
>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>
>>>>> The reason is that I eventually want to have something like this:
>>>>>
>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>> AA RR RA A 2 2 0
>>>>> AA RR RR A 2 2 1
>>>>> AA AA AA B 4 5 5
>>>>> AA AA AA B 4 5 5
>>>>> RA AA RR B 0 5 5
>>>>> RR AA RR B 4 5 5
>>>>> AA AA AA B 4 5 5
>>>>> AA AA RA C 3 3 0
>>>>> AA AA RA C 3 3 0
>>>>> AA RR RA C 3 3 0
>>>>>
>>>>> That is, I would like to have the specific counts next to what I'm
>>>>> consecutively counting per group. So for Group A there are two
>>>>> AAs for dad and two RRs for mum, but only one AA or RR for the child,
>>>>> and that is RR (so the 1 is next to the RR and not the RA).
>>>>>
>>>>> Can this be done?
>>>>>
>>>>> K.
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>
>