[R] rle with data.table - is it possible?

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Dec 31 09:45:03 CET 2014


I do not understand the value of using the rle function in your 
description, but the code below appears to produce the table you want.
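
For comparison, here is a minimal sketch of what rle would give if it were applied within each group, next to a plain per-group total. The helper name run_len and the small table (the Dad column of groups A and B only) are made up for illustration and are not part of the original post. On this data the run lengths within group B are 2, 2, 0, 2, 2, while the per-group totals are 4, 4, 0, 4, 4, which is what the requested table shows.

```r
library(data.table)

# Hypothetical helper: for each element, the length of the run of TRUEs
# it belongs to, or 0 when the element itself is FALSE.
run_len <- function(flag) {
  r <- rle(flag)
  rep(ifelse(r$values, r$lengths, 0L), r$lengths)
}

# Made-up subset of the example data: Dad genotypes for groups A and B
DT <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA"),
  Group = c("A",  "A",  "B",  "B",  "B",  "B",  "B")
)

DT[, flag := Dad %in% c("AA", "RR")]
# Run lengths of consecutive AA/RR rows, computed separately per group
DT[, rundad := run_len(flag), by = Group]
# Per-group total of AA/RR rows, shown only on the AA/RR rows
DT[, sumdad := ifelse(flag, sum(flag), 0L), by = Group]
DT[, flag := NULL]
print(DT)
```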

Note that better support for the data.table package might be found on
Stack Exchange, as the package documentation specifies.

x <- read.table( text=
"Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header=TRUE, stringsAsFactors=FALSE )

library(data.table)
DT <- data.table( x )
# Dad: flag rows that are AA or RR, then write the per-group count of
# flagged rows onto those rows only; unflagged rows keep the initial 0.
DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
DT[ , sumdad := 0L ]
DT[ cdad == 1L, sumdad := sum( cdad ), by=Group ]
DT[ , cdad := NULL ]
# Mum: same steps
DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
DT[ , summum := 0L ]
DT[ cmum == 1L, summum := sum( cmum ), by=Group ]
DT[ , cmum := NULL ]
# Child: same steps
DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
DT[ , sumchild := 0L ]
DT[ cchild == 1L, sumchild := sum( cchild ), by=Group ]
DT[ , cchild := NULL ]

> DT
    Dad Mum Child Group sumdad summum sumchild
 1:  AA  RR    RA     A      2      2        0
 2:  AA  RR    RR     A      2      2        1
 3:  AA  AA    AA     B      4      5        5
 4:  AA  AA    AA     B      4      5        5
 5:  RA  AA    RR     B      0      5        5
 6:  RR  AA    RR     B      4      5        5
 7:  AA  AA    AA     B      4      5        5
 8:  AA  AA    RA     C      3      3        0
 9:  AA  AA    RA     C      3      3        0
10:  AA  RR    RA     C      3      3        0
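
The three near-identical stanzas above can also be folded into a single call. What follows is a sketch only, assuming a data.table version that supports .SDcols together with := and by; the helper name count_flagged and the variables cols and newcols are introduced here for illustration. It is meant to give the same three columns as the table printed above.

```r
library(data.table)

x <- read.table(text =
"Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header = TRUE, stringsAsFactors = FALSE)

DT <- data.table(x)

# Hypothetical helper: on rows that are AA or RR, return the number of
# such rows in the vector; on all other rows return 0.
count_flagged <- function(v) {
  flag <- v %in% c("AA", "RR")
  ifelse(flag, sum(flag), 0L)
}

cols    <- c("Dad", "Mum", "Child")
newcols <- paste0("sum", tolower(cols))   # sumdad, summum, sumchild

# Apply the helper to each genotype column, separately within each Group
DT[, (newcols) := lapply(.SD, count_flagged), by = Group, .SDcols = cols]
print(DT)
```

Adding another genotype column then only means extending cols, at the cost of the logic being a little less explicit than in the step-by-step version.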

On Tue, 30 Dec 2014, Kate Ignatius wrote:

> I'm trying to use both of these together and wondering whether that is possible...
>
> To make this simple, my ultimate goal is to determine long stretches of
> 1s, but I want to do this within groups (hence using data.table, as
> I use the "set key" option).  However, I'm not having much luck
> making this possible.
>
> For example, for simplicity's sake, I have the following data:
>
> Dad Mum Child Group
> AA RR RA A
> AA RR RR A
> AA AA AA B
> AA AA AA B
> RA AA RR B
> RR AA RR B
> AA AA AA B
> AA AA RA C
> AA AA RA C
> AA RR RA C
>
> And the following code, which I know works:
>
> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>
> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>
> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>
> However, I wish to do the above code by Group (though this file is
> millions of rows long and groups will be larger, but I just wanted to
> simplify the example).
>
> I did something like this but of course I got an error:
>
> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>
> The reason is that I want to eventually have something like this:
>
> Dad Mum Child Group sumdad summum sumchild
> AA RR RA A 2 2 0
> AA RR RR A 2 2 1
> AA AA AA B 4 5 5
> AA AA AA B 4 5 5
> RA AA RR B 0 5 5
> RR AA RR B 4 5 5
> AA AA AA B 4 5 5
> AA AA RA C 3 3 0
> AA AA RA C 3 3 0
> AA RR RA C 3 3 0
>
> That is, I would like to have the specific counts next to what I'm
> consecutively counting per group.  So for Group A there are 2 AAs for
> dad and 2 RRs for mum, but only 1 AA or RR for the child, and
> that is RR (so the 1 is next to the RR and not the RA).
>
> Can this be done?
>
> K.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


