[R] rle with data.table - is it possible?
Kate Ignatius
kate.ignatius at gmail.com
Tue Dec 30 15:27:37 CET 2014
I'm trying to use rle() together with data.table and wondering whether this is possible...
To make this simple, my ultimate goal is to determine long stretches of
1s, but I want to do this within groups (hence using data.table, as
I use the "set key" option). However, I'm not having much luck
making this possible.
For example, for simplicity's sake, I have the following data:
Dad  Mum  Child  Group
AA   RR   RA     A
AA   RR   RR     A
AA   AA   AA     B
AA   AA   AA     B
RA   AA   RR     B
RR   AA   RR     B
AA   AA   AA     B
AA   AA   RA     C
AA   AA   RA     C
AA   RR   RA     C
And the following code, which I know works:
hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
However, I wish to run the above code by Group (the real file is
millions of rows long and the groups will be larger, but I just wanted
to simplify the example).
I did something like this but of course I got an error:
LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
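One possible cause of the error, shown here only as a sketch since the error message itself is not given: rle() returns one length per run, not one value per row, so the right-hand side of := can have a different length from the group it is assigned to. The vector below is illustrative (it mirrors the Dad column of Group B coded as 1/0) and is an assumption, not part of the original code.

```r
# Illustrative vector: Dad calls for Group B coded as 1 (AA/RR) or 0 (other)
hetdad <- c(1, 1, 0, 1, 1)

r <- rle(hetdad)

# One entry per run of 1s, so 2 values for a group of 5 rows
runs_of_ones <- r$lengths[r$values == 1]
length(runs_of_ones)   # 2
length(hetdad)         # 5
```

Note also that x[c(1)] inside LOH[...] refers to the separate object x rather than to a column of LOH.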
The reason is that I eventually want to have something like this:
Dad  Mum  Child  Group  sumdad  summum  sumchild
AA   RR   RA     A      2       2       0
AA   RR   RR     A      2       2       1
AA   AA   AA     B      4       5       5
AA   AA   AA     B      4       5       5
RA   AA   RR     B      0       5       5
RR   AA   RR     B      4       5       5
AA   AA   AA     B      4       5       5
AA   AA   RA     C      3       3       0
AA   AA   RA     C      3       3       0
AA   RR   RA     C      3       3       0
That is, I would like to have the specific counts next to the rows I'm
consecutively counting per group. So in Group A there are 2 AAs for
dad and 2 RRs for mum, but only 1 AA or RR for the child, and
that is RR (so the 1 is next to the RR and not the RA).
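A minimal sketch of one way this could be approached, assuming the data are in a data.table called LOH with the columns shown above. The helper run_len() is a hypothetical name introduced for illustration; it expands each run length back to one value per row with rep(), so the result has the same length as the group. Because it counts consecutive runs, Dad in Group B comes out as 2, 2, 0, 2, 2 rather than the 4s in the table above, since the RA row splits the stretch.

```r
library(data.table)

# Example data from the post
LOH <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Mum   = c("RR", "RR", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "RR"),
  Child = c("RA", "RR", "AA", "AA", "RR", "RR", "AA", "RA", "RA", "RA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

# Hypothetical helper: for each row, the length of the run of AA/RR calls
# it belongs to, and 0 for any other call.
run_len <- function(g) {
  hom <- g %in% c("AA", "RR")
  r <- rle(hom)
  # One value per run, repeated so there is one value per row
  rep(ifelse(r$values, r$lengths, 0L), times = r$lengths)
}

# Apply the helper to each genotype column within each Group
LOH[, c("sumdad", "summum", "sumchild") := lapply(.SD, run_len),
    by = Group, .SDcols = c("Dad", "Mum", "Child")]
```

If the total number of AA/RR rows per group is wanted instead (which would give 4 for Dad in Group B), the body of the helper could be replaced with hom * sum(hom).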
Can this be done?
K.