[R] Manipulating groups of boolean data subject to group size and distance from other groups
Morway, Eric
emorway at usgs.gov
Mon Nov 28 18:38:02 CET 2016
The example below is a pared-down version of a much larger dataset. My
goal is to use the binary data contained in DF$col2 to guide manipulation
of the binary data itself, subject to the following:
- Groups of 1's that are separated from other, larger groups of 1's in
'col2' by 2 or more years should be converted to 0
- Groups of 1's must span at least 2 consecutive years to be preserved
So in the example provided below, DF$col2 would be manipulated such that
its values are overwritten with:
c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1)
That is, the first group of 1's (positions 3 through 6) is separated from
other groups of 1's by 2 (or more) years, and the second group of 1's
(positions 11 & 12) spans only a single year and so does not meet the
criterion of being at least 2 years long.
The example R script below shows a small example I'm working with, called
"DF". The code that comes after the first line is my attempt to go through
some R-gymnastics to append a column to DF called "isl2" that reflects the
number of consecutive years in the 0/1 groups, where the +/- sign acts as
(or denotes) the original binary condition: 0 = negative, 1 = positive.
However, I'm stuck on how to proceed further. Could someone please help
me come up with a script that modifies DF$col2 shown below to match the
vector shown above?
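# Example data: two rows per year, 1991-2004
DF <- data.frame(col1=rep(1991:2004, each=2),
                 col2=c(0,0,1,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1))
# Flag each 0/1 switch, then number the runs of identical values
DF$inc <- c(0, abs(diff(DF$col2)))
DF$cum <- cumsum(DF$inc)
# Number of distinct years spanned by each run
ex1 <- aggregate(col1 ~ cum, data=DF, function(x) length(unique(x)))
names(ex1) <- c('cum','isl')
tmp1a <- merge(DF, ex1, by="cum", all.x=TRUE)
# Signed run length: positive for runs of 1, negative for runs of 0
tmp1a$isl2 <- tmp1a$col2 * tmp1a$isl
tmp1a$isl2[tmp1a$isl2==0] <- -tmp1a$isl[tmp1a$isl2==0]
# Assumes merge() returned the rows in their original order
DF$grpng <- tmp1a$isl2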
At this point I was thinking I could use DF$grpng to sweep through col2 and
make adjustments, but I didn't know how to proceed.
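For reference, the signed run-length column described above (number of consecutive years per 0/1 group, positive for 1's and negative for 0's) can probably also be built with rle(), without the merge step. This is only a sketch; the names run_id, yrs_in_run, sgn and grpng_rle are made up for illustration:

```r
DF <- data.frame(col1 = rep(1991:2004, each = 2),
                 col2 = c(0,0,1,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1))

# Runs of identical values in col2 (run lengths are counted in rows)
r <- rle(DF$col2)

# Run number for every row: 1, 1, 2, 2, 2, 2, 3, ...
run_id <- rep(seq_along(r$lengths), r$lengths)

# Number of distinct years spanned by each run
yrs_in_run <- as.vector(tapply(DF$col1, run_id, function(x) length(unique(x))))

# Sign convention from the text: 1 = positive, 0 = negative
sgn <- ifelse(r$values == 1, 1, -1)

# Expand the per-run value back to one value per row
grpng_rle <- rep(sgn * yrs_in_run, r$lengths)
```

For the example data, grpng_rle starts -1, -1, 2, 2, 2, 2, -2, -2, -2, -2 and ends with eight 4's.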
For debugging purposes, a slightly different example would go from:
DF <- data.frame(col1=rep(1991:2004, each=2),col2=c(1,1,1,1,
1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1))
to 'col2' looking like:
c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1)
That is, even though the first group of 1's spans more than two consecutive
years, it is separated from a larger group of 1's by 2 (or more) years.
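One possible reading of the two rules, sketched in base R. The assumptions are mine and not part of the description above: years are consecutive with no missing years, a year counts as 1 only when all of its rows are 1, and a group is dropped only if a strictly longer group exists and every strictly longer group lies 2 or more years away. The helper name clean_groups and its arguments min_len and min_gap are hypothetical.

```r
clean_groups <- function(year, flag, min_len = 2, min_gap = 2) {
  # Collapse to one value per year (assumption: 1 only if all rows are 1)
  yr_flag <- tapply(flag, year, function(x) as.integer(all(x == 1)))
  r <- rle(as.vector(yr_flag))

  # Rule 2: groups of 1's shorter than min_len years become 0
  r$values[r$values == 1 & r$lengths < min_len] <- 0
  r <- rle(inverse.rle(r))            # re-encode so adjacent 0-runs merge

  end   <- cumsum(r$lengths)          # last year index of each run
  start <- end - r$lengths + 1        # first year index of each run
  ones  <- which(r$values == 1)
  keep  <- r$values

  # Rule 1 (as read here): drop a group if strictly longer groups exist
  # and all of them are min_gap or more years away
  for (i in ones) {
    bigger <- ones[r$lengths[ones] > r$lengths[i]]
    if (length(bigger) > 0) {
      gap <- pmax(start[bigger] - end[i], start[i] - end[bigger]) - 1
      if (all(gap >= min_gap)) keep[i] <- 0
    }
  }
  r$values <- keep

  # Expand the yearly result back to one value per original row
  cleaned <- inverse.rle(r)
  names(cleaned) <- names(yr_flag)
  as.numeric(unname(cleaned[as.character(year)]))
}

yrs   <- rep(1991:2004, each = 2)
col2a <- c(0,0,1,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1)
col2b <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1)
out_a <- clean_groups(yrs, col2a)
out_b <- clean_groups(yrs, col2b)
```

Under these assumptions out_a and out_b reproduce the two target vectors given above. Other readings of "separated from other, larger groups" would need a different rule 1, for example when two equally long groups are far apart.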