[R] How to generate a conditional dummy in R?

Tue May 29 23:34:12 CEST 2018

Hi Faradj,
Yes, the function expects at least three values for each country. Glad
it worked.
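
If you would rather keep the short countries than delete them, one option is to guard the loop. This is only a sketch: Y2Xby3_safe is a hypothetical name, the toy data frame is made up, and returning all zeros for a country with fewer than three rows is my assumption, not something the rule in this thread settles.

```r
# Hypothetical variant of Y2Xby3: same scan, plus a guard for short groups.
Y2Xby3_safe <- function(x) {
  nrows <- nrow(x)
  X <- rep(0, nrows)
  # seq_len() gives an empty sequence when there are fewer than three rows,
  # whereas 1:(nrows-2) counts down to 0 and evaluates x$Y[0] (length zero)
  for (i in seq_len(max(nrows - 2, 0))) {
    if (!is.na(x$Y[i])) {
      if (x$Y[i] == 1 && any(is.na(x$Y[(i + 1):(i + 2)]))) X[i] <- 1
      if (i > 1 && X[i - 1] == 1) X[i] <- 0
    } else if (!is.na(x$Y[i + 1])) {
      if (x$Y[i + 1] == 1 && is.na(x$Y[i + 2]) && X[i] == 0) X[i + 1] <- 1
    }
  }
  X
}

# Toy data (made up for illustration): country "B" has only two rows
toy <- data.frame(
  country = c(rep("A", 6), rep("B", 2)),
  Y = c(1, NA, 1, 1, 1, NA, 1, NA)
)
X1 <- NULL
for (country in unique(toy$country))
  X1 <- c(X1, Y2Xby3_safe(toy[toy$country == country, ]))
X1
```

With this guard X1 has one value per row, so it lines up with the data. The two-row countries simply come back as zeros, so it may still be worth checking those by hand.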

Jim

On Tue, May 29, 2018 at 10:53 PM, Faradj Koliev <faradj.g using gmail.com> wrote:
> Dear Jim,
>
> wow! It worked! Thanks a lot.
>
> I did as you suggested and it worked well with the real data, although it
> gave me this error: Error in if (!is.na(x$Y[i])) { : argument is of length
> zero. For some reason X1 had fewer observations than there are in the
> data. But it's not a big deal: I identified those cases and simply deleted
> them from the data (they were countries that appear only twice in the data,
> e.g. USSR, Yugoslavia, etc.).
>
> Best,
> Faradj
>
>
> On 29 May 2018, at 02:15, Jim Lemon <drjimlemon using gmail.com> wrote:
>
> Hi Faradj,
> What a problem! I think I have worked it out, but only because the
> result is the one you said you wanted.
>
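> # the sample data frame is named fkdf
> # Y2Xby3 scans one country's rows in order and returns the dummy X
> # (it assumes the country has at least three rows)
> Y2Xby3<-function(x) {
> nrows<-dim(x)[1]
> X<-rep(0,nrows)
> for(i in 1:(nrows-2)) {
>  if(!is.na(x$Y[i])) {
>   # Y is 1 and an NA follows within the next two rows: flag this row
>   if(x$Y[i] == 1 && any(is.na(x$Y[(i+1):(i+2)]))) X[i]<-1
>   # the previous row was already flagged: count only once
>   if(i > 1) {
>    if(X[i-1] == 1) X[i]<-0
>   }
>  }
>  else {
>   # Y is NA here: flag the next row if it is 1 and is followed by an NA
>   if(!is.na(x$Y[i+1])) {
>    if(x$Y[i+1] == 1 && is.na(x$Y[i+2]) && X[i] == 0)
>     X[i+1]<-1
>   }
>  }
> }
> return(X)
> }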
> countries<-as.character(unique(fkdf$country))
> X1<-NULL
> for(country in countries)
> X1<-c(X1,Y2Xby3(fkdf[fkdf$country == country,]))
> X1
>  [1] 1 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0
> [38] 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 0
> [75] 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0
>
> fkdf$X
>
>  [1] 1 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0
> [38] 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 0
> [75] 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0
>
> Jim
>
> On Mon, May 28, 2018 at 8:43 PM, Faradj Koliev <faradj.g using gmail.com> wrote:
>
> Hi everyone,
>
> I am trying to generate a conditional dummy variable "X" with the following
> rule:
>
> Set X=1 if Y=1 in either of the two years prior to an NA.
>
> For example, if the pattern for Y is 0,0,NA then X=0 for both of the two
> years prior to the NA. If the pattern for Y is 0,1,NA or 1,0,NA then X=1.
> To be clear, if the pattern is 1,1,NA then X=1 in the first of those years
> only; it should count once (X=1), not twice.
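
To check that I read the rule correctly, here it is for a single window, i.e. just the two Y values that sit directly before an NA. This is only a sketch: flag_window is a hypothetical helper, and it ignores how overlapping windows interact in a full series.

```r
# Hypothetical helper: y holds the two Y values directly before an NA.
# Returns X for those two years: 1 at the first year where Y is 1, else 0.
flag_window <- function(y) {
  x <- integer(length(y))
  first_one <- match(1, y)   # position of the first 1, or NA if there is none
  if (!is.na(first_one)) x[first_one] <- 1L
  x
}

flag_window(c(0, 0))   # pattern 0,0,NA
flag_window(c(0, 1))   # pattern 0,1,NA
flag_window(c(1, 0))   # pattern 1,0,NA
flag_window(c(1, 1))   # pattern 1,1,NA: only the first year is flagged
```
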
>
> The code that I have now is not complete and I would appreciate some advice
> here. This is the code:
> dat2 <- dat1 %>%
>  group_by(country) %>%
>  group_by(grp = cumsum(is.na(lag(Y))), add = TRUE) %>%
>  mutate(first_year_at_1 = match(1, Y) * any(is.na(Y)) * any(tail(Y, 3) == 1L),
>         X = {x <- integer(length(Y)) ; x[first_year_at_1] <- 1L ; x}) %>%
>  ungroup()
>
> It doesn’t really generate what I described above. Any help here would be
> much appreciated.
>
> Below you can see my sample data with the desired outcome "X" dummy in it.
>
> Thank you!
>
> dput(data)
>
> structure(list(year = c(1991L, 1992L, 1993L, 1994L, 1995L, 1996L,
> 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L,
> 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L,
> 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L,
> 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
> 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L,
> 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L,
> 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L,
> 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L,
> 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
> 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L,
> 1999L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L,
> 2007L, 2008L, 2009L, 2010L, 2011L), country = structure(c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Canada",
> "Cuba", "Dominican Republic", "Haiti", "Jamaica"), class = "factor"),
>    Y = c(1L, NA, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L,
>    1L, NA, 1L, NA, 1L, NA, 1L, NA, NA, 1L, 1L, NA, NA, 1L, NA,
>    1L, NA, 1L, NA, 1L, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, 1L, NA,
>    NA, 1L, NA, 1L, 0L, 0L, 0L, 1L, NA, 0L, 1L, 0L, 0L, 0L, 0L,
>    0L, 1L, NA, 0L, 1L, 1L, NA, 0L, 1L, NA, 1L, NA, 1L, NA, 1L,
>    NA, 1L, NA, 1L, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L,
>    0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, 1L, 1L,
>    NA, 1L, NA, 0L, 1L, 1L, NA), X = c(1L, 0L, 0L, 1L, 0L, 0L,
>    1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
>    0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L,
>    0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L,
>    1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L,
>    1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>    1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L)), .Names = c("year",
> "country", "Y", "X"), class = "data.frame", row.names = c(NA,
> -110L))
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>