[R] the first. from SAS in R

William Dunlap wdunlap at tibco.com
Tue Nov 23 22:00:22 CET 2010


I often use code like Curt's encapsulated in the
following isFirstInRun function:

  isFirstInRun <- function(x,...) {
      lengthX <- length(x)
      if (lengthX == 0) return(logical(0))
      retVal <- c(TRUE, x[-1]!=x[-lengthX])
      for(arg in list(...)) {
          stopifnot(lengthX == length(arg))
          retVal <- retVal | c(TRUE, arg[-1]!=arg[-lengthX])
      }
      if (any(missing<-is.na(retVal))) # match rle: NA!=NA
          retVal[missing] <- TRUE
      retVal
  }
E.g.,
> d <- data.frame(log=trunc(log(1:10)), sqrt=trunc(sqrt(1:10)))
> within(d, first <- isFirstInRun(log))
   log sqrt first
1    0    1  TRUE
2    0    1 FALSE
3    1    1  TRUE
4    1    2 FALSE
5    1    2 FALSE
6    1    2 FALSE
7    1    2 FALSE
8    2    2  TRUE
9    2    3 FALSE
10   2    3 FALSE
> # Or look for change in any number of vectors:
> within(d, first <- isFirstInRun(log, sqrt)) # TRUE if either changes
   log sqrt first
1    0    1  TRUE
2    0    1 FALSE
3    1    1  TRUE
4    1    2  TRUE
5    1    2 FALSE
6    1    2 FALSE
7    1    2 FALSE
8    2    2  TRUE
9    2    3  TRUE
10   2    3 FALSE

To do isLastInRun put the TRUE after the x[-1]!=x[-length(x)]

isLastInRun <- function(x,...) {
    lengthX <- length(x)
    if (lengthX == 0) return(logical(0))
    retVal <- c(x[-1]!=x[-lengthX], TRUE)
    for(arg in list(...)) {
        stopifnot(lengthX == length(arg))
        retVal <- retVal | c(arg[-1]!=arg[-lengthX], TRUE)
    }
    if (any(missing<-is.na(retVal))) # match rle: NA!=NA
        retVal[missing] <- TRUE
    retVal
}


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of 
> Seeliger.Curt at epamail.epa.gov
> Sent: Tuesday, November 23, 2010 10:38 AM
> To: r-help at r-project.org
> Cc: Joel
> Subject: Re: [R] the first. from SAS in R
> 
> > > Is there any similar function in R to the first. in SAS?
> >     ?duplicated
> > 
> >     a$d <- ifelse( duplicated( a$a ), 0 , 1 )
> > 
> >     a$d.2 <- as.numeric( !duplicated( a$a ) )
> 
> Actually, duplicated does not duplicate SAS' first. operator, 
> though it 
> may suffice for the OP's needs.
> 
> To illustrate, let's start with a dataframe of 3 key columns 
> and some data 
> in x:
> tt <- data.frame(k1 = rep(1:3, each=10), k2 = rep(1:5, 
> each=2, times=3), 
> k3=rep(1:2, times=15), x = 1:30)
> 
> # Try to mimic what the following SAS datastep would do, 
> # assuming 'tt' is already sorted:
> #       data foo;
> #         set tt;
> #         by k1, k2;
> #         put first.k1=, first.k2=;
> #       run;
> 
> # SAS' first. operations would result in these values:
> tt$sas.first.k1 <- rep(c(1, rep(0,9)), 3)
> tt$sas.first.k2 <- rep(1:0, 15)
> 
> # R duplicated() returns these values.  You can see they 
> # are the same for k1, but dissimilar after row 10 for k2.
> tt$duplicated.k1 <- 0+!duplicated(tt$k1)
> tt$duplicated.k2 <- 0+!duplicated(tt$k2)
> 
> # I've found I need to lag a column to mimic SAS' first. 
> # operator, thusly, though perhaps someone else knows 
> # differently.  Note this does not work on unordered 
> # dataframes!
> lag.k1 <- c(NA, tt$k1[1:(nrow(tt) - 1)])
> tt$r.first.k1 <- ifelse(is.na(lag.k1), 1, tt$k1 != lag.k1)
> 
> lag.k2 <- c(NA, tt$k2[1:(nrow(tt) - 1)])
> tt$r.first.k2 <- ifelse(is.na(lag.k2), 1, tt$k2 != lag.k2)
> 
> Mimicking SAS' last. operation can be done in a similar manner, by 
> anti-laging the column of interest and changing the 
> comparisons somewhat.
> 
> Enjoy the days,
> cur
> -- 
> Curt Seeliger, Data Ranger
> Raytheon Information Services - Contractor to ORD
> seeliger.curt at epa.gov
> 541/754-4638
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list