Charles C. Berry
cberry at tajo.ucsd.edu
Tue Nov 23 20:12:14 CET 2010
On Tue, 23 Nov 2010, Seeliger.Curt at epamail.epa.gov wrote:
>>> Is there any function in R similar to first. in SAS?
>> ?duplicated
>>
>> a$d <- ifelse( duplicated( a$a ), 0 , 1 )
>>
>> a$d.2 <- as.numeric( !duplicated( a$a ) )
>
> Actually, duplicated() does not reproduce SAS' first. operator, though it
> may suffice for the OP's needs.
>
> To illustrate, let's start with a data frame of 3 key columns and some data
> in x:
> tt <- data.frame(k1 = rep(1:3, each = 10), k2 = rep(1:5, each = 2, times = 3),
>                  k3 = rep(1:2, times = 15), x = 1:30)
>
> # Try to mimic what the following SAS datastep would do,
> # assuming 'tt' is already sorted by k1 and k2:
> # data foo;
> # set tt;
> # by k1 k2;
> # put first.k1= first.k2=;
> # run;
>
> # SAS' first. operations would result in these values:
> tt$sas.first.k1 <- rep(c(1, rep(0,9)), 3)
> tt$sas.first.k2 <- rep(1:0, 15)
>
> # Applied to one column at a time, R's duplicated() returns these values.
> # They match SAS for k1, but differ after row 10 for k2, because
> # duplicated(tt$k2) ignores the grouping by k1.
> tt$duplicated.k1 <- 0+!duplicated(tt$k1)
> tt$duplicated.k2 <- 0+!duplicated(tt$k2)
It depends on how you use duplicated(). Applied to k1 and k2 together, it matches first.k2:
> all.equal( tt$sas.first.k2, 0+!duplicated( tt[, c("k1","k2") ] ) )
[1] TRUE
>
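One caveat, shown as an editorial sketch (the toy data frame `u` and the names `dup.flag`, `prev.k1`, `prev.k2` and `run.flag` are hypothetical): duplicated() flags the first occurrence of a key combination anywhere in the data, whereas a first. flag marks the start of each consecutive group of BY values. The two agree when the data are sorted by the keys, as `tt` is, but they can diverge when a key combination reappears later in the data:

```r
# Hypothetical toy data: the (k1, k2) pair (1, 1) occurs in two separate runs,
# which cannot happen once the data are sorted by k1 and k2.
u <- data.frame(k1 = c(1, 1, 1, 1), k2 = c(1, 1, 2, 1))

# duplicated()-based flag: 1 only for the first occurrence anywhere in the data
dup.flag <- 0 + !duplicated(u[, c("k1", "k2")])

# Run-based flag: 1 whenever the (k1, k2) pair differs from the previous row
prev.k1 <- c(NA, head(u$k1, -1))
prev.k2 <- c(NA, head(u$k2, -1))
run.flag <- ifelse(is.na(prev.k1), 1, u$k1 != prev.k1 | u$k2 != prev.k2)

dup.flag  # 1 0 1 0
run.flag  # 1 0 1 1 (row 4 starts a new run of (1, 1))
```

Sorting by the keys first, e.g. `u[order(u$k1, u$k2), ]`, makes the two flags agree again.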
Chuck
>
> # I've found I need to lag a column to mimic SAS' first.
> # operator, as follows, though perhaps someone else knows
> # a better way. Note that this works only on data frames
> # already sorted by the key columns!
> lag.k1 <- c(NA, head(tt$k1, -1))
> tt$r.first.k1 <- ifelse(is.na(lag.k1), 1, tt$k1 != lag.k1)
>
> lag.k2 <- c(NA, head(tt$k2, -1))
> # first.k2 must also be 1 wherever a new k1 group starts
> tt$r.first.k2 <- ifelse(is.na(lag.k2), 1, tt$k2 != lag.k2 | tt$k1 != lag.k1)
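The lag approach above can be generalized to any number of BY variables. The following is a minimal sketch, assuming the data frame is already sorted by the BY variables and that the keys contain no NA values; `first_flags()` is a hypothetical helper, not a function provided by base R or SAS. As in SAS, an inner BY variable's first. flag is also set whenever any outer BY variable starts a new group, which is why the flags are accumulated with a logical OR:

```r
# Hypothetical helper (not in base R): one 0/1 column per BY variable,
# mimicking SAS first.<var>. Assumes df is sorted by `by` and keys have no NAs.
first_flags <- function(df, by) {
  n <- nrow(df)
  started <- logical(n)  # TRUE once some BY variable so far starts a new group
  flags <- list()
  for (v in by) {        # process BY variables from outermost to innermost
    x <- df[[v]]
    # Row 1 always starts a group; later rows are compared with the previous row
    changed <- c(TRUE, x[-1] != x[-n])
    started <- started | changed
    flags[[paste0("first.", v)]] <- as.numeric(started)
  }
  as.data.frame(flags)
}

tt <- data.frame(k1 = rep(1:3, each = 10), k2 = rep(1:5, each = 2, times = 3),
                 k3 = rep(1:2, times = 15), x = 1:30)
ff <- first_flags(tt, c("k1", "k2"))
all.equal(ff$first.k1, rep(c(1, rep(0, 9)), 3))  # TRUE
all.equal(ff$first.k2, rep(1:0, 15))             # TRUE

# k2 does not change between rows 2 and 3, but k1 does
v <- data.frame(k1 = c(1, 1, 2, 2), k2 = c(1, 1, 1, 1))
first_flags(v, c("k1", "k2"))$first.k2           # 1 0 1 0
```

The small frame `v` shows why the accumulation matters: comparing k2 with its own lag alone would give 0 on row 3, while SAS would set first.k2 to 1 there.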
>
> Mimicking SAS' last. operator can be done in a similar manner, by
> "anti-lagging" (leading) the column of interest, i.e. comparing each row
> with the next row instead of the previous one.
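A minimal sketch of the last. counterpart for the example data, assuming `tt` is sorted by k1 and k2 (the `lead.*` and `r.last.*` names are hypothetical, chosen to parallel the lag code above). Each key is shifted up one row, and a row ends a group when there is no next row or the next row has a different value; as with first., the inner flag also fires when the outer key changes:

```r
tt <- data.frame(k1 = rep(1:3, each = 10), k2 = rep(1:5, each = 2, times = 3),
                 k3 = rep(1:2, times = 15), x = 1:30)

# "Anti-lag" (lead): shift each key up one row; the final row gets NA
lead.k1 <- c(tt$k1[-1], NA)
lead.k2 <- c(tt$k2[-1], NA)

# last.k1: there is no next row, or the next row has a different k1
tt$r.last.k1 <- ifelse(is.na(lead.k1), 1, tt$k1 != lead.k1)
# last.k2: also 1 whenever the outer key k1 changes on the next row
tt$r.last.k2 <- ifelse(is.na(lead.k2), 1, tt$k2 != lead.k2 | tt$k1 != lead.k1)

all.equal(tt$r.last.k1, rep(c(rep(0, 9), 1), 3))  # TRUE
all.equal(tt$r.last.k2, rep(0:1, 15))             # TRUE
```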
