[Rd] suggesting a new feature for unique()

Liaw, Andy andy_liaw at merck.com
Thu Aug 19 21:00:40 CEST 2004


Dear R-devel,

May I suggest that a new feature be added to a couple of unique() methods?
Sometimes it's useful to have the indices of the original data that the
unique elements come from, so that the original data can be recreated from
the unique()ed data.  I suggest that an `index' argument be added for
unique.  Below is a suggested patch against
R/src/library/base/R/duplicated.R:

*** R-devel/src/library/base/R/duplicated.R	Tue Jul 20 12:45:52 2004
--- duplicated.R	Thu Aug 19 14:50:38 2004
***************
*** 34,50 ****
  
  ## NB unique.default is used by factor to avoid unique.matrix,
  ## so it needs to handle some other cases.
! unique.default <- function(x, incomparables = FALSE, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
      z <- .Internal(unique(x))
      if(is.factor(x))
! 	factor(z, levels = seq(len=nlevels(x)), labels = levels(x),
!                ordered = is.ordered(x))
      else if(inherits(x, "POSIXct") || inherits(x, "Date"))
!         structure(z, class=class(x))
!     else z
  }
  
  unique.data.frame <- function(x, incomparables = FALSE, ...)
--- 34,51 ----
  
  ## NB unique.default is used by factor to avoid unique.matrix,
  ## so it needs to handle some other cases.
! unique.default <- function(x, incomparables = FALSE, index=FALSE, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
      z <- .Internal(unique(x))
      if(is.factor(x))
! 	z <- factor(z, levels = seq(len=nlevels(x)), labels = levels(x),
!                     ordered = is.ordered(x))
      else if(inherits(x, "POSIXct") || inherits(x, "Date"))
!         z <- structure(z, class=class(x))
!     if (index) attr(z, "index") <- match(x, z)
!     z
  }
  
  unique.data.frame <- function(x, incomparables = FALSE, ...)
***************
*** 55,61 ****
  }
  
  unique.matrix <- unique.array <-
!     function(x, incomparables = FALSE , MARGIN = 1, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
--- 56,62 ----
  }
  
  unique.matrix <- unique.array <-
!     function(x, incomparables = FALSE , MARGIN = 1, index=FALSE, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
***************
*** 66,70 ****
      args <- rep(alist(a=), ndim)
      names(args) <- NULL
      args[[MARGIN]] <- !duplicated(as.vector(temp))
!     do.call("[", c(list(x=x), args, list(drop=FALSE)))
  }
--- 67,76 ----
      args <- rep(alist(a=), ndim)
      names(args) <- NULL
      args[[MARGIN]] <- !duplicated(as.vector(temp))
!     res <- do.call("[", c(list(x=x), args, list(drop=FALSE)))
!     if (index) {
!         resTemp <- apply(res, MARGIN, function(x) paste(x, collapse =
"\r"))
!         attr(res, "index") <- match(temp, resTemp)
!     }
!     res
  }

An example usage:

> x <- sample(5, 10, rep=T)
> x
 [1] 4 2 5 3 2 3 4 2 2 3
> z <- unique(x, index=TRUE)
> z[attr(z, "index")] == x
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> x <- factor(x)
> z <- unique(x, index=TRUE)
> z
[1] 4 2 5 3
Levels: 2 3 4 5
> z[attr(z, "index")] == x
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE


[I have not tried adding the same thing for the unique.data.frame method,
but that shouldn't be too hard...]

Best,
Andy

Andy Liaw, PhD
Biometrics Research      PO Box 2000, RY33-300     
Merck Research Labs           Rahway, NJ 07065
mailto:andy_liaw at merck.com        732-594-0820



More information about the R-devel mailing list