[Rd] Ordering of values returned by unique
Tony Plate
tplate at blackmesacapital.com
Wed Sep 29 18:09:18 CEST 2004
AFAIK, it has always worked that way in S-plus and R. Furthermore, the
documentation in R for 'unique' says that it removes duplicated
elements. This does seem to leave the possibility that an element other than
the first of a set of duplicates is retained, which could mess up the
order. However, the documentation for 'duplicated' is clearer: it says
that 'duplicated' identifies duplicates of earlier elements. Also in the
examples for 'duplicated', it says that x[!duplicated(x)] == unique(x)
(paraphrased).
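
As a quick check of that equivalence, here is a minimal sketch using the vector from the original question plus a character vector of my own choosing (the variable names are purely illustrative):

```r
# Vector from the original question
x <- c(2, 2, 1, 2)

# duplicated() flags elements that repeat an EARLIER element,
# so negating it keeps the first occurrence of each value.
first_seen <- x[!duplicated(x)]
print(first_seen)  # [1] 2 1

# unique() gives the same values in the same order
stopifnot(identical(unique(x), first_seen))

# Illustrative character vector (an assumption, not from the thread)
y <- c("b", "a", "b", "c", "a")
first_seen_chr <- y[!duplicated(y)]
stopifnot(identical(unique(y), first_seen_chr))
```

This only demonstrates the behavior for the default (vector) method on these inputs; it is not a proof for every method.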
I depend on this all the time, so I also checked some references. In the
Blue book the documentation for the functions unique and duplicated is
combined and implies the above. In MASS 4th Ed, the page referred to by
the index entry for 'unique' (p48, #9 in my copy) states that 'unique'
removes duplicates as identified by 'duplicated', which implies that the
order of retained elements is not changed. The Green book has no index
entry for 'unique'. In S-plus the implementation of unique.default(x) uses
x[!duplicated(x)].
So, I think the evidence is pretty strong that unique(x) will always return
elements in the same order as they first appear in x. But it would be nice
if the documentation for 'unique' explicitly stated that this is the
behavior for all methods. (It does state this for the array method for
'unique').
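
If the concern is mainly that a reader of the code might not know about the ordering, one option is a thin wrapper whose name states the intent. This is only a sketch: the name unique_in_order is hypothetical and not part of base R, and it assumes the documented behavior of 'duplicated' described above.

```r
# Hypothetical helper; the name is an assumption, not a base R function.
unique_in_order <- function(x) {
  # duplicated() marks elements equal to an earlier element, so the
  # negation keeps each value at the position of its first occurrence.
  x[!duplicated(x)]
}

x <- c(2, 2, 1, 2)
res <- unique_in_order(x)
print(res)  # [1] 2 1

# Agrees with unique() on this input
stopifnot(identical(res, unique(x)))
```

Compared with an explicit loop, this avoids the repeated %in% lookups and makes the dependence on first-occurrence order visible at the call site.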
-- Tony Plate
At Wednesday 09:17 AM 9/29/2004, Witold Eryk Wolski wrote:
>Hi,
>
>Is the ordering of the values returned something I can rely on, a
>form of a standard, that a function called unique in R (in further
>versions) will return the unique elements in order of their first occurrence?
>
> > x<-c(2,2,1,2)
> > unique(x)
>[1] 2 1
>
>It seems not to be the standard. E.g. matlab
> >> x=[2,2,1,2]
>x =
> 2 2 1 2
> >> unique(x)
>ans =
> 1 2
>
>I just noted it because the way it works now is extremely
>useful for some applications (e.g. tree traversal), so I use it in a
>script. But I am a little worried whether I can rely on this behaviour in
>further versions. And furthermore, can I assume that someone reading the
>code will think that it works in that way?
>Or is it better to define an additional function?
>keeporderunique<-function(x)
>{
>  # preallocate one slot per distinct value
>  res<-rep(NA,length(unique(x)))
>  count<-0
>  for(i in x)
>  {
>    # append i only if it has not been seen before
>    if(!i%in%res)
>    {
>      count<-count+1
>      res[count]<-i
>    }
>  }
>  res
>}
>
>/E
>
>
>
>--
>Dipl. bio-chem. Witold Eryk Wolski
>MPI-Moleculare Genetic
>Ihnestrasse 63-73 14195 Berlin _
>tel: 0049-30-83875219 'v'
>http://www.molgen.mpg.de/~wolski / \
>mail: witek96 at users.sourceforge.net ---W-W----
> wolski at molgen.mpg.de
>
>______________________________________________
>R-devel at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-devel