[R] sorting a data.frame using a vector
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Nov 27 12:14:47 CET 2004
"Liaw, Andy" <andy_liaw at merck.com> writes:
> > > But I'm not happy with it because it is not really efficient. Any
> > > other suggestions are welcome!
> >
> > Anything wrong with x[y,] ???
>
> Well... sometimes:
>
> > nm <- as.character(sample(1:1e5))
> > x <- data.frame(x1=rnorm(1e5), row.names=1:1e5)
> > system.time(x[nm, , drop=FALSE], gcFirst=TRUE)
> [1] 155.13 0.01 156.10 NA NA
> > system.time(x2<-x[match(nm, rownames(x)), , drop=FALSE], gcFirst=TRUE)
> [1] 0.37 0.00 0.37 NA NA
> > all(rownames(x2) == nm)
> [1] TRUE
Yes, the internals of [.data.frame use
pmatch(i, rows, duplicates.ok = TRUE)
and pmatch() is a horrible lot slower than match(). Anyone for a spot
of hardcore optimization?
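One possible fast path, sketched here as a hypothetical helper (exact_first_pmatch is not a base R function and is not how R itself does it), uses the documented pmatch() rule for duplicates.ok = TRUE: exact matches are resolved first, then unique partial matches. So the hashed match() can handle the exact hits and pmatch() only has to see the leftovers. It assumes the keys contain no NA values and that the row names are unique, as data frame row names are.

```r
# Hypothetical helper: intended to give the same result as
# pmatch(i, rows, duplicates.ok = TRUE) for non-NA keys, but resolves
# exact matches with match() and calls pmatch() only on the rest.
exact_first_pmatch <- function(i, rows) {
  i    <- as.character(i)
  rows <- as.character(rows)
  idx  <- match(i, rows)            # fast exact lookup (first exact hit)
  idx[!nzchar(i)] <- NA_integer_    # pmatch(): "" never matches anything
  miss <- is.na(idx)
  if (any(miss))                    # partial matching only where needed
    idx[miss] <- pmatch(i[miss], rows, duplicates.ok = TRUE)
  idx
}

# Usage along the lines of the example quoted above (smaller n):
set.seed(1)
n  <- 5000
x  <- data.frame(x1 = rnorm(n), row.names = 1:n)
nm <- as.character(sample(1:n))
rn <- rownames(x)

idx <- exact_first_pmatch(nm, rn)
x2  <- x[idx, , drop = FALSE]       # integer index: no pmatch() inside [
stopifnot(identical(rownames(x2), nm))
```

When every key is an exact row name this reduces to the x[match(nm, rownames(x)), , drop=FALSE] workaround quoted above; the pmatch() fallback is only there to keep the partial-matching semantics for keys that are not exact row names.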
(Partial matching of character indices is a feature long regretted by
its inventors, but every time we consider killing it, we tend to recoil
in horror upon realizing what it would break...)
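To make "partial matching" concrete, here is a small illustration calling base R's match() and pmatch() directly on made-up values. It shows only how the two functions differ, not what any particular R version's [.data.frame returns.

```r
tbl <- c("abc", "xyz")

match("ab", tbl)                  # NA: match() needs an exact hit
pmatch("ab", tbl)                 # 1: "ab" is a unique prefix of "abc"
pmatch("a", c("abc", "abd"))      # NA: the prefix is ambiguous
pmatch("", c("", "abc"))          # NA: "" never matches, even exactly

# With duplicates.ok = TRUE (as in the call quoted above) each key is
# looked up independently, so several keys may resolve to one entry:
pmatch(c("abc", "ab"), tbl, duplicates.ok = TRUE)   # 1 1
# With the default duplicates.ok = FALSE an entry is used at most once:
pmatch(c("abc", "ab"), tbl)                         # 1 NA
```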
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help mailing list