[Rd] Very slow subsetting by name
Hervé Pagès
hpages at fhcrc.org
Thu Jul 15 10:12:10 CEST 2010
Hi,
I'm subsetting a named vector using character indices.
My vector of indices (or keys) is 10 times longer than the vector
I'm subsetting. All my keys are distinct, and only 10% of them
are valid (i.e. match a name of the vector being subset).
It is surprisingly slow:

> x1 <- 1:1000
> names(x1) <- paste("a", x1, sep="")
> keys <- sample(c(names(x1), paste("b", 1:9000, sep="")))
> system.time(y1 <- x1[keys])
   user  system elapsed
  0.410   0.000   0.416

> x2 <- 1:2000
> names(x2) <- paste("a", x2, sep="")
> keys <- sample(c(names(x2), paste("b", 1:18000, sep="")))
> system.time(y2 <- x2[keys])
   user  system elapsed
  1.730   0.000   1.736

> x3 <- 1:4000
> names(x3) <- paste("a", x3, sep="")
> keys <- sample(c(names(x3), paste("b", 1:36000, sep="")))
> system.time(y3 <- x3[keys])
   user  system elapsed
  8.900   0.010   9.227

> x4 <- 1:8000
> names(x4) <- paste("a", x4, sep="")
> keys <- sample(c(names(x4), paste("b", 1:72000, sep="")))
> system.time(y4 <- x4[keys])
   user  system elapsed
130.390   0.000 132.316

And it's apparently worse than quadratic in time!
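
As a rough check of that claim, the user times reported above can be turned into an empirical growth exponent. Each run doubles both the vector and the key set, so log2 of the ratio between consecutive timings estimates the exponent k in time ~ n^k. This is only a sketch based on four single measurements; the variable names `n`, `user`, `ratio` and `k` are introduced here for illustration.

```r
# Sizes of the vector being subset and the user times (seconds) reported above
n    <- c(1000, 2000, 4000, 8000)
user <- c(0.410, 1.730, 8.900, 130.390)

# Each step doubles n, so log2 of the timing ratio estimates the exponent k
ratio <- user[-1] / user[-length(user)]
k     <- log2(ratio)

print(round(ratio, 2))
print(round(k, 2))
```

An exponent of 2 would correspond to quadratic growth. All three estimates come out above 2, and the last one is close to 4.
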
I'm wondering why this subsetting by name is so slow since it
seems it could be implemented with x4[match(keys, names(x4))],
which is very fast: only 0.012s!
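
A minimal sketch of that match()-based lookup, reusing the x4 setup from above. It assumes, as in this example, that all names and keys are distinct, non-empty and non-NA; in other situations (e.g. empty-string or NA names or keys) `[` and match() may not agree, so this is not meant as a general drop-in replacement. The seed and the name `idx` are arbitrary choices made for this illustration, and timings will vary with machine and R version.

```r
set.seed(1)  # arbitrary seed, only to make the shuffle reproducible
x4 <- 1:8000
names(x4) <- paste("a", x4, sep="")
keys <- sample(c(names(x4), paste("b", 1:72000, sep="")))

# Look up the position of each key among the names; unmatched keys give NA
idx <- match(keys, names(x4))

# Subset by position: NA positions yield NA values
y4 <- x4[idx]

# Sanity checks: one result per key, one non-NA result per valid key,
# and matched elements carry the name they were looked up by
stopifnot(length(y4) == length(keys))
stopifnot(sum(!is.na(y4)) == length(x4))
stopifnot(identical(names(y4)[!is.na(idx)], keys[!is.na(idx)]))

# Time the whole lookup
print(system.time(x4[match(keys, names(x4))]))
```
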
This is with R-2.11.0 and R-2.12.0.
Thanks,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319