[R] A More efficient method?
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Jul 4 18:49:53 CEST 2007
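This was in error, since s3 was not set. The as.numeric() in the calculation
of s3 can be omitted if it's OK to have an integer rather than a numeric result,
and in that case it's faster still (although identical(s0, s3) would then
return FALSE, because s0 is numeric while tmp - !tmp is integer).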
> set.seed(1)
> C <- sample(c("a", "b"), 1000000, replace = TRUE)
> system.time({
+ s0 <- vector(length = length(C))
+ for(i in seq_along(C)) s0[i] <- if (C[i] == "a") 1 else -1
+ s0
+ })
user system elapsed
21.32 0.02 26.10
> system.time(s1 <- ifelse(C == "a", 1, -1))
user system elapsed
2.37 0.26 2.64
> system.time(s2 <- 2 * (C == "a") - 1)
user system elapsed
0.32 0.02 0.35
> system.time({tmp <- C == "a"; s3 <- as.numeric(tmp - !tmp)})
user system elapsed
0.28 0.02 0.31
> identical(s0, s1)
[1] TRUE
> identical(s0, s2)
[1] TRUE
> identical(s0, s3)
[1] TRUE
>
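A small sketch of the integer-versus-numeric point above. In R, arithmetic on logical vectors returns an integer vector, so tmp - !tmp on its own is not identical() to the double vector built by the loop, while as.numeric(tmp - !tmp) is. The three-element input x is purely illustrative and is not the benchmark data.

```r
# Illustrative input only (not the 1e6-element benchmark vector C)
x <- c("a", "b", "a")
tmp <- x == "a"                  # logical: TRUE FALSE TRUE

s_int <- tmp - !tmp              # logical arithmetic yields an integer vector
s_num <- as.numeric(tmp - !tmp)  # coerced to double, like the loop result s0

typeof(s_int)                    # "integer"
typeof(s_num)                    # "double"

# identical() compares storage type as well as values
identical(s_int, c(1, -1, 1))    # FALSE: integer vs double
identical(s_num, c(1, -1, 1))    # TRUE

# The values themselves agree element by element
all(s_int == s_num)              # TRUE
```

So dropping as.numeric() changes only the storage type, not the values.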
On 7/4/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Thinking about this a bit more, I have found a slightly faster one still.
> See s3. I have also added s0, the original solution, to the timings.
>
> > set.seed(1)
> > C <- sample(c("a", "b"), 1000000, replace = TRUE)
> > system.time({
> + s0 <- vector(length = length(C))
> + for(i in seq_along(C)) s0[i] <- if (C[i] == "a") 1 else -1
> + s0
> + })
> user system elapsed
> 21.75 0.02 25.99
> > system.time(s1 <- ifelse(C == "a", 1, -1))
> user system elapsed
> 2.32 0.17 2.54
> > system.time(s2 <- 2 * (C == "a") - 1)
> user system elapsed
> 0.29 0.02 0.32
> > system.time({tmp <- C == "a"; tmp - !tmp})
> user system elapsed
> 0.21 0.00 0.21
> > identical(s0, s1)
> [1] TRUE
> > identical(s0, s2)
> [1] TRUE
> > identical(s0, s3)
> [1] TRUE
>
> On 7/4/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > Here are two ways. The second way is more than 10x faster.
> >
> > > set.seed(1)
> > > C <- sample(c("a", "b"), 100000, replace = TRUE)
> > > system.time(s1 <- ifelse(C == "a", 1, -1))
> > user system elapsed
> > 0.37 0.01 0.38
> > > system.time(s2 <- 2 * (C == "a") - 1)
> > user system elapsed
> > 0.02 0.00 0.02
> > > identical(s1, s2)
> > [1] TRUE
> >
> > On 7/4/07, Keith Alan Chamberlain <Keith.Chamberlain at colorado.edu> wrote:
> > > Dear Rhelpers,
> > >
> > > Is there a faster way than the loop below to set a vector based on values from
> > > another vector? I'd like to call a pre-existing function for this, but one
> > > which can also handle an arbitrarily large number of categories. Any ideas?
> > >
> > > Cat=c('a','a','a','b','b','b','a','a','b') # Categorical variable
> > > C1=vector(length=length(Cat)) # New vector for numeric values
> > >
> > > # Cycle through each element of Cat and set C1 to the corresponding value.
> > > for(i in 1:length(C1)){
> > > if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> > > }
> > >
> > > C1
> > > [1] -1 -1 -1 1 1 1 -1 -1 1
> > > Cat
> > > [1] "a" "a" "a" "b" "b" "b" "a" "a" "b"
> > >
> > > Sincerely,
> > > KeithC.
> > > Psych Undergrad, CU Boulder (US)
> > > RE McNair Scholar
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
>
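The replies above handle the two-category case; the original question also asked about an arbitrarily large number of categories. One common R idiom for that (not taken from this thread) is to index a named lookup vector by the category labels. The lookup vector codes and its values below are hypothetical; extend it with one name = value pair per category.

```r
# Category vector from the original question
Cat <- c('a','a','a','b','b','b','a','a','b')

# Hypothetical lookup table: one numeric code per category label.
# Add further name = value pairs for additional categories.
codes <- c(a = -1, b = 1)

# Indexing by character picks codes["a"], codes["b"], ... for each element;
# unname() drops the names that the indexing carries over from codes.
C1 <- unname(codes[Cat])
C1
# [1] -1 -1 -1  1  1  1 -1 -1  1
```

Labels that do not appear in names(codes) come back as NA, which makes unmapped categories easy to spot.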