[R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?

Emmanuel Levy emmanuel.levy at gmail.com
Wed Aug 13 01:35:22 CEST 2008


Dear All,

I have a large data frame (2,700,000 rows and 14 columns), and I would like to
extract information from it in the way illustrated below:


Given a data frame "df":

> col1 = sample(c(0,1), 10, replace=TRUE)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
   names col1
1      A    1
2      A    0
3      A    1
4      A    0
5      A    1
6      B    0
7      B    0
8      B    1
9      B    0
10     B    0

I would like to transform it into the form:

> index = c("A","B")
> col1.by.name = list()
> col1.by.name[[1]] = df$col1[which(df$name=="A")]
> col1.by.name[[2]] = df$col1[which(df$name=="B")]

My problem is that the command:  *** which(df$name=="A") ***
takes about 1 second because df is so big.

Since the names column is a factor, I was thinking that the rows belonging to
a given level could perhaps be accessed directly, but I am not sure how to do it.
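
To make the idea concrete, here is a minimal sketch of what per-level access
might look like, using base R's split() on the toy data from above. It is an
illustration only and has not been timed on the full 2,700,000-row data frame;
the name "by.name" is made up for this example, and set.seed() is added only so
that the random values are reproducible.

```r
# Toy data with the same shape as the example above (values are random).
set.seed(1)
col1  <- sample(c(0, 1), 10, replace = TRUE)
names <- factor(c(rep("A", 5), rep("B", 5)))
df    <- data.frame(names, col1)

# split() groups col1 by the levels of the factor in one call and
# returns a named list with one element per level.
by.name <- split(df$col1, df$names)

by.name[["A"]]   # same values as df$col1[which(df$names == "A")]
by.name[["B"]]   # same values as df$col1[which(df$names == "B")]
```

The grouping is done once for all levels, so each later lookup is just a list
access by name rather than a fresh comparison over every row.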

I would be very grateful for any advice that would allow me to speed this up.

Best wishes,

Emmanuel


