[R] avoiding time-consuming for loop renaming identifiers

Benilton Carvalho bcarvalh at jhsph.edu
Sat Jul 21 03:55:20 CEST 2007


dta$sid <- as.integer(factor(dta[["school_id"]]))

b
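A minimal sketch of the one-liner applied to the nine sample rows shown in the question (the full 600000-row data is not available, so these rows stand in for it). `factor()` creates one level per distinct `school_id`, with the levels sorted (numerically here, since `school_id` is numeric), and `as.integer()` returns each row's level index, giving 1, 2, ... up to the number of schools in a single vectorized pass.

```r
# The nine sample rows from the question (school_id and y only)
dta <- data.frame(
  school_id = c(8, 8, 8, 8, 20, 20, 20, 31, 31),
  y         = c(9.87, 8.89, 7.89, 8.88, 6.78, 9.99, 8.79, 10.1, 11)
)

# One level per distinct school_id (levels 8, 20, 31); as.integer()
# turns each row's level into its position 1, 2, 3.
dta$sid <- as.integer(factor(dta[["school_id"]]))

print(dta$sid)
# [1] 1 1 1 1 2 2 2 3 3
```

Because factor levels are sorted, the identifiers follow increasing `school_id` rather than row order. For data already sorted by `school_id`, as in the question, that is the same numbering the loop produces.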

On Jul 20, 2007, at 9:26 PM, toby909 at gmail.com wrote:

> Hi All
>
> I was wondering if I can avoid a time-consuming for loop on my  
> 600000 obs dataset.
>
> school_id   y
> 8           9.87
> 8           8.89
> 8           7.89
> 8           8.88
> 20          6.78
> 20          9.99
> 20          8.79
> 31          10.1
> 31          11
>
> There are, say, 143 different schools in this 600000 obs dataset.
>
> I need to have sequential identifiers: 1, 2, 3, 4, 5, ..., 143.
>
> I was using an awkward for loop that took 30 minutes to run:
> sid = 1
> dta$sid[1] = 1
> for (i in 2:nrow(dta)) {
>   if (dta$school_id[i] != dta$school_id[i-1]) sid = sid + 1
>   dta$sid[i] = sid
> }
>
> Any hints appreciated.
>
> Thanks Toby
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
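The loop in the question starts a new identifier whenever `school_id` differs from the previous row, so it matches the `factor()` one-liner only when each school's rows are contiguous and sorted by `school_id`. A sketch of vectorized alternatives, using a hypothetical helper `loop_ids()` and a made-up vector `x` (not from the question) in which school 20 comes first and reappears in the last row:

```r
# Hypothetical ids, not from the question: school 20 comes first and
# reappears in the last row, so the three numberings disagree.
x <- c(20, 20, 8, 8, 31, 20)

# Hypothetical helper: vectorized version of the loop. Start at 1 and
# add 1 wherever an id differs from the one in the previous row.
loop_ids <- function(x) {
  if (length(x) == 0) return(integer(0))
  cumsum(c(TRUE, x[-1] != x[-length(x)]))
}

run_id    <- loop_ids(x)            # 1 1 2 2 3 4 (new id at every change)
first_id  <- match(x, unique(x))    # 1 1 2 2 3 1 (order of first appearance)
factor_id <- as.integer(factor(x))  # 2 2 1 1 3 2 (sorted school_id order)

print(rbind(x, run_id, first_id, factor_id))
```

On the question's data, `dta$sid <- loop_ids(dta$school_id)` would reproduce the loop's result in one pass. `match(x, unique(x))` is the option to consider if each school should keep a single identifier even when its rows are not contiguous. The sketch assumes `school_id` contains no `NA` values.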



More information about the R-help mailing list