[R] Thoughts for faster indexing
William Dunlap
wdunlap at tibco.com
Thu Nov 21 17:48:44 CET 2013
> The line with the slow process (According to Rprof) is:
> j <- which( d$id == person )
> (I then process all the records indexed by j, which seems fast enough.)
Calling split() once and looping over its output is typically faster than
applying == to a long vector once per person:
    for (j in split(seq_along(d$id), d$id)) {
        # j holds the row indices for one person
        # newdata[j, ] <- process(d[j, ])
    }
That said, this is the sort of thing that tapply() and the functions in
package:plyr do for you.
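
For concreteness, here is a minimal self-contained sketch of the two patterns side by side. The toy data frame, its `value` column, and the `process()` function are made up for illustration; only `d` and `d$id` come from the original question. It groups the row indices by id once with split() and checks that the result matches the which()-per-person loop.

```r
# Hypothetical toy data standing in for the poster's data.frame `d`.
set.seed(1)
d <- data.frame(id = sample(1:50, 1000, replace = TRUE),
                value = rnorm(1000))

# Hypothetical per-person processing step: centre each person's values.
process <- function(rows) {
  rows$value <- rows$value - mean(rows$value)
  rows
}

# Pattern from the question: one full-length comparison per person.
slow <- d
for (person in unique(d$id)) {
  j <- which(d$id == person)
  slow[j, ] <- process(d[j, ])
}

# split() pattern: group the row indices once, then loop over the groups.
fast <- d
for (j in split(seq_along(d$id), d$id)) {
  fast[j, ] <- process(d[j, ])
}

# Both loops visit the same disjoint sets of rows, so the results agree.
same <- identical(slow, fast)
```

The saving comes from scanning `d$id` once rather than once per person; how much it helps in practice will depend on the number of distinct ids and on what the processing step costs.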
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Noah Silverman
> Sent: Wednesday, November 20, 2013 12:17 PM
> To: 'R-help at r-project.org'
> Subject: [R] Thoughts for faster indexing
>
> Hello,
>
> I have a fairly large data.frame. (About 150,000 rows of 100
> variables.) There are case IDs, and multiple entries for each ID, with a
> date stamp (i.e., records of people's activity).
>
>
> I need to iterate over each person (record ID) in the data set, and then
> process their data for each date. The processing part is fast, the date
> part is fast. Locating the records is slow. I've even tried using
> data.table, with ID set as the index, and it is still slow.
>
> The line with the slow process (According to Rprof) is:
>
>
> j <- which( d$id == person )
>
> (I then process all the records indexed by j, which seems fast enough.)
>
> where d is my data.frame or data.table
>
> I thought that using the data.table indexing would speed things up, but
> not in this case.
>
> Any ideas on how to speed this up?
>
>
> Thanks!
>
> --
> Noah Silverman, M.S., C.Phil
> UCLA Department of Statistics
> 8117 Math Sciences Building
> Los Angeles, CA 90095
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.