[R] Thoughts for faster indexing

Rainer M Krug Rainer at krugs.de
Thu Nov 21 14:51:58 CET 2013


On 11/21/13, 12:34, Jim Holtman wrote:
> You need to show the statement in context with the rest of the
> script. You need to tell us what you want to do, not how you want
> to do it.

Agreed. With so few details, all we can offer are guesses (see my guess below).

> 
> On Nov 20, 2013, at 15:16, Noah Silverman
> <noahsilverman at g.ucla.edu> wrote:
> 
>> Hello,
>> 
>> I have a fairly large data.frame (about 150,000 rows of 100
>> variables). There are case IDs and multiple entries for each ID,
>> each with a date stamp (i.e. records of people's activity).
>> 
>> 
>> I need to iterate over each person (record ID) in the data set
>> and then process their data for each date. The processing part
>> is fast, and so is the date part; locating the records is slow.
>> I've even tried using data.table with ID set as the index, and
>> it is still slow.
>> 
>> According to Rprof, the slow line is:
>> 
>> 
>> j <- which( d$id == person )

Possibly use

d_by_id <- split(d, d$id)

which splits the data.frame d into a list with one element per id,
each element being the data.frame holding that id's rows.
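To make the guess concrete, here is a minimal sketch on simulated data (the columns `date` and `value` and the function `process_person` are hypothetical stand-ins for the real variables and processing). The data.frame is split once, and each person's records are then taken from the list instead of scanning the whole `id` column once per person:

```r
# Simulated data (assumption): 1,000 dated records for 50 people.
set.seed(1)
d <- data.frame(
  id    = sample(1:50, 1000, replace = TRUE),
  date  = as.Date("2013-01-01") + sample(0:364, 1000, replace = TRUE),
  value = rnorm(1000)
)

# Split once: a list with one data.frame per id, named by the id values.
d_by_id <- split(d, d$id)

# Hypothetical per-person step: count the distinct dates with records.
process_person <- function(rows) length(unique(rows$date))

# Apply the step to every person without re-scanning d$id each time.
result <- lapply(d_by_id, process_person)
```

The point of the design: `which(d$id == person)` inside a loop compares every row again for every person, so the lookup cost grows with (number of people) x (number of rows), whereas `split()` groups the rows in a single pass.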

But this is just a guess.
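If the existing loop should stay as it is, a variant of the same idea is to split only the row numbers, so each element holds exactly the indices that `which(d$id == person)` would return. Again a sketch on toy data; `totals` is a hypothetical stand-in for the real processing:

```r
# Toy data standing in for d (assumption): ids repeat across rows.
d <- data.frame(id = c(3, 1, 3, 2, 1, 3), value = 1:6)

# One pass: row numbers grouped by id, named by the id values.
# idx[["3"]] holds the same rows as which(d$id == 3).
idx <- split(seq_len(nrow(d)), d$id)

# Hypothetical processing: total of 'value' per person,
# preallocated with one slot per id.
totals <- setNames(numeric(length(idx)), names(idx))
for (person in names(idx)) {
  j <- idx[[person]]               # instead of j <- which(d$id == person)
  totals[person] <- sum(d$value[j])
}
totals
```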

Cheers,

Rainer

>> 
>> (I then process all the records indexed by j, which seems fast
>> enough.)
>> 
>> where d is my data.frame or data.table
>> 
>> I thought that using the data.table indexing would speed things
>> up, but not in this case.
>> 
>> Any ideas on how to speed this up?
>> 
>> 
>> Thanks!
>> 
>> -- Noah Silverman, M.S., C.Phil UCLA Department of Statistics 
>> 8117 Math Sciences Building Los Angeles, CA 90095
>> 
>> ______________________________________________ 
>> R-help at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
>> posting guide http://www.R-project.org/posting-guide.html and
>> provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :       +33 - (0)9 53 10 27 44
Cell:       +33 - (0)6 85 62 59 98
Fax :       +33 - (0)9 58 10 27 44

Fax (D):    +49 - (0)3 21 21 25 22 44

email:      Rainer at krugs.de

Skype:      RMkrug