[R] Thoughts for faster indexing

Thu Nov 21 15:34:03 CET 2013

... or use tapply(seq_len(nrow(d)), d$id, ...) or a wrapper around it
(by, aggregate, ...)
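A minimal sketch of the tapply() idea. It assumes a toy data frame `d` with hypothetical columns `id`, `date` and `value`, since the original poster's real columns are unknown. tapply() groups the row numbers by id in a single pass, so each person's rows are found without rescanning d$id once per person:

```r
# Hypothetical toy data standing in for the poster's 150,000-row data.frame
d <- data.frame(
  id    = c(3, 1, 2, 1, 3, 3),
  date  = as.Date("2013-11-01") + 0:5,
  value = c(10, 20, 30, 40, 50, 60)
)

# tapply() splits seq_len(nrow(d)) by d$id and calls FUN once per id,
# passing that id's row numbers (what which(d$id == person) would return)
per_person <- tapply(seq_len(nrow(d)), d$id, function(j) {
  rows <- d[j, ]      # all records for one person
  sum(rows$value)     # placeholder for the real per-person processing
})

per_person
```

If the per-person step returns more than a single value, tapply() returns a list instead of a vector; by(d, d$id, FUN) is the wrapper that hands FUN each group as a data frame directly.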

However, it would not surprise me if this does not help. I suspect
that the problem lies not where you think but in the code and context
you omitted, as others have already noted.
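One way the omitted context could matter: if the quoted which(d$id == person) runs inside a loop over every person, each call scans all of d$id, so the loop costs roughly (number of persons) times (number of rows) comparisons. A minimal sketch on hypothetical toy data, assuming that loop structure, which builds every person's index set in one pass with split() (Rainer's suggestion quoted below, applied to row numbers rather than to the whole data frame) and checks that it matches the loop:

```r
# Hypothetical toy data; the real d has ~150,000 rows and 100 columns
d <- data.frame(id = c(3, 1, 2, 1, 3, 3), value = 1:6)

# Assumed slow pattern: one full scan of d$id for every person
persons  <- unique(d$id)
idx_loop <- lapply(persons, function(person) which(d$id == person))
names(idx_loop) <- persons

# One pass: split() groups the row numbers by id into a named list
idx_split <- split(seq_len(nrow(d)), d$id)

# Both approaches give the same row numbers for each person
for (p in names(idx_split)) {
  stopifnot(identical(idx_loop[[p]], idx_split[[p]]))
}

idx_split
```

Whether this is the actual bottleneck still depends on the code that was not posted.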

-- Bert

On Thu, Nov 21, 2013 at 5:51 AM, Rainer M Krug <Rainer at krugs.de> wrote:
> On 11/21/13, 12:34, Jim Holtman wrote:
>> You need to show the statement in context with the rest of the
>> script. You need to tell us what you want to do, not how you want
>> to do it.
>
> Agreed: with so few details, all we can offer are guesses (see my guess below).
>
>>
>> On Nov 20, 2013, at 15:16, Noah Silverman
>> <noahsilverman at g.ucla.edu> wrote:
>>
>>> Hello,
>>>
>>> I have a fairly large data.frame (about 150,000 rows of 100
>>> variables). There are case IDs and multiple entries for each ID,
>>> each with a date stamp (i.e., records of people's activity).
>>>
>>>
>>> I need to iterate over each person (record ID) in the data set
>>> and then process their data for each date. The processing part
>>> is fast, and so is the date part; locating the records is slow.
>>> I've even tried using data.table, with ID set as the key, and
>>> it is still slow.
>>>
>>> The slow line (according to Rprof) is:
>>>
>>>
>>> j <- which( d$id == person )
>
> Possibly use
>
> d_by_id <- split(d, d$id)
>
> which splits the data.frame d into a list in which each element
> is the data.frame for one id.
>
> But this is just a guess.
>
> Cheers,
>
> Rainer
>
>>>
>>> (I then process all the records indexed by j, which seems fast
>>> enough.)
>>>
>>> where d is my data.frame or data.table
>>>
>>> I thought that using data.table indexing would speed things
>>> up, but it doesn't in this case.
>>>
>>> Any ideas on how to speed this up?
>>>
>>>
>>> Thanks!
>>>
>>> -- Noah Silverman, M.S., C.Phil UCLA Department of Statistics
>>> 8117 Math Sciences Building Los Angeles, CA 90095
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
>>> posting guide http://www.R-project.org/posting-guide.html and
>>> provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> --
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
> Biology, UCT), Dipl. Phys. (Germany)
>
> Centre of Excellence for Invasion Biology
> Stellenbosch University
> South Africa
>
> Tel :       +33 - (0)9 53 10 27 44
> Cell:       +33 - (0)6 85 62 59 98
> Fax :       +33 - (0)9 58 10 27 44
>
> Fax (D):    +49 - (0)3 21 21 25 22 44
>
> email:      Rainer at krugs.de
>
> Skype:      RMkrug

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374