[R] Doing a Task Without Using a For Loop

jim holtman jholtman at gmail.com
Wed Oct 15 14:14:44 CEST 2008


Run Rprof on the script that is updating the dataframe.  A dataframe
is a list, and every time you access something in the list it can be
expensive.  Rprof will probably show that a lot of time is spent in
the function "[[", which accesses portions of the dataframe.
Vectors are much faster because they are typically stored contiguously
in memory and can be accessed directly.  Rprof is always helpful in
answering the question "why is something taking so long?"  It helps
you find where the potential bottlenecks are.
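
As a rough illustration of the profiling step (a sketch, not the
poster's actual script): the dataframe `df`, its size, and the loop
body below are made up for the example; only Rprof() and
summaryRprof() are the tools being discussed.

```r
# Hypothetical stand-in for the real data: 20,000 rows, NinYear to be filled in.
n  <- 20000
df <- data.frame(ID = rep(1:200, each = 100), NinYear = 0)

prof_file <- tempfile(fileext = ".out")   # where the profiler writes its samples
Rprof(prof_file)                          # start sampling the call stack
for (i in seq_len(n)) {
  df$NinYear[i] <- i                      # element-wise update of a dataframe column
}
Rprof(NULL)                               # stop profiling

# Functions ranked by the time spent in the function itself
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

If most of the time lands in dataframe methods rather than in your own
calculation, that would suggest the indexing is the bottleneck.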

On Wed, Oct 15, 2008 at 7:33 AM, Tom La Bone <booboo at gforcecable.com> wrote:
>
> I want to thank everyone for the help. I ended up having to use a loop to
> assign values from the table to NinYear. However, as I have played with the
> full datasets I have noticed that R is MUCH faster if I use vectors in the
> loop rather than columns of a dataframe. In the specific case of 43,000
> lines of data, assigning values from the table to the 43,000 elements of a
> vector took 6 seconds whereas assigning values from the table to 43,000
> elements of a dataframe took 21 minutes. Why is there such a huge
> difference?
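
A small timing comparison can make the difference visible.  This is
only a sketch under assumptions: the size `n` is arbitrary and the real
table lookup is replaced by the loop index `i`, so the absolute timings
will not match the 6 seconds and 21 minutes reported above.

```r
# Hypothetical size; the value assigned stands in for the table lookup.
n  <- 20000
df <- data.frame(NinYear = numeric(n))

# Loop that writes into a dataframe column
t_df <- system.time(
  for (i in seq_len(n)) df$NinYear[i] <- i
)

# Same loop writing into a plain vector, attached to a dataframe afterwards
v <- numeric(n)
t_vec <- system.time(
  for (i in seq_len(n)) v[i] <- i
)
df2 <- data.frame(NinYear = v)

identical(df$NinYear, df2$NinYear)   # both loops build the same column
c(dataframe = t_df[["elapsed"]], vector = t_vec[["elapsed"]])
```

Filling a vector inside the loop and assigning it to the dataframe once
at the end keeps the result the same while avoiding the repeated
dataframe indexing.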
>
> Tom
>
>
>
>
> Tom La Bone wrote:
>>
>> Assume that I have the dataframe "data1", which is listed at the end of
>> this message. I want to count the number of lines that each person has for
>> each year. For example, the person with ID=213 has 15 entries (NinYear)
>> for 1953. The following bit of code calculates NinYear:
>>
>> for (i in 1:length(data1$ID)) {
>>   data1$NinYear[i] <- length(data1[data1$Year==data1$Year[i] &
>>     data1$ID==data1$ID[i],1]) }
>>
>> This seems to work but is horribly slow (some files I am working with have
>> over 500,000 lines). Can anyone suggest a faster way of doing this,
>> perhaps a way that does not use a for loop? Thanks.
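
One possible loop-free approach, offered as a sketch rather than as
the answer given in this thread, is to let ave() do the grouping.  The
small `data1` below is a hypothetical subset of the posted data, and
using the row index as the vector to be counted is just a convenient
choice.

```r
# Small hypothetical subset of the posted data
data1 <- data.frame(
  ID   = c(209, 209, 213, 213, 213, 213, 234),
  Year = c(1971, 1971, 1951, 1951, 1953, 1953, 1953)
)

# For every row, count the rows that share its (ID, Year) pair.
# ave() applies FUN within each group and returns a vector aligned with the rows.
data1$NinYear <- ave(seq_len(nrow(data1)), data1$ID, data1$Year, FUN = length)
data1
```

Each row ends up with the size of its own ID/Year group, which is what
the for loop computes one row at a time.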
>>
>> Tom
>>
>> ID    Year    NinYear
>> 209   1971    0
>> 209   1971    0
>> 213   1951    0
>> 213   1951    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1953    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1954    0
>> 213   1955    0
>> 213   1955    0
>> 234   1953    0
>> 234   1953    0
>> 234   1953    0
>> 234   1953    0
>> 234   1953    0
>> 234   1958    0
>> 234   1958    0
>> 234   1965    0
>> 234   1965    0
>> 234   1965    0
>> 249   1952    0
>> 249   1952    0
>>
>>
>>
>>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


