[R] Doing a Task Without Using a For Loop
jim holtman
jholtman at gmail.com
Wed Oct 15 14:14:44 CEST 2008
Run Rprof on your script that is updating the dataframe. A dataframe
is a list (of columns), and every time you access something in the list it can be
expensive. Rprof will probably show that a lot of time is spent in
the function "[[", which is accessing portions of the dataframe.
Vectors are much faster because they are typically stored sequentially in
memory and can be accessed easily. Rprof is always helpful in
answering the question "why is something taking so long?": it helps
you find where the potential bottlenecks are.
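A minimal sketch of the Rprof workflow described above, assuming the loop from the original question. The real data is not available, so data1 here is randomly generated stand-in data (hypothetical, 5,000 rows). Rprof(), summaryRprof() and tempfile() are standard R functions. Which functions dominate the printed table will vary with R version and machine, so no particular ranking is claimed.

```r
## Hypothetical stand-in data: random IDs and years, for illustration only
set.seed(1)
n <- 5000
data1 <- data.frame(ID      = sample(200:260, n, replace = TRUE),
                    Year    = sample(1950:1975, n, replace = TRUE),
                    NinYear = 0)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)    # start sampling the call stack

## Same logic as the loop in the original question
for (i in seq_len(nrow(data1))) {
  data1$NinYear[i] <- length(data1[data1$Year == data1$Year[i] &
                                   data1$ID == data1$ID[i], 1])
}

Rprof(NULL)                          # stop profiling
prof <- summaryRprof(prof_file)

## Functions ranked by time spent in their own code ("self" time)
print(head(prof$by.self, 10))
```

The by.total table in the same result also counts time spent in functions called from each entry, which helps when the expensive work is hidden inside a wrapper such as "[.data.frame".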
On Wed, Oct 15, 2008 at 7:33 AM, Tom La Bone <booboo at gforcecable.com> wrote:
>
> I want to thank everyone for the help. I ended up having to use a loop to
> assign values from the table to NinYear. However, as I have played with the
> full datasets I have noticed that R is MUCH faster if I use vectors in the
> loop rather than columns of a dataframe. In the specific case of 43,000
> lines of data, assigning values from the table to the 43,000 elements of a
> vector took 6 seconds whereas assigning values from the table to 43,000
> elements of a dataframe took 21 minutes. Why is there such a huge
> difference?
>
> Tom
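A rough sketch of the vector-versus-dataframe comparison Tom describes. His actual loop is not shown, so this assumes (hypothetically) that "the table" is table(ID, Year) and that each row looks up its count by name; the data is random stand-in data. The point is the pattern: copy the columns into plain vectors, loop over those, and write the result back to the data frame once. The size of the gap depends heavily on the R version and the data, so no particular timing is claimed.

```r
## Hypothetical stand-in data (the real 43,000-row file is not available)
set.seed(2)
n <- 2000
data1 <- data.frame(ID      = sample(200:260, n, replace = TRUE),
                    Year    = sample(1950:1975, n, replace = TRUE),
                    NinYear = 0)

## Assumed lookup table: counts per ID/Year, dimnames are character values
tab <- table(data1$ID, data1$Year)

## Slow pattern: every iteration assigns into a data frame column
slow <- data1
t_df <- system.time(
  for (i in seq_len(n)) {
    slow$NinYear[i] <- tab[as.character(slow$ID[i]), as.character(slow$Year[i])]
  }
)

## Faster pattern: plain vectors inside the loop, one write-back at the end
id <- data1$ID
yr <- data1$Year
counts <- numeric(n)
t_vec <- system.time(
  for (i in seq_len(n)) {
    counts[i] <- tab[as.character(id[i]), as.character(yr[i])]
  }
)
fast <- data1
fast$NinYear <- counts

cat(sprintf("data frame column loop: %.2f s\n", t_df[["elapsed"]]))
cat(sprintf("plain vector loop:      %.2f s\n", t_vec[["elapsed"]]))
```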
>
>
>
>
> Tom La Bone wrote:
>>
>> Assume that I have the dataframe "data1", which is listed at the end of
>> this message. I want to count the number of lines that each person has for
>> each year. For example, the person with ID=213 has 15 entries (NinYear)
>> for 1953. The following bit of code calculates NinYear:
>>
>> for (i in 1:length(data1$ID)) {
>>   data1$NinYear[i] <- length(data1[data1$Year == data1$Year[i] &
>>                                    data1$ID == data1$ID[i], 1])
>> }
>>
>> This seems to work but is horribly slow (some files I am working with have
>> over 500,000 lines). Can anyone suggest a faster way of doing this,
>> perhaps a way that does not use a for loop? Thanks.
>>
>> Tom
>>
>> ID Year NinYear
>> 209 1971 0
>> 209 1971 0
>> 213 1951 0
>> 213 1951 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1955 0
>> 213 1955 0
>> 234 1953 0
>> 234 1953 0
>> 234 1953 0
>> 234 1953 0
>> 234 1953 0
>> 234 1958 0
>> 234 1958 0
>> 234 1965 0
>> 234 1965 0
>> 234 1965 0
>> 249 1952 0
>> 249 1952 0
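One loop-free way to answer the original question, sketched with base R's ave() (a swapped-in technique, not the approach Tom ended up using). data1 is rebuilt here from the 44 rows listed above. ave() splits a vector by the ID/Year combination, replaces each group with FUN applied to it, and returns the result in the original row order, so FUN = length gives each row the size of its group.

```r
## data1 rebuilt from the 44 rows listed above (run-length encoded)
sizes <- c(2, 2, 15, 11, 2, 5, 2, 3, 2)
data1 <- data.frame(
  ID      = rep(c(209, 213, 213, 213, 213, 234, 234, 234, 249), times = sizes),
  Year    = rep(c(1971, 1951, 1953, 1954, 1955, 1953, 1958, 1965, 1952), times = sizes),
  NinYear = 0
)

## No explicit loop: each row gets the number of rows sharing its ID and Year
data1$NinYear <- ave(data1$ID, data1$ID, data1$Year, FUN = length)

## Cross-check against an equivalent of the original loop (sum of a logical
## vector counts the matching rows, like length() of the subset did)
loop_counts <- numeric(nrow(data1))
for (i in seq_len(nrow(data1))) {
  loop_counts[i] <- sum(data1$Year == data1$Year[i] & data1$ID == data1$ID[i])
}
stopifnot(all(loop_counts == data1$NinYear))

print(unique(data1[, c("ID", "Year", "NinYear")]))
```

On this data the person with ID 213 gets NinYear 15 for 1953, matching the count given in the question. If only the cross-tabulated counts are needed rather than a per-row column, table(data1$ID, data1$Year) gives the same numbers.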
>>
>>
>>
>>
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?