[R] Computing growth rate

Thu Dec 15 13:35:48 CET 2016

This was ensured while using ddply(), which applies the calculation separately within each co_code1 group.
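
For reference, here is a minimal sketch of how the per-firm restriction might look when ifelse() is used inside ddply(). It assumes the plyr package is installed and reuses the example df1 from the quoted message below. It is an illustration of the idea, not necessarily the exact code that was run.

```r
library(plyr)  # assumption: plyr is installed; ddply() comes from it

# Example data from the original question, with two rows removed
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# ddply() splits df1 by co_code1, so diff() never spans two firms.
# ifelse() is vectorised, so each year-to-year step is checked on its own:
# growth is computed only where fyear1 increases by exactly 1, else NA.
d <- ddply(df1, "co_code1", transform,
           growth = c(NA, ifelse(diff(fyear1) == 1,
                                 (exp(diff(log(sales1))) - 1) * 100,
                                 NA)))
print(d)
```

With this version the 1995 row for co_code1 1100 comes out as NA, because the previous available year for that firm is 1993.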

On Thu, Dec 15, 2016 at 6:04 PM, Brijesh Mishra
<brijeshkmishra at gmail.com> wrote:
> Dear Mr Hasselman,
>
> I missed your mail while I was typing my own mail as a reply to Mr.
> Barradas's suggestion. In fact, I implemented your suggestion even
> before reading it. But I have a concern that I have noted (though it is
> only hypothetical; such a scenario is very unlikely to occur). Is
> there a way to restrict such calculations to within each co_code1?
>
> Many thanks,
>
> Brijesh
>
> On Thu, Dec 15, 2016 at 5:48 PM, Berend Hasselman <bhh at xs4all.nl> wrote:
>>
>>> On 15 Dec 2016, at 04:40, Brijesh Mishra <brijeshkmishra at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to calculate growth rates (say, for sales, though they are
>>> to be computed for many variables) in a panel data set. The problem is
>>> that I have missing data for many firms for many years. To put it simply,
>>> I have created this short data frame (the original df is much bigger):
>>>
>>> df1<-data.frame(co_code1=rep(c(1100, 1200, 1300), each=7),
>>> fyear1=rep(1990:1996, 3), sales1=rep(seq(1000,1600, by=100),3))
>>>
>>> # this gives me
>>>   co_code1 fyear1 sales1
>>> 1      1100   1990   1000
>>> 2      1100   1991   1100
>>> 3      1100   1992   1200
>>> 4      1100   1993   1300
>>> 5      1100   1994   1400
>>> 6      1100   1995   1500
>>> 7      1100   1996   1600
>>> 8      1200   1990   1000
>>> 9      1200   1991   1100
>>> 10     1200   1992   1200
>>> 11     1200   1993   1300
>>> 12     1200   1994   1400
>>> 13     1200   1995   1500
>>> 14     1200   1996   1600
>>> 15     1300   1990   1000
>>> 16     1300   1991   1100
>>> 17     1300   1992   1200
>>> 18     1300   1993   1300
>>> 19     1300   1994   1400
>>> 20     1300   1995   1500
>>> 21     1300   1996   1600
>>>
>>> # I am now removing a couple of rows
>>> df1<-df1[-c(5, 8), ]
>>> # the result is
>>>   co_code1 fyear1 sales1
>>> 1      1100   1990   1000
>>> 2      1100   1991   1100
>>> 3      1100   1992   1200
>>> 4      1100   1993   1300
>>> 6      1100   1995   1500
>>> 7      1100   1996   1600
>>> 9      1200   1991   1100
>>> 10     1200   1992   1200
>>> 11     1200   1993   1300
>>> 12     1200   1994   1400
>>> 13     1200   1995   1500
>>> 14     1200   1996   1600
>>> 15     1300   1990   1000
>>> 16     1300   1991   1100
>>> 17     1300   1992   1200
>>> 18     1300   1993   1300
>>> 19     1300   1994   1400
>>> 20     1300   1995   1500
>>> 21     1300   1996   1600
>>> # so 1994 for co_code1 1100 and 1990 for co_code1 1200 have been
>>> removed. If I try,
>>> d<-ddply(df1,"co_code1",transform, growth=c(NA,exp(diff(log(sales1)))-1)*100)
>>>
>>> # this gives a misleading result for 1995 in co_code1 1100 (shown
>>> below): the growth rate is computed against 1993, because diff()
>>> compares consecutive rows rather than consecutive years.
>>>
>>>   co_code1 fyear1 sales1    growth
>>> 1      1100   1990   1000        NA
>>> 2      1100   1991   1100 10.000000
>>> 3      1100   1992   1200  9.090909
>>> 4      1100   1993   1300  8.333333
>>> 5      1100   1995   1500 15.384615
>>> 6      1100   1996   1600  6.666667
>>> 7      1200   1991   1100        NA
>>> 8      1200   1992   1200  9.090909
>>> 9      1200   1993   1300  8.333333
>>> 10     1200   1994   1400  7.692308
>>> 11     1200   1995   1500  7.142857
>>> 12     1200   1996   1600  6.666667
>>> 13     1300   1990   1000        NA
>>> 14     1300   1991   1100 10.000000
>>> 15     1300   1992   1200  9.090909
>>> 16     1300   1993   1300  8.333333
>>> 17     1300   1994   1400  7.692308
>>> 18     1300   1995   1500  7.142857
>>> 19     1300   1996   1600  6.666667
>>> # I thought of applying the formula only when fyear1 increases by
>>> exactly 1 within a co_code1, using this code:
>>>
>>> d<-ddply(df1,
>>>         "co_code1",
>>>         transform,
>>>         if(diff(fyear1)==1){
>>>           growth=(exp(diff(log(df1$sales1)))-1)*100
>>>         } else{
>>>           growth=NA
>>>         })
>>>
>>> But this doesn't work. I am getting the following warning.
>>>
>>> In if (diff(fyear1) == 1) { :
>>>  the condition has length > 1 and only the first element will be used
>>> (repeated a few times).
>>>
>>> # I have searched for a solution, but somehow couldn't get one. Hope
>>> that some kind soul will guide me here.
>>>
>>
>> In your case use ifelse() as explained by Rui.
>> But it can be done more easily since the fyear1 and co_code1 are synchronized.
>> Add a new column to df1 like this
>>
>> df1$growth <- c(NA,
>>          ifelse(diff(df1$fyear1)==1,
>>                     (exp(diff(log(df1$sales1)))-1)*100,
>>                     NA
>>                     )
>>         )
>>
>> and display df1. From your request I cannot determine if this is what you want.
>>
>> regards,
>>
>> Berend Hasselman
>>
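
On the hypothetical concern raised above: the ungrouped ifelse() approach only checks the year, so if one firm's last year is immediately followed by the next firm's first year (say 1995, then 1996), a growth rate would be computed across two different firms. The sketch below uses a small made-up data frame (df2 and its values are hypothetical) and assumes the rows are sorted by co_code1 and then by fyear1. It adds a second condition so that a growth rate is only computed when both rows belong to the same firm.

```r
# Hypothetical data: firm 1100 ends in 1995 and firm 1200 starts in 1996
df2 <- data.frame(co_code1 = c(1100, 1100, 1200, 1200),
                  fyear1   = c(1994, 1995, 1996, 1997),
                  sales1   = c(1000, 1100, 2000, 2200))

# Year-to-year percentage change between consecutive rows
pct <- (exp(diff(log(df2$sales1))) - 1) * 100

# Year check only: row 3 is compared with firm 1100's 1995 sales
ungrouped <- c(NA, ifelse(diff(df2$fyear1) == 1, pct, NA))

# Year check plus firm check: rows must be one year apart AND same firm
same_firm <- diff(df2$co_code1) == 0
next_year <- diff(df2$fyear1) == 1
df2$growth <- c(NA, ifelse(same_firm & next_year, pct, NA))

print(cbind(df2, ungrouped))
```

Grouping with ddply(), as in the original attempt, avoids the issue in a different way, since diff() is then applied within each co_code1 separately.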