[R] Computing growth rate
Brijesh Mishra
brijeshkmishra at gmail.com
Thu Dec 15 13:28:23 CET 2016
Dear Mr. Barradas,
Thanks a lot for pointing that out. I tried it in a few steps:
1. When I evaluated
d<-ddply(df1,"co_code1",transform, growth <- ifelse(diff(fyear1)==1,
(exp(diff(log(df1$sales1)))-1)*100, NA))
I got the following, i.e., the growth column was not added to the result:
co_code1 fyear1 sales1
1 1100 1990 1000
2 1100 1991 1100
3 1100 1992 1200
4 1100 1993 1300
5 1100 1995 1500
6 1100 1996 1600
7 1200 1991 1100
8 1200 1992 1200
9 1200 1993 1300
10 1200 1994 1400
11 1200 1995 1500
12 1200 1996 1600
13 1300 1990 1000
14 1300 1992 1200
15 1300 1993 1300
16 1300 1994 1400
17 1300 1995 1500
18 1300 1996 1600
2. When, just for the heck of it, I changed the assignment operator (<-) to
'=' as done previously,
d<-ddply(df1,"co_code1",transform, growth = ifelse(diff(fyear1)==1,
(exp(diff(log(df1$sales1)))-1)*100, NA))
it no longer evaluated; the error was
"Error in data.frame(list(co_code1 = c(1100, 1100, 1100, 1100, 1100, 1100 :
arguments imply differing number of rows: 6, 5"
3. The following gives the desired result
df1$growth<-c(NA, ifelse(diff(df1$fyear1)==1,
(exp(diff(log(df1$sales1)))-1)*100, NA))
But now I am no longer restricting each iteration to
'co_code1': hypothetically, if the last row of one co_code1 is followed by the
first row of another with an 'fyear1' difference of 1, growth will be computed
across the two firms.
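The row-count error in step 2 can be reproduced in isolation. The following is a minimal sketch in base R, assuming the example data from the original post; the names g, n_rows, n_diffs and padded are only illustrative. It shows that diff() returns one value fewer than the rows of a group, which is presumably why transform reports 6 against 5. The `<-` form in step 1 seems to avoid the error only because it passes an unnamed argument, so no column is created at all.

```r
# Rebuild the example data from the original post (rows 5 and 8 dropped).
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# Take one firm's block of rows.
g <- df1[df1$co_code1 == 1100, ]

n_rows  <- nrow(g)                 # number of rows in the group
n_diffs <- length(diff(g$fyear1))  # diff() yields one element fewer

# Padding with a leading NA restores the group's length.
padded <- c(NA, diff(g$fyear1))
```

Note also that df1$sales1 inside transform refers to the whole data frame rather than the current group, so the bare column name sales1 is likely what is wanted there.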
Is there a better and more elegant way of doing it?
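One possible approach, sketched here in base R under the assumption that rows are sorted by fyear1 within each co_code1, is to pad both the year difference and the growth with a leading NA and do the work firm by firm. The helper name growth_by_year is hypothetical and not part of any package.

```r
# Rebuild the example data from the original post (rows 5 and 8 dropped).
df1 <- data.frame(co_code1 = rep(c(1100, 1200, 1300), each = 7),
                  fyear1   = rep(1990:1996, 3),
                  sales1   = rep(seq(1000, 1600, by = 100), 3))
df1 <- df1[-c(5, 8), ]

# Hypothetical helper: growth in percent, NA where the previous row
# is not the immediately preceding year (or there is no previous row).
growth_by_year <- function(year, value) {
  step <- c(NA, diff(year))                         # years since previous row
  pct  <- c(NA, (exp(diff(log(value))) - 1) * 100)  # growth vs. previous row
  ifelse(step == 1, pct, NA)
}

# Apply the helper within each firm, then stack the pieces again.
pieces <- lapply(split(df1, df1$co_code1), function(g) {
  g$growth <- growth_by_year(g$fyear1, g$sales1)
  g
})
d <- do.call(rbind, pieces)
rownames(d) <- NULL
```

With this sketch, the 1995 row of co_code1 1100 gets NA instead of 15.38, and the first row of every firm is NA. The same helper should also fit the plyr call used earlier, as growth = growth_by_year(fyear1, sales1), though I have not tested that variant.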
Thanks and regards,
Brijesh
On Thu, Dec 15, 2016 at 5:02 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> That is a very common mistake. if() accepts only a single TRUE/FALSE; for a
> vectorized version you need ifelse() (see ?ifelse). Something like the
> following (untested).
>
> growth <- ifelse(diff(fyear1)==1, (exp(diff(log(df1$sales1)))-1)*100, NA)
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 15-12-2016 03:40, Brijesh Mishra wrote:
>>
>> Hi,
>>
>> I am trying to calculate growth rates (say, of sales, though they are to be
>> computed for many variables) in a panel data set. The problem is that I
>> have missing data for many firms for many years. To illustrate, I
>> have created this short data frame (the original df is much bigger)
>>
>> df1<-data.frame(co_code1=rep(c(1100, 1200, 1300), each=7),
>> fyear1=rep(1990:1996, 3), sales1=rep(seq(1000,1600, by=100),3))
>>
>> # this gives me
>> co_code1 fyear1 sales1
>> 1 1100 1990 1000
>> 2 1100 1991 1100
>> 3 1100 1992 1200
>> 4 1100 1993 1300
>> 5 1100 1994 1400
>> 6 1100 1995 1500
>> 7 1100 1996 1600
>> 8 1200 1990 1000
>> 9 1200 1991 1100
>> 10 1200 1992 1200
>> 11 1200 1993 1300
>> 12 1200 1994 1400
>> 13 1200 1995 1500
>> 14 1200 1996 1600
>> 15 1300 1990 1000
>> 16 1300 1991 1100
>> 17 1300 1992 1200
>> 18 1300 1993 1300
>> 19 1300 1994 1400
>> 20 1300 1995 1500
>> 21 1300 1996 1600
>>
>> # I am now removing a couple of rows
>> df1<-df1[-c(5, 8), ]
>> # the result is
>> co_code1 fyear1 sales1
>> 1 1100 1990 1000
>> 2 1100 1991 1100
>> 3 1100 1992 1200
>> 4 1100 1993 1300
>> 6 1100 1995 1500
>> 7 1100 1996 1600
>> 9 1200 1991 1100
>> 10 1200 1992 1200
>> 11 1200 1993 1300
>> 12 1200 1994 1400
>> 13 1200 1995 1500
>> 14 1200 1996 1600
>> 15 1300 1990 1000
>> 16 1300 1991 1100
>> 17 1300 1992 1200
>> 18 1300 1993 1300
>> 19 1300 1994 1400
>> 20 1300 1995 1500
>> 21 1300 1996 1600
>> # so 1994 for co_code1 1100 and 1990 for co_code1 1200 have been
>> removed. If I try,
>> d<-ddply(df1,"co_code1",transform,
>> growth=c(NA,exp(diff(log(sales1)))-1)*100)
>>
>> # this gives a wrong result for the year 1995 of co_code1 1100 (as shown
>> below), as growth is computed between consecutive rows as if they were
>> always one year apart.
>>
>> co_code1 fyear1 sales1 growth
>> 1 1100 1990 1000 NA
>> 2 1100 1991 1100 10.000000
>> 3 1100 1992 1200 9.090909
>> 4 1100 1993 1300 8.333333
>> 5 1100 1995 1500 15.384615
>> 6 1100 1996 1600 6.666667
>> 7 1200 1991 1100 NA
>> 8 1200 1992 1200 9.090909
>> 9 1200 1993 1300 8.333333
>> 10 1200 1994 1400 7.692308
>> 11 1200 1995 1500 7.142857
>> 12 1200 1996 1600 6.666667
>> 13 1300 1990 1000 NA
>> 14 1300 1991 1100 10.000000
>> 15 1300 1992 1200 9.090909
>> 16 1300 1993 1300 8.333333
>> 17 1300 1994 1400 7.692308
>> 18 1300 1995 1500 7.142857
>> 19 1300 1996 1600 6.666667
>> # I thought of applying the formula only when the increment of fyear1 is
>> exactly 1 within a co_code1, using this code
>>
>> d<-ddply(df1,
>> "co_code1",
>> transform,
>> if(diff(fyear1)==1){
>> growth=(exp(diff(log(df1$sales1)))-1)*100
>> } else{
>> growth=NA
>> })
>>
>> But this doesn't work. I am getting the following warning.
>>
>> In if (diff(fyear1) == 1) { :
>> the condition has length > 1 and only the first element will be used
>> (repeated a few times).
>>
>> # I have searched for a solution, but somehow couldn't get one. Hope
>> that some kind soul will guide me here.
>>
>> Regards,
>>
>> Brijesh K Mishra
>> Indian Institute of Management, Indore
>> India
>>