[R] create a dummy variables for companies with complete history.

Mark Sharp msharp at txbiomed.org
Wed Jun 24 23:00:59 CEST 2015


Giacomo,

Please include some representative data. It is not clear why your offset of 4 (z$cod[i - 4]) is going to be an accurate surrogate for complete data.

Since I do not have your data set or know its true structure, I have to guess.
# make 5 copies of 200 companies
companies <- paste0(rep(LETTERS[1:4], 5, each = 50), rep(1:50, 5))
companies <- companies[order(companies)]
years <- rep(1:5, 200)
z <- data.frame(cod = companies, year = years,
                revenue = round(rnorm(1000, mean = 100000, sd = 10000)))
# trim this down to the 728 rows you have by pulling out records at random
set.seed(1) # makes the row sample below repeatable (revenue was drawn before the seed was set)
z <- z[sample.int(1000, 728), ]
z <- z[order(z$cod, z$year), ]

# Even with the data sorted by company, your offset approach can flag at most
# the fifth row of a company, so it does not mark every record of a company
# with a full history.
> head(z, 10)
   cod year revenue
1   A1    1  112192
2   A1    2  105840
4   A1    4  112357
5   A1    5   91772
7  A10    2  102601
8  A10    3  105183
11 A11    1  101269
12 A11    2  100719
14 A11    4   86138
15 A11    5  105044

#You can do something like the following.

counts <- table(z$cod)
complete <- names(counts[as.integer(counts) == 5])
# It is probably better to keep the dummy variable inside the dataframe.
z$complete <- z$cod %in% complete  # %in% already returns TRUE/FALSE
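
Counting rows per company assumes that each company-year pair appears at most once. If duplicates are possible (an assumption on my part, nothing in the original post says so), one variant is to count distinct years per company with ave(). The data frame "toy" below is made up purely for illustration; the last two lines sketch the subset step for building the panel of complete companies.

```r
# Hypothetical toy data: A has 5 distinct years, B has 5 rows but year 1
# is duplicated (only 4 distinct years), C has only 3 years.
toy <- data.frame(
  cod  = c(rep("A", 5), rep("B", 5), rep("C", 3)),
  year = c(1:5, 1, 1, 2, 3, 4, 1:3)
)
# Number of distinct years per company, repeated on every row of that company
n_years <- ave(toy$year, toy$cod, FUN = function(y) length(unique(y)))
toy$complete <- n_years == 5
# Keep only the companies with a complete history
panel <- toy[toy$complete, ]
unique(panel$cod)
```

With these toy data only company A survives; a plain row count would also have kept B.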

> head(z, 20)
   cod year revenue complete
1   A1    1  112192    FALSE
2   A1    2  105840    FALSE
4   A1    4  112357    FALSE
5   A1    5   91772    FALSE
7  A10    2  102601    FALSE
8  A10    3  105183    FALSE
11 A11    1  101269    FALSE
12 A11    2  100719    FALSE
14 A11    4   86138    FALSE
15 A11    5  105044    FALSE
20 A12    5   95872    FALSE
21 A13    1   78513     TRUE
22 A13    2   90502     TRUE
23 A13    3  108683     TRUE
24 A13    4  110711     TRUE
25 A13    5   87842     TRUE
28 A14    3   99939    FALSE
30 A14    5  111289    FALSE
31 A15    1  100930    FALSE
32 A15    2   93765    FALSE
> 
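
As for why d1 stays at zero in the quoted loop: setting the "//" comments aside, d1[i]<=1 uses <=, which in R is the less-than-or-equal comparison and not the assignment operator <-, so nothing is ever stored in d1. Below is a minimal sketch of the loop with that one change, run on made-up data sorted by company (the data frame "toy" is hypothetical, not your data).

```r
# Hypothetical toy data, sorted by company: A has 5 years, B has 3
toy <- data.frame(cod  = c(rep("A", 5), rep("B", 3)),
                  year = c(1:5, 1:3))
n <- nrow(toy)
d1 <- numeric(n)  # dummy variable, starts at 0
for (i in 5:n) {
  # '<-' assigns; 'd1[i] <= 1' would only compare and discard the result
  if (toy$cod[i] == toy$cod[i - 4]) d1[i] <- 1 else d1[i] <- 0
}
d1
```

Even with the assignment fixed, only the fifth row of company A is flagged here and its other four rows stay at 0, which is why counting rows per company is the more direct route.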
Do not use HTML. Use plain text. The character string "//" is not a comment indicator in R; use "#". Do not use attach(). It does not do anything in your example, but it is poor practice. Always write out TRUE and FALSE rather than T and F.

R. Mark Sharp, Ph.D.
msharp at TxBiomed.org





> On Jun 24, 2015, at 1:26 PM, giacomo begnis <gmbegnis at yahoo.it> wrote:
> 
> Hi, I have a dataset (728 obs) containing three variables: code of a company, year and revenue. Some companies have a complete history of 5 years, others do not (for instance, observations for only three or four years). I would like to identify the companies with a complete history using a dummy variable. I have written the following program, but there is something wrong because the dummy variable that I have created is always equal to zero. Can somebody help me? Thanks, gm
> 
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)  // number of obs dataset
> 
> d1<-numeric(n)   // dummy variable
> 
> for (i in 5:n)  {
>    if (z$cod[i]==z$cod[i-4])             // cod is the code of a company
>       { d1[i]<=1} else { d1[i]<=0}       // d1=1 for a company with complete history, d1=0 if the history is not complete
> }
> d1
> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, I use subset to obtain the codes of the companies with a complete history, and finally with a merge I build a panel of companies with a complete history. But how do I determine d1 correctly? My best regards, gm
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 


