[R] looping using 'diverse' package measures

David L Carlson dcarlson at tamu.edu
Thu Oct 19 15:35:26 CEST 2017


You really need to spend some time learning the basics of R. There are thousands of R packages, so you also need to read the documentation for the package you are using so that you can show us what data format it expects. Here are some simple ways to transform the data. You should also use dput() to include your data in your email rather than a plain listing, which can lose important information about the structure of the original data:

> Example <- structure(list(companyid = c(85390L, 85390L, 85390L, 85390L, 
85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 
85390L, 85390L, 85390L, 4391076L, 4391076L, 4391076L, 4391076L, 
4391076L, 4391076L, 4391076L, 4391076L, 4391076L, 4391076L), 
    year = c(1999L, 1999L, 1999L, 1999L, 1999L, 2000L, 2000L, 
    2000L, 2000L, 2000L, 2001L, 2001L, 2001L, 2001L, 2001L, 2005L, 
    2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L
    ), workerid = c(46446384, 126800000, 163300000, 60225451, 
    60195422, 60225451, 3.571e+09, 163300000, 163300000, 126800000, 
    60195422, 60225451, 46446384, 60195422, 60225451, 13753759, 
    49988911, 112400000, 185500000, 35649643, 65809705, 114200000, 
    192100000, 64258701, 1.212e+09), gender = c(0L, 1L, 0L, 0L, 
    0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 1L)), .Names = c("companyid", "year", 
"workerid", "gender"), class = "data.frame", row.names = c(NA, 
-25L))
> aggregate(gender~companyid+year, Example, mean)
  companyid year gender
1     85390 1999    0.2
2     85390 2000    0.2
3     85390 2001    0.2
4   4391076 2005    0.1

> aggregate(gender~companyid+year, Example, table)
  companyid year gender.0 gender.1
1     85390 1999        4        1
2     85390 2000        4        1
3     85390 2001        4        1
4   4391076 2005        9        1

> x <- xtabs(~gender+companyid+year, Example)
> ftable(x, row.vars=2:3, col.vars=1)
               gender 0 1
companyid year           
85390     1999        4 1
          2000        4 1
          2001        4 1
          2005        0 0
4391076   1999        0 0
          2000        0 0
          2001        0 0
          2005        9 1
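
The same aggregate() call accepts any function that returns one number per group, so a per-firm, per-year diversity measure can be sketched the same way. The block below is only an illustration: gini_simpson() is a hand-written helper (1 minus the sum of squared proportions), not a function from the 'diverse' package, and Ex just rebuilds the companyid, year and gender columns of Example so the code runs on its own. Check the package documentation for the input format its own functions expect.

```r
# Rebuild the three relevant columns of Example (workerid is not needed here)
Ex <- data.frame(
  companyid = rep(c(85390L, 4391076L), times = c(15, 10)),
  year      = rep(c(1999L, 2000L, 2001L, 2005L), times = c(5, 5, 5, 10)),
  gender    = c(rep(c(0L, 1L, 0L, 0L, 0L), 3), rep(0L, 9), 1L)
)

# Hypothetical helper, NOT part of 'diverse':
# Gini-Simpson index = 1 - sum of squared category proportions
gini_simpson <- function(x) {
  p <- table(x) / length(x)
  1 - sum(p^2)
}

# One value per companyid/year combination, like Stata's "by firm year"
res <- aggregate(gender ~ companyid + year, Ex, gini_simpson)
names(res)[3] <- "gini_simpson"
print(res)
```

With 4 men and 1 woman the index is 1 - (0.8^2 + 0.2^2) = 0.32, and with 9 men and 1 woman it is 1 - (0.9^2 + 0.1^2) = 0.18.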

You should read these manual pages:
?dput
?aggregate
?xtabs
?ftable
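
To illustrate the point about dput(), here is a small sketch with a made-up two-row data frame (the name small and its contents are only for illustration). The text that dput() prints can be pasted into an email and evaluated by the reader to get back an object with the same column types, which a plain printed listing does not guarantee.

```r
# Made-up data frame with integer columns
small <- data.frame(id = c(1L, 2L), gender = c(0L, 1L))

# dput() writes R code that recreates the object; capture it as text
txt <- capture.output(dput(small))
cat(txt, sep = "\n")

# Evaluating that text rebuilds the object, including the integer type
rebuilt <- eval(parse(text = txt))
identical(small, rebuilt)
str(rebuilt)
```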

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352




-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Li Jiang
Sent: Thursday, October 19, 2017 4:08 AM
To: r-help at r-project.org
Subject: [R] looping using 'diverse' package measures

Hi everyone,

I'm new to R (although I have been a Stata user for some time and am somewhat proficient in it) and I'm trying to use the 'diverse' R package to compute a few diversity measures on a sample of firms over a period of about 10 years. I was wondering if you could give me some hints on how best to proceed with the 'diverse' package.

My sample has the following setup. It consists of an annually varying number of firms, which are identified by the companyid variable and the year variable (unbalanced panel). In addition, I have a variable identifying the worker, workerid. I then have a set of variables which I want to use as the basis for calculating some of the measures in the 'diverse' package. An example of the sample is as follows, using the gender variable (0 for male and 1 for female) as the variable of interest:

companyid   year    workerid    gender
85390   1999    46446384    0
85390   1999    126800000   1
85390   1999    163300000   0
85390   1999    60225451    0
85390   1999    60195422    0
85390   2000    60225451    0
85390   2000    3571000000  1
85390   2000    163300000   0
85390   2000    163300000   0
85390   2000    126800000   0
85390   2001    60195422    0
85390   2001    60225451    1
85390   2001    46446384    0
85390   2001    60195422    0
85390   2001    60225451    0
4391076 2005    13753759    0
4391076 2005    49988911    0
4391076 2005    112400000   0
4391076 2005    185500000   0
4391076 2005    35649643    0
4391076 2005    65809705    0
4391076 2005    114200000   0
4391076 2005    192100000   0
4391076 2005    64258701    0
4391076 2005    1212000000  1

Using the 'diverse' package, I need to calculate, for each firm and each year, for instance the diversity(gender) measure. In Stata this would be obtained just by issuing a by firm year command, but I have no idea how to tackle this issue in R. Any ideas?

Best wishes,

Li



