[R] Generating input population for microsimulation

Emma Thomas thomas_ek at yahoo.com
Wed Dec 14 18:30:44 CET 2011


Actually, scratch that, sorry! 

I wrapped the second part of your second solution in a function, and now I get the right data frame. So:

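library(plyr)

# Generate the persons of one unit; `unit` is a one-row data frame
# with columns `id` and `size` (as passed in by ddply)
generate_unit <- function(unit) {
    pid <- 1:unit$size
    senior <- rep(0, unit$size)
    senior[sample(unit$size, 2)] <- 1    # two randomly chosen seniors
    return(data.frame(unit_id=unit$id, pid=pid, senior=senior))
}

# Build the units table, then generate the persons for each unit
world <- function(n_units, unit_size){
    units <- data.frame(id=1:n_units, size=unit_size)
    a <- ddply(units, .(id), generate_unit)
    return(a)
}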

and calling 

world(n_units = 2, unit_size = 5)

gives me 

   id unit_id pid senior
1   1       1   1      1
2   1       1   2      0
3   1       1   3      1
4   1       1   4      0
5   1       1   5      0
6   2       2   1      1
7   2       2   2      0
8   2       2   3      1
9   2       2   4      0
10  2       2   5      0

Which is perfect! Sorry for jumping the gun and thanks again!

-Emma


----- Original Message -----
From: Emma Thomas <thomas_ek at yahoo.com>
To: Jan van der Laan <rhelp at eoos.dds.nl>; "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Wednesday, December 14, 2011 12:23 PM
Subject: Re: [R] Generating input population for microsimulation

Dear Jan,

Thanks for your reply.

The first solution works well for my needs for now, but I have a question about the second. If I run your code and then call the function:

generate_unit(10)

I get the following error:

Error in unit$size : $ operator is invalid for atomic vectors


Did you experience the same thing?

In any case, I will definitely take a look at the plyr package, which I'm sure will be useful in the future.

Thanks again!

Emma
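The error is to be expected with the second solution: `generate_unit()` reads its argument with `$`, so it needs a one-row data frame with `id` and `size` columns (which is what `ddply` hands it), not a bare number such as `10`. A minimal sketch of calling it directly, assuming Jan's definition of `generate_unit()` (reproduced so the block runs on its own; `one_unit` is just an illustrative name):

```r
# Jan's generate_unit(), reproduced so this block is self-contained
generate_unit <- function(unit) {
  pid <- 1:unit$size
  senior <- rep(0, unit$size)
  senior[sample(unit$size, 2)] <- 1      # two randomly chosen seniors
  data.frame(unit_id = unit$id, pid = pid, senior = senior)
}

# generate_unit(10) fails because `10$size` uses `$` on an atomic vector.
# Passing a one-row data frame with the expected columns works instead:
one_unit <- data.frame(id = 1, size = 10)
u <- generate_unit(one_unit)
print(u)
```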



----- Original Message -----
From: Jan van der Laan <rhelp at eoos.dds.nl>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: Emma Thomas <thomas_ek at yahoo.com>
Sent: Wednesday, December 14, 2011 6:18 AM
Subject: Re: [R] Generating input population for microsimulation

Emma,

If, as you say, each unit is the same, you can simply repeat one unit to obtain the required number of units. For example,


  unit_size <- 10
  n_units <- 10

  unit_id <- rep(1:n_units, each=unit_size)  # 1,1,...,1,2,2,...
  pid     <- rep(1:unit_size, n_units)       # position within the unit
  senior  <- ifelse(pid <= 2, 1, 0)          # first two persons are senior

  pop <- data.frame(unit_id, pid, senior)
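Emma's original question also asked for a unique identifier for every person, while `pid` above restarts at 1 in each unit. A small extension, assuming a population-wide ID is wanted in addition to the within-unit `pid` (the column name `person_id` is hypothetical, chosen only for illustration):

```r
unit_size <- 10
n_units   <- 10

unit_id   <- rep(1:n_units, each = unit_size)  # unit of each person
pid       <- rep(1:unit_size, n_units)         # position within unit, restarts at 1
senior    <- ifelse(pid <= 2, 1, 0)            # first two persons of each unit are senior
person_id <- seq_along(unit_id)                # hypothetical: 1 .. n_units * unit_size

pop <- data.frame(unit_id, person_id, pid, senior)
head(pop, 12)
```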


If you want more flexibility in generating the units, I would first generate the units (without the persons) and then generate the persons for each unit. In the example below I use the plyr package; you could probably also use lapply/sapply, or simply a loop over the units.

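  library(plyr)

  # Generate the persons of one unit; `unit` is a one-row data frame
  # with columns `id` and `size`
  generate_unit <- function(unit) {
      pid <- 1:unit$size
      senior <- rep(0, unit$size)
      senior[sample(unit$size, 2)] <- 1   # two randomly chosen seniors
      return(data.frame(unit_id=unit$id, pid=pid, senior=senior))
  }

  # One row per unit (n_units and unit_size as defined above)
  units <- data.frame(id=1:n_units, size=unit_size)

  # Apply generate_unit to each unit and stack the results
  ddply(units, .(id), generate_unit)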
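Jan mentions that `lapply` could replace `ddply`. A base-R sketch of that alternative, assuming the same `generate_unit()` (this variant is not from the thread); unlike `ddply`, it does not add the extra `id` column in front of `unit_id`:

```r
generate_unit <- function(unit) {
  pid <- 1:unit$size
  senior <- rep(0, unit$size)
  senior[sample(unit$size, 2)] <- 1      # two randomly chosen seniors
  data.frame(unit_id = unit$id, pid = pid, senior = senior)
}

n_units   <- 10
unit_size <- 10
units <- data.frame(id = 1:n_units, size = unit_size)

# Pass each row of `units` (a one-row data frame) to generate_unit,
# then stack the per-unit data frames into one population table
pieces <- lapply(seq_len(nrow(units)), function(i) generate_unit(units[i, ]))
pop <- do.call(rbind, pieces)
```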


HTH,

Jan




Emma Thomas <thomas_ek at yahoo.com> wrote:

> Hi all,
> 
> I've been struggling with some code and was wondering if you all could help.
> 
> I am trying to generate a theoretical population of P people who are housed within X different units. Each unit follows the same structure: 10 people per unit, 8 of whom are junior and 2 of whom are senior. I'd like to create a unit ID and a unique identifier for each person (person ID, PID) in the population so that I have a matrix that looks like:
> 
>       unit_id pid senior
>  [1,]       1   1      0
>  [2,]       1   2      0
>  [3,]       1   3      0
>  [4,]       1   4      0
>  [5,]       1   5      0
>  [6,]       1   6      0
>  [7,]       1   7      0
>  [8,]       1   8      0
>  [9,]       1   9      1
> [10,]       1  10      1
> ...
> 
> I came up with the following code, but am having some trouble getting it to populate my matrix the way I'd like.
> 
> world <- function(units, pop_size, unit_size){
>     pid <- rep(0,pop_size) #person ID
>     senior <- rep(0,pop_size) #senior in charge
>     unit_id <- rep(0,pop_size) #unit ID
> 
>     for (i in 1:pop_size){
>         for (f in 1:units){
>             senior[i] = sample(c(1,1,0,0,0,0,0,0,0,0), 1, replace = FALSE)
>             pid[i] = sample(c(1:10), 1, replace = FALSE)
>             unit_id[i] <- f
>         }
>     }
>     data <- cbind(unit_id, pid, senior)
> 
>     return(data)
> }
> 
> world(units = 10, pop_size = 100, unit_size = 10) #call the function
> 
> 
> 
> The output looks like:
>        unit_id pid senior
>   [1,]      10   7      0
>   [2,]      10   4      0
>   [3,]      10  10      0
>   [4,]      10   9      1
>   [5,]      10  10      0
>   [6,]      10   1      1
> ...
> 
> but what I really want is to generate 10 different units with two seniors per unit, and with each person in the population having a unique identifier.
> 
> I thought a nested for loop was one way to go about creating my data set of people and families, but obviously I'm doing something (or many things) wrong. Any suggestions on how to fix this? I had been focusing on creating a person and assigning them to a unit, but perhaps I should create the units and then populate the units with people?
> 
> Thanks so much in advance.
> 
> Emma
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
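For comparison, Emma's loop-based approach can be made to work by looping over the units and filling each unit's block of rows. In the original, the inner loop runs over every unit for each row `i`, so `unit_id[i]` always ends up as the last unit (10), and `senior`/`pid` are drawn independently for every row, so nothing guarantees two seniors per unit. A sketch of a corrected loop, assuming the same fixed structure; `pop_size` is derived instead of passed in, `n_senior` is a hypothetical parameter added for illustration, and here `pid` runs from 1 to the population size so it is unique:

```r
world <- function(units, unit_size, n_senior = 2) {
  pop_size <- units * unit_size
  unit_id <- integer(pop_size)   # unit ID
  pid     <- integer(pop_size)   # person ID, unique across the population
  senior  <- integer(pop_size)   # 1 = senior in charge, 0 = junior

  for (f in 1:units) {
    rows <- ((f - 1) * unit_size + 1):(f * unit_size)  # rows belonging to unit f
    unit_id[rows] <- f
    pid[rows]     <- rows
    # pick exactly n_senior members of this unit; indexing via sample.int
    # avoids sample()'s special case when given a single number
    senior[rows[sample.int(unit_size, n_senior)]] <- 1L
  }
  cbind(unit_id, pid, senior)
}

pop <- world(units = 10, unit_size = 10)
head(pop, 12)
```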



