[R] Help with simulation of unbalanced clustered data

Abby Spurdle
Thu Dec 17 05:32:20 CET 2020


Hi Chao Liu,

I'm having difficulty following your question and examples.
Also, I don't see the motivation for increasing, then decreasing,
the sample sizes.
Intuitively, one would compute the correct sample sizes the first time round...

But I thought I'd add some comments, just in case they're useful.

If the problem relates to memberships (in clusters), then the problem
can be simplified.
All one needs is an integer vector with one element per observation,
where each value is the index of that observation's cluster.

To compute random memberships of 600 observations in 20 clusters, one could run:

    m <- sample (1:20, 600, TRUE)

To compute the number of observations per cluster, one could then run:

    table (m)

In the above code, the probability of an observation being assigned to
a given cluster is uniform (1/20 for each cluster).
Non-uniform sampling can be achieved by supplying a 4th argument (prob)
to the sample function, which is a numeric vector of weights, one per cluster.
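
As a minimal sketch of the non-uniform case: the weights below are made up
purely for illustration, and the names k, n, w and sizes are placeholders
rather than anything from the original question.

```r
set.seed (1)                      # arbitrary seed, only for reproducibility

k <- 20                           # number of clusters
n <- 600                          # total number of observations

# hypothetical weights (an assumption for this example):
# any non-negative numeric vector of length k should do,
# since sample() rescales the weights to sum to one
w <- runif (k)

# 4th positional argument of sample() is prob
m <- sample (1:k, n, TRUE, w)

# tabulate() keeps clusters that received zero observations,
# whereas table() would silently drop them
sizes <- tabulate (m, nbins=k)
sizes
sum (sizes)                       # equals n by construction
```

Clusters with larger weights tend to receive more observations, while the
total stays at 600, so unbalanced cluster sizes result without any need to
over-sample and then exclude observations.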


On Wed, Dec 16, 2020 at 10:08 PM Chao Liu <psychaoliu using gmail.com> wrote:
>
> Dear R experts,
>
> I want to simulate some unbalanced clustered data. The number of clusters
> is 20 and the average number of observations is 30. However, I would like
> to create an unbalanced clustered data per cluster where there are 10% more
> observations than specified (i.e., 33 rather than 30). I then want to
> randomly exclude an appropriate number of observations (i.e., 60) to arrive
> at the specified average number of observations per cluster (i.e., 30). The
> probability of excluding an observation within each cluster was not uniform
> (i.e., some clusters had no cases removed and others had more excluded).
> Therefore in the end I still have 600 observations in total. How to realize
> that in R? Thank you for your help!
>
> Best,
>
> Liu


