# [R] How do I generate one vector for every row of a data frame?

Simon Knapp sleepingwell at gmail.com
Fri Dec 19 07:20:26 CET 2008

```
... actually, scaling the weights was not required, as sample() does
that anyway.

On Fri, Dec 19, 2008 at 5:16 PM, Simon Knapp <sleepingwell at gmail.com> wrote:
> Your code will always generate the same number of samples from each of
> the normals specified on every call, where the number of samples from
> each is (roughly) proportional to the weights column. If the weights
> column in your data frame represents probabilities of draws coming
> from each distribution, then this behaviour is not correct. Further,
> it does not guarantee that the sample size is actually n.
>
> This definition will work with arbitrary numbers of rows:
>
>  gmm_data <- function(n, data){
>    # choose a mixture component (row) for each of the n draws
>    rows <- sample(seq_len(nrow(data)), n, TRUE, data$weight)
>    rnorm(n, data$mean[rows], data$sd[rows])
> }
>
> and this one enforces a bit more sanity :-)
>
> gmm_data <- function(n, data, tol=1e-8){
>    if(any(data$sd < 0)) stop("all of data$sd must be >= 0")
>    if(any(data$weight < 0)) stop("all of data$weight must be >= 0")
>    wgts <- if(abs(sum(data$weight) - 1) > tol) {
>        warning("data$weight does not sum to 1 - rescaling")
>        data$weight/sum(data$weight)
>    } else data$weight
>    rows <- sample(seq_len(nrow(data)), n, TRUE, wgts)
>    rnorm(n, data$mean[rows], data$sd[rows])
> }
>
> Regards,
> Simon Knapp.
>
> On Fri, Dec 19, 2008 at 4:14 PM, Bill McNeill (UW)
> <billmcn at u.washington.edu> wrote:
>> I am trying to generate a set of data points from a Gaussian mixture
>> model.  My mixture model is represented by a data frame that looks
>> like this:
>>
>>> gmm
>>  weight mean  sd
>> 1    0.3    0 1.0
>> 2    0.2   -2 0.5
>> 3    0.4    4 0.7
>> 4    0.1    5 0.3
>>
>> I have written the following function that generates the appropriate data:
>>
>> gmm_data <- function(n, gmm) {
>>        c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd),
>>                rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd),
>>                rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd),
>>                rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd))
>> }
>> }
>>
>> However, the fact that my mixture has four components is hard-coded
>> into this function.  A better implementation of gmm_data() would
>> generate data points for an arbitrary number of mixture components
>> (i.e. an arbitrary number of rows in the data frame).
>>
>> How do I do this?  I'm sure it's simple, but I can't figure it out.
>>
>> Thanks.
>> --
>> Bill McNeill
>> http://staff.washington.edu/billmcn/index.shtml
>

```
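
For anyone wanting to try the approach from this thread, here is a minimal, self-contained sketch. It rebuilds the `gmm` data frame from the original question and uses the shorter `gmm_data()` definition, written with `seq_len()` and named arguments. The seed, the sample size of 1e5 and the variable name `x` are arbitrary choices for illustration and are not part of the thread.

```r
# Mixture specification copied from the original question.
gmm <- data.frame(
  weight = c(0.3, 0.2, 0.4, 0.1),
  mean   = c(0, -2, 4, 5),
  sd     = c(1.0, 0.5, 0.7, 0.3)
)

# Pick a component (row) for each draw, then sample from that component.
# Works for any number of rows; sample() normalises 'prob' itself.
gmm_data <- function(n, data) {
  rows <- sample(seq_len(nrow(data)), n, replace = TRUE, prob = data$weight)
  rnorm(n, mean = data$mean[rows], sd = data$sd[rows])
}

set.seed(1)  # arbitrary seed, only so the run is reproducible
x <- gmm_data(1e5, gmm)

length(x)                    # exactly n draws are returned
mean(x)                      # empirical mean of the draws
sum(gmm$weight * gmm$mean)   # theoretical mixture mean, for comparison
```

Because the component is drawn at random for every data point, the number of draws taken from each normal varies from call to call, while the total is always `n`. With a large `n`, the empirical mean should land close to the weighted mean of the components.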