[R] How do I generate one vector for every row of a data frame?
Simon Knapp
sleepingwell at gmail.com
Fri Dec 19 07:20:26 CET 2008
... actually, the scaling of the weights was not required as it is
done by sample anyway.
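
A minimal sketch of that point, assuming the gmm data frame from the
original question. The seed, the sample sizes, and the unnormalised
weights c(3, 2, 4, 1) are arbitrary choices for illustration. The prob
argument of sample() need not sum to one, so the rescaling step can be
dropped:

```r
# Mixture definition copied from the original question.
gmm <- data.frame(weight = c(0.3, 0.2, 0.4, 0.1),
                  mean   = c(0, -2, 4, 5),
                  sd     = c(1.0, 0.5, 0.7, 0.3))

# Simplified sampler: no manual rescaling of the weights.
gmm_data <- function(n, data) {
  if (any(data$sd < 0)) stop("all of data$sd must be >= 0")
  if (any(data$weight < 0)) stop("all of data$weight must be >= 0")
  # Pick a mixture component (row) for each draw, with probability
  # proportional to its weight.
  rows <- sample(seq_len(nrow(data)), n, replace = TRUE, prob = data$weight)
  # Draw each value from the normal belonging to its chosen component.
  rnorm(n, data$mean[rows], data$sd[rows])
}

set.seed(42)  # arbitrary seed, only for reproducibility
x <- gmm_data(1000, gmm)
length(x)     # the sample size is always exactly n

# Weights that do not sum to 1 still give the intended proportions.
rows <- sample(seq_len(4), 1e5, replace = TRUE, prob = c(3, 2, 4, 1))
props <- prop.table(table(rows))
round(props, 2)
```

The empirical proportions in the last line should come out close to
0.3, 0.2, 0.4 and 0.1, up to sampling noise.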
On Fri, Dec 19, 2008 at 5:16 PM, Simon Knapp <sleepingwell at gmail.com> wrote:
> Your code always draws the same number of samples from each normal on
> every call, with each count (roughly) proportional to the weights
> column. If the weights column in your data frame represents the
> probability that a draw comes from each distribution, then this
> behaviour is not correct. Further, it does not guarantee that the
> sample size is actually n, because rnorm truncates each n*weight to a
> whole number.
>
> This definition will work with arbitrary numbers of rows:
>
> gmm_data <- function(n, data){
>     # choose a mixture component (row) for each draw, weighted by data$weight
>     rows <- sample(seq_len(nrow(data)), n, TRUE, data$weight)
>     rnorm(n, data$mean[rows], data$sd[rows])
> }
>
> and this one enforces a bit more sanity :-)
>
> gmm_data <- function(n, data, tol=1e-8){
>     if(any(data$sd < 0)) stop("all of data$sd must be >= 0")
>     if(any(data$weight < 0)) stop("all of data$weight must be >= 0")
>     # rescale the weights if they do not sum to 1 (within tol)
>     wgts <- if(abs(sum(data$weight) - 1) > tol) {
>         warning("data$weight does not sum to 1 - rescaling")
>         data$weight/sum(data$weight)
>     } else data$weight
>     rows <- sample(seq_len(nrow(data)), n, TRUE, wgts)
>     rnorm(n, data$mean[rows], data$sd[rows])
> }
>
> Regards,
> Simon Knapp.
>
> On Fri, Dec 19, 2008 at 4:14 PM, Bill McNeill (UW)
> <billmcn at u.washington.edu> wrote:
>> I am trying to generate a set of data points from a Gaussian mixture
>> model. My mixture model is represented by a data frame that looks
>> like this:
>>
>>> gmm
>>   weight mean  sd
>> 1    0.3    0 1.0
>> 2    0.2   -2 0.5
>> 3    0.4    4 0.7
>> 4    0.1    5 0.3
>>
>> I have written the following function that generates the appropriate data:
>>
>> gmm_data <- function(n, gmm) {
>>     c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd),
>>       rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd),
>>       rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd),
>>       rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd))
>> }
>>
>> However, the fact that my mixture has four components is hard-coded
>> into this function. A better implementation of gmm_data() would
>> generate data points for an arbitrary number of mixture components
>> (i.e. an arbitrary number of rows in the data frame).
>>
>> How do I do this? I'm sure it's simple, but I can't figure it out.
>>
>> Thanks.
>> --
>> Bill McNeill
>> http://staff.washington.edu/billmcn/index.shtml
>