[R] RNORM matrix based on CSV file values for MEAN and SD

R. Michael Weylandt michael.weylandt at gmail.com
Tue May 22 17:41:45 CEST 2012


No CSV came through, so I'll just assume you read it in with read.csv()
and get a data.frame that looks something like this:

params <- data.frame(mean = c(1,4,7), sd = c(2,2,5))

and that you want 10 samples of each parameter. If you're under memory
constraints, you can simply loop over rows and append each one to a
growing CSV file.

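s <- 10  # number of samples, i.e. output rows
for(i in seq_len(s)){
    # one draw per parameter: rnorm() recycles the mean and sd vectors
    vals <- c(i, 0, 0, rnorm(NROW(params), params$mean, params$sd))
    # t() makes a 1-row matrix; a bare vector would be written as a column
    write.table(t(vals), "out.csv", append = TRUE, sep = ",",
                row.names = FALSE, col.names = FALSE)
}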

Note that row.names and col.names have to be FALSE; otherwise every
appended write adds its own header and row labels and the file gets messy.
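If s times the number of parameters is small enough to fit in memory, a
whole block of rows can also be drawn in one call, since rnorm() recycles
vector mean and sd arguments. A minimal sketch, assuming the params
data.frame above; make_block() is a hypothetical helper name, not an
existing function:

```r
# Hypothetical helper: draws s samples for every parameter in `params`
# (a data.frame with columns `mean` and `sd`) and returns an s x (n + 3)
# matrix laid out as: row number, 0, 0, P1, ..., Pn.
make_block <- function(params, s, first_row = 1) {
  n <- NROW(params)
  # matrix() fills column-major, so column j gets the s draws made with
  # params$mean[j] and params$sd[j]
  draws <- matrix(rnorm(s * n,
                        mean = rep(params$mean, each = s),
                        sd   = rep(params$sd,   each = s)),
                  nrow = s, ncol = n)
  # cbind() recycles the two scalar 0 columns to s rows
  cbind(first_row + seq_len(s) - 1, 0, 0, draws)
}

params <- data.frame(mean = c(1, 4, 7), sd = c(2, 2, 5))
set.seed(1)
block <- make_block(params, s = 10)
dim(block)  # 10 rows, 3 + 3 columns
```

Passing first_row lets successive blocks continue the row numbering, so
each block can be handed to write.table(..., append = TRUE) in turn.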

It's probably faster (though more work) to do a few rows at a time and
to use textConnections so you aren't constantly opening and closing
the file, but this should get you started.

See the examples of ?textConnection for how to do that bit properly.
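A rough sketch of the few-rows-at-a-time idea, using a plain file()
connection that is opened once and kept open (a swap-in for the
textConnection route, not the same thing). The helper name write_samples(),
the chunk size and the temporary output file are illustrative assumptions;
note that open = "w" truncates any existing file at that path.

```r
# Hypothetical helper: writes s rows of the form (row, 0, 0, P1, ..., Pn)
# to `path`, holding at most `chunk_size` rows in memory at a time.
write_samples <- function(params, s, path, chunk_size = 10000) {
  con <- file(path, open = "w")  # opened once for all chunks
  on.exit(close(con))
  n <- NROW(params)
  start <- 1
  while (start <= s) {
    rows <- min(chunk_size, s - start + 1)
    # column j holds `rows` draws from N(params$mean[j], params$sd[j])
    draws <- matrix(rnorm(rows * n,
                          mean = rep(params$mean, each = rows),
                          sd   = rep(params$sd,   each = rows)),
                    nrow = rows, ncol = n)
    chunk <- cbind(start:(start + rows - 1), 0, 0, draws)
    # successive writes to an already-open connection follow one another
    write.table(chunk, con, sep = ",", row.names = FALSE, col.names = FALSE)
    start <- start + rows
  }
  invisible(path)
}

params <- data.frame(mean = c(1, 4, 7), sd = c(2, 2, 5))
out <- tempfile(fileext = ".csv")
write_samples(params, s = 25, path = out, chunk_size = 10)  # chunks of 10, 10, 5
res <- read.csv(out, header = FALSE)
```

For the sizes in the question (s = 1,000,000, n = 2,000) chunk_size is the
knob that bounds memory: each chunk holds chunk_size * n doubles.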

Best,

Michael


On Tue, May 22, 2012 at 10:43 AM, dcoakley <danielcoakley1 at gmail.com> wrote:
> This should (hopefully) be a pretty simple task. What I'd like to do is read
> in a csv file containing means and standard deviations for a large number of
> 'n' parameters (up to 2000). The list would be in the following format (see
> attached read.csv):
>
> Parameter(1), mean, standard dev.,
> Parameter(2), mean, standard dev.,
> Parameter(3), mean, standard dev.,
> ...
> Parameter(n), mean, standard dev.,
>
>
> Based on the above csv file, I would then like to generate a large sample
> matrix for 's' samples, using the rnorm function. The matrix will be in the
> following format:
>
> 1,0,0, P1(1), P2(1), P3(1), ... Pn(1)
> 2,0,0, P1(2), P2(2), P3(2), ... Pn(2)
> ....
> s,0,0, P1(s), P2(s), P3(s), ... Pn(s)
>
> The first column contains the Row number. Taking s=30000, we would have rows
> numbered 1 to 30,000.
>
> The second and third column are fixed values - 0
>
> The fourth and subsequent columns contain values from the rnorm distribution
> for each parameter. P1(1) is the first value generated for the first
> parameter, P1(2) is the second value generated and so forth. P2(1) is the
> first value generated for the second parameter, P2(2) is the second value
> generated and so forth.  Pn(1) is the first value generated for the n'th
> parameter, Pn(2) is the second value generated and so forth.
>
> Again the number of rows depends on 's', the number of samples.
>
> Therefore, I will be generating a fairly large matrix. This could be a
> 1,000,000 x 2,000 matrix. However, due to memory constraints, it may be
> necessary to break this down into smaller sub-matrices where I limit the
> number of rows. Firstly, is this possible in R, and secondly, can anyone
> help suggest a method for creating such a matrix.
>
> I'd really appreciate any help on this. Thank you.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


