[R] Re: how to generate data set with different length and calculate the mean?

Petr PIKAL petr.pikal at precheza.cz
Mon Feb 1 13:44:17 CET 2010


Hi

I do not know how to do exactly what you want. I can only recommend that you
use a list instead of a matrix, since a list can hold objects of different sizes.
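
As a small illustration of why a list fits here (the object names are made up for this example and are not from your code), a list can hold vectors of different lengths, and sapply() then works on each element in turn:

```r
# Hypothetical example: three "data sets" of different lengths in one list
sets <- list(a = c(1, 2, 3), b = c(4, 5), c = c(6, 7, 8, 9))

lens  <- sapply(sets, length)  # number of values in each element
means <- sapply(sets, mean)    # mean of each element, whatever its length
```

A matrix cannot do this, because every row must have the same number of columns.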

I am not sure if this is the most elegant way, but you can make your matrix
a data frame

ddd <- as.data.frame(data)

and then use this

lapply(ddd, function(x) unlist(list(x)))

to get a list of vectors.
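
Note that as.data.frame() gives one list element per column of the matrix. If each data set is a row, as in your code, one possible way to build a list of rows and carry on from there is sketched below. This is only a sketch under my own assumptions: the helper names extend_set and summarise_set are made up, new values are appended in the same alternating N(0,1), N(5,1) order as the original ones, and only the first 8 rows are used because 'size' has 8 entries.

```r
set.seed(1)                       # only to make the sketch reproducible
th   <- c(0, 5, 1)                # mean 1, mean 2, common sd (from the question)
data <- matrix(0, 10, 8)
for (i in 1:10) {
  data[i, ] <- rnorm(8, mean = rep(th[1:2], 8 / 2), sd = th[3])
}
size <- c(20, 10, 8, 14, 16, 12, 8, 80)

# one list element per row (data set); only as many rows as 'size' has entries
sets <- lapply(seq_along(size), function(i) data[i, ])

# hypothetical helper: keep the old values, append alternating draws up to n
# (assumes the number of extra values is even, as it is for every entry in 'size')
extend_set <- function(x, n) {
  extra <- n - length(x)
  if (extra <= 0) return(x)
  c(x, rnorm(extra, mean = rep(th[1:2], extra / 2), sd = th[3]))
}
sets <- Map(extend_set, sets, size)

# hypothetical helper: odd positions are N(0,1), even positions are N(5,1)
summarise_set <- function(x) {
  g1 <- x[seq(1, length(x), by = 2)]
  g2 <- x[seq(2, length(x), by = 2)]
  n1 <- length(g1)
  n2 <- length(g2)
  pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
  c(mean_diff = mean(g1) - mean(g2), pooled_sd = pooled_sd)
}

# one row per data set, columns mean_diff and pooled_sd
res <- t(sapply(sets, summarise_set))
```

Because the two distributions alternate, taking the odd and even positions separates them regardless of how long each vector is.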

Regards
Petr

r-help-bounces at r-project.org wrote on 01.02.2010 03:46:34:

> 
> Hello,
> 
> This may be a rare question. I am struggling to solve it. I really
> appreciate any help or suggestions. Thanks a lot in advance!
> 
> 
> I put my questions between the code to make it clear. The problem I have is:
> I generated 10 data sets with 8 data for each set. Now I want to change the
> number of data in each dataset according to a vector 'size' (as follows),
> that is, each new dataset contains a different number of data. How can I do
> it? After generating the new datasets, how can I separate the data from the
> two distributions and calculate the sample mean? Thanks a lot.
> 
> 
> 
> # generate 10 data sets, each with 8 samples: 4 from N(0, 1) and
> # 4 from N(5, 1)
> data <- matrix(0, 10, 8)
> th <- c(0, 5, 1)
> for (i in 1:10) {
>   data[i, ] <- rnorm(8, mean = rep(th[1:2], 8/2), sd = th[3])
> }
> 
> # change the number of samples for each data set. e.g. the first dataset
> # needs to increase to 20: the first 8 keep the same, add another 12 samples
> # (6 from N(0,1) and the other 6 from N(5,1)); the second dataset needs to
> # increase to 10: keep the first 8 the same, generate another 2 (one from
> # N(0,1) and the other one from N(5,1)); the third data set does not need to
> # change, etc.
> 
> size=c(20, 10, 8, 14, 16, 12, 8, 80)
> 
> 
> # Since each data set changes to a different size, and a different number of
> # data is added, for each dataset how can I calculate the difference between
> # the sample mean from N(0,1) and the sample mean from N(5,1), and the pooled
> # standard deviation of the two samples? Two difficulties: each new dataset
> # includes a different number of data; and, when I generated the data, two
> # successive values are from different normal distributions. How can I
> # separate them and calculate the average for each sample and the pooled
> # standard deviation?
> 
> 
> 


