[R] Question about sampling

R. Michael Weylandt michael.weylandt at gmail.com
Thu Jun 14 15:14:30 CEST 2012


sample() takes a prob = argument that lets you supply sampling weights.
The weights need not sum to one, so, if I understand you correctly, you
could just pass TRUE for the rows you want and FALSE for the rest. Even
if I'm wrong about that last bit, I'm still pretty confident that
sample(prob = ) is the way to go.
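
As a rough sketch of the prob = idea, assuming a made-up data frame in
place of the real genotype table: the column values, the keep rule, and
the subsample size of 50 below are all hypothetical and only illustrate
the call. The logical flags are converted to 0/1 weights explicitly with
as.numeric(), so rows flagged FALSE get zero weight.

```r
# Toy stand-in for the 200-row data frame from the question (values made up).
set.seed(1)
df <- data.frame(
  Name        = sprintf("sample%03d", 1:200),
  Condition   = rep(c("Case", "Control"), times = 100),
  rs1385699_X = sample(c(-1, 0, 1), 200, replace = TRUE)
)

# Hypothetical rule: only Control rows are eligible to be drawn.
keep <- df$Condition == "Control"

# Weights need not sum to one. TRUE/FALSE become 1/0, so rows flagged
# FALSE have zero weight; size must not exceed the number of TRUE rows
# when sampling without replacement.
idx <- sample(nrow(df), size = 50, replace = FALSE, prob = as.numeric(keep))
mysample <- df[idx, ]

# Compare the frequency of the value 1 in the full data and the subsample.
mean(df$rs1385699_X == 1)
mean(mysample$rs1385699_X == 1)
```

Any non-negative numeric vector of length nrow(df) could be passed as
prob in the same way, for example to favour some rows without excluding
the others entirely.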

Best,
Michael

On Thu, Jun 14, 2012 at 6:02 AM, Guido Leoni <guido.leoni at gmail.com> wrote:
> Dear list, I wish to extract from a population genotyped for 10 SNPs a
> subsample of the same population of size n with similar allele frequencies.
> Essentially I have a matrix of 200 rows (df) like this:
> Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
> sample01,Case,1,1,1,-1
> sample02,Control,1,1,1,1
> sample06,Control,1,-1,1,0
> sample10,Case,1,1,1,0
> sample11,Control,1,1,1,1
> sample24,Control,-1,-1,1,0
> sample29,Control,1,-1,1,0
> sample42,Case,-1,-1,1,0
> sample64,Case,-1,1,1,0
> ....
> I'm interested in maintaining in my subsample the same frequencies as those
> observed for the value 1 in each column.
> I approached the problem with the sample() function:
>
> mysample <- df[sample(nrow(df), 100, replace = FALSE), ]
> Then I tested that the frequencies of each allele in mysample are not
> statistically different from those in the initial dataset by means of
> prop.test. This seems to work, but do you know if there is a package that
> can do the same thing while allowing, for example, stricter control?
> Thank you very much
> Guido