[R] Question about sampling

R. Michael Weylandt michael.weylandt at gmail.com
Thu Jun 14 17:08:02 CEST 2012


I think you're right -- prob = probably isn't quite what you need (at
least, not directly): constrained sampling like this is a little trickier
-- I'll leave this to someone who knows more than I do.
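
For what it's worth, here is a rough sketch of one brute-force idea, not
an established method or a package recommendation: draw many random
subsets with sample() and keep the one whose per-column frequency of 1 is
closest to the full data. The names closest_subsample and snp, the toy
data, and the tries = 1000 default are all made up for illustration, and
the sketch assumes the SNP columns have been pulled out into a numeric
matrix coded -1/0/1.

```r
# Hypothetical helper (not from any package): draw many random subsets and
# keep the one whose per-column frequency of the value 1 is closest to the
# frequencies in the full data.
set.seed(1)  # only so the example is reproducible

# Toy stand-in for the real data: 200 rows, 4 SNP columns coded -1/0/1.
snp <- matrix(sample(c(-1, 0, 1), 200 * 4, replace = TRUE), nrow = 200,
              dimnames = list(NULL, paste0("snp", 1:4)))

closest_subsample <- function(x, n, tries = 1000) {
  target <- colMeans(x == 1)           # frequency of 1 in each column
  best_idx <- NULL
  best_dev <- Inf
  for (i in seq_len(tries)) {
    idx <- sample(nrow(x), n)          # n distinct rows, no replacement
    freq <- colMeans(x[idx, , drop = FALSE] == 1)
    dev <- max(abs(freq - target))     # worst column-wise gap for this draw
    if (dev < best_dev) {
      best_dev <- dev
      best_idx <- idx
    }
  }
  list(rows = best_idx, max_deviation = best_dev)
}

res <- closest_subsample(snp, n = 100)
mysample <- snp[res$rows, ]
res$max_deviation  # largest gap between subsample and full-data frequencies
```

Keeping the closest of many draws means the result is no longer a simple
random sample, so whether this is acceptable depends on what the
subsample is used for.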

Michael

On Thu, Jun 14, 2012 at 9:07 AM, Guido Leoni <guido.leoni at gmail.com> wrote:
> Sorry, I'm not sure that prob is suitable for my purposes (but I'm quite
> new to R).
> If I understand correctly, prob lets me set a weight for each row in the
> original dataset, so that rows are included on the basis of their
> weights. ... I'm not sure I understand it correctly ;-)
> In my case all the rows are equally important. I "simply" need my
> subset to have, in each column, the same frequency of 1 as the original
> dataset.
> Thank you again
> Guido
>
> 2012/6/14 R. Michael Weylandt <michael.weylandt at gmail.com>
>>
>> sample() takes a prob = argument which lets you supply weights, which
>> need not sum to one, so, if I understand you, you could just pass TRUEs
>> and FALSEs for the rows you want. If I'm wrong about that last bit,
>> I'm still pretty confident sample(prob = ) is the way to go.
>>
>> Best,
>> Michael
>>
>> On Thu, Jun 14, 2012 at 6:02 AM, Guido Leoni <guido.leoni at gmail.com>
>> wrote:
>> > Dear list, I wish to extract from a population genotyped for 10 SNPs a
>> > subsample of the same population of size n with similar allele
>> > frequencies.
>> > Essentially I have a matrix of 200 rows (df) like this:
>> > Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
>> > sample01,Case,1,1,1,-1
>> > sample02,Control,1,1,1,1
>> > sample06,Control,1,-1,1,0
>> > sample10,Case,1,1,1,0
>> > sample11,Control,1,1,1,1
>> > sample24,Control,-1,-1,1,0
>> > sample29,Control,1,-1,1,0
>> > sample42,Case,-1,-1,1,0
>> > sample64,Case,-1,1,1,0
>> > ....
>> > I'm interested in maintaining in my subsample the same frequencies as
>> > those observed for the value 1 in each column.
>> > I approached the problem with the sample() function:
>> >
>> > mysample <- df[sample(nrow(df), 100, replace = FALSE), ]
>> > Then I tested that the frequencies of each allele in mysample are not
>> > statistically different from those in the initial dataset by means of
>> > prop.test.
>> > This seems to work, but do you know if there is a package that can do the
>> > same thing while allowing, for example, stricter control?
>> > Thank you very much
>> > Guido
>> >
>
>
>
>
> --
> Guido Leoni
> National Research Institute on Food and Nutrition
> (I.N.R.A.N.)
> via Ardeatina 546
> 00178 Rome
> Italy
>
> tel     + 39 06 51 49 41 (operator)
>         + 39 06 51 49 4498 (direct)


