[R] Combining multiple probability weights for the sample() function.

Tue Jun 2 15:57:05 CEST 2015

Ben,

Perhaps I am missing something, but couldn't you simply reduce your
possibilities to:

possibilities[c(1, 5, 2, 4), ]
     Var1 Var2 Var3
[1,] "A"  "A"  "C"
[2,] "A"  "A"  "T"
[3,] "C"  "A"  "C"
[4,] "C"  "G"  "C"

If you sample from these four rows you will have a 50% chance that Var1 and
Var2 are equal and a 50% chance that Var1 and Var3 are equal.

Jean

On Tue, Jun 2, 2015 at 7:26 AM, Benjamin Ward (ENV) <B.Ward at uea.ac.uk>
wrote:

> Dear R-List,
>
> I have a set of possibilities I want to sample from:
>
> bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
> possibilities <- as.matrix(expand.grid(bases))
>
> >possibilities
> Var1 Var2 Var3
> [1,] "A"  "A"  "C"
> [2,] "C"  "A"  "C"
> [3,] "A"  "G"  "C"
> [4,] "C"  "G"  "C"
> [5,] "A"  "A"  "T"
> [6,] "C"  "A"  "T"
> [7,] "A"  "G"  "T"
> [8,] "C"  "G"  "T"
>
> If I want to randomly sample one of these rows. If I do this, I find that
> it is 25% likely that my choice will have an identical first and last
> letter (e.g. [1,] "A"  "A"  "C"). It is also 25% likely that my choice will
> have an identical first and third letter (e.g. [4,] "C"  "G"  "C"). It is
> not likely at all that the second and third letter of my choice could be
> identical.
>
> What I would like to do, is sample one of the rows, but given the
> constraint that the probability of drawing identical letters 1 and 2 should
> be 50% or 0.5, and at the same time the probability of drawing identical
> letters 1 and 3 should be 50%. I am unsure on how to do this, but I know it
> involves coming up with a modified set of weights for the sample()
> function. My progress is below, any advice is much appreciated.
>
> Best Wishes,
>
> Ben Ward, UEA.
>
>
> So I have used the following code to come up with a matrix, which contains
> weighting according to each criteria:
>
> possibilities <- as.matrix(expand.grid(bases))
>   identities <- apply(possibilities, 1, function(x) c(x[1] == x[2], x[1]
> == x[3], x[2] == x[3]))
>   prob <- matrix(rep(0, length(identities)), ncol = ncol(identities))
>   consProb <- apply(identities, 1, function(x){0.5 / length(which(x))})
>   polProb <- apply(identities, 1, function(x){0.5 / length(which(!x))})
>   for(i in 1:nrow(identities)){
>     prob[i, which(identities[i,])] <- consProb[i]
>     prob[i, which(!identities[i,])] <- polProb[i]
>   }
>   rownames(prob) <- c("1==2", "1==3", "2==3")
>   colnames(prob) <- apply(possibilities, 1, function(x)paste(x, collapse =
> ", "))
>
> This code gives the following matrix:
>
>                 A, A, C    C, A, C          A, G, C        C, G, C
>  A, A, T         C, A, T       A, G, T       C, G, T
> 1==2 0.25000000 0.08333333 0.08333333 0.08333333 0.25000000 0.08333333
> 0.08333333 0.08333333
> 1==3 0.08333333 0.25000000 0.08333333 0.25000000 0.08333333 0.08333333
> 0.08333333 0.08333333
> 2==3 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000
> 0.06250000 0.06250000
>
> Each column is one of the choices from 'possibilities', and each row gives
> a series of weights based on three different criteria:
>
> Row 1, that if it possible from the choices for letter 1 == letter 2, that
> combined chance be 50%.
> Row 2, that if it possible from the choices for letter 1 == letter 3, that
> combined chance be 50%.
> Row 3, that if it possible from the choices for letter 2 == letter 3, that
> combined chance be 50%.
>
> So:
>
>  If I used sample(x = 1:now(possibilities), size = 1, prob = prob[1,])
> repeatedly, I expect about half the choices to contain identical letters 1
> and 2.
>
>  If I used sample(x = 1:now(possibilities), size = 1, prob = prob[2,])
> repeatedly, I expect about half the choices to contain identical letters 1
> and 3.
>
> If I used sample(x = 1:now(possibilities), size = 1, prob = prob[3,])
> repeatedly, I expect about half the choices to contain identical letters 2
> and 3. Except that in this case, since it is not possible.
>
> Note each row sums to 1.
>
> What I would like to do - if it is possible - is combine these three sets
> of weights into one set, that when used with
> sample(x = 1:nrow(possibilities, size = 1, prob = MAGICPROB) will give me
> a list of choices, where ~50% of them contain identical letters 1 and 2,
> AND ~50% of them contain identical letters 1 and 3, AND ~50% again contain
> identical letters 2 and 3 (except in this example as it is not possible
> from the choices).
>
> Can multiple probability weightings be combined in such a manner?
>
>
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]