[R] Randomly select elements based on criteria
Petr Savicky
savicky at cs.cas.cz
Fri Mar 23 11:47:18 CET 2012
On Fri, Mar 23, 2012 at 10:56:11AM +0100, Petr Savicky wrote:
> On Thu, Mar 22, 2012 at 11:42:53AM -0700, aly wrote:
[...]
> > I want to randomly select two rows but they have to be from different fam.
> > The fist part (random selection), I got it by doing:
> >
> > > ran <- sample(nrow (fish), size=2); ran
> >
> > [1] 9 12
> >
> > > newfish <- fish [ran,]; newfish
> >
> > fam born spawn
> > 103 136 46 50
> > 106 142 46 85
> >
> > In this example I got two individuals from different families (good) but I
> > will repeat the process many times and there's a chance that I get two fish
> > from the same family (bad):
> >
> > > ran<-sample (nrow(fish), size=2);ran
> >
> > [1] 26 25
> >
> > > newfish <-fish [ran,]; newfish
> >
> > fam born spawn
> > 127 150 46 85
> > 126 150 46 78
> >
> > I need a conditional but I have no clue on how to include it in the code.
>
> Hi.
>
> Try the following.
>
> ran1 <- sample(nrow(fish), 1)
> ind <- which(fish$fam != fish$fam[ran1])
> ran2 <- ind[sample(length(ind), 1)]
> fish[c(ran1, ran2), ]
>
> This generates the pairs from exactly the same distribution as
> the rejection method suggested earlier, however, it does not
> contain a loop.
Hi.
I am sorry for a wrong statement. If there are more than two
families, then the distributions from the two methods are only
approximately equal, not exactly.
If the sizes of families are, say
n <- c(20, 3, 3)
p <- n/sum(n)
then the probability to a get a pair from families (i, j)
using the rejection method is p1[i, j], where p1 is
p1 <- p %o% p
diag(p1) <- 0
p1 <- p1/sum(p1)
p1 <- p1 + t(p1)
p1[row(p1) >= col(p1)] <- 0
p1
[,1] [,2] [,3]
[1,] 0 0.4651163 0.46511628
[2,] 0 0.0000000 0.06976744
[3,] 0 0.0000000 0.00000000
The above produces a pair from families (i, j) with probability
p2[i, j], where p2 is
p2 <- p %o% p
diag(p2) <- 0
p2 <- p2/rep(rowSums(p2), times=nrow(p2))*p
p2 <- p2 + t(p2)
p2[row(p2) >= col(p2)] <- 0
p2
[,1] [,2] [,3]
[1,] 0 0.4849498 0.48494983
[2,] 0 0.0000000 0.03010033
[3,] 0 0.0000000 0.00000000
Petr Savicky.
More information about the R-help
mailing list