[R] Randomly select elements based on criteria

Petr Savicky savicky at cs.cas.cz
Fri Mar 23 10:56:11 CET 2012


On Thu, Mar 22, 2012 at 11:42:53AM -0700, aly wrote:
> Hi,
> 
> I want to randomly pick 2 fish born the same day but I need those
> individuals to be from different families. My table includes 1787 fish
> distributed in 948 families. An example of a subset of fish born in one
> specific day would look like:
> 
> >fish
> 
> fam   born  spawn
> 25	46	43
> 25	46	56
> 26	46	50
> 43	46	43
> 131	46	43
> 133	46	64
> 136	46	43
> 136	46	42
> 136	46	50
> 136	46	85
> 137	46	64
> 142	46	85
> 144	46	56
> 144	46	64
> 144	46	78
> 144	46	85
> 145	46	64
> 146	46	64
> 147	46	64
> 148	46	78
> 149	46	43
> 149	46	98
> 149	46	85
> 150	46	64
> 150	46	78
> 150	46	85
> 151	46	43
> 152	46	78
> 153	46	43
> 156	46	43
> 157	46	91
> 158	46	42
> 
> Where "fam" is the family that fish belongs to, "born" is the day it was
> born (in this case day 46), and "spawn" is the day it was spawned. I want to
> know if there is a correlation in the day of spawn between fish born the
> same day but that are unrelated (not from the same family). 
> I want to randomly select two rows but they have to be from different fam.
> The first part (random selection), I got by doing:
> 
> > ran <- sample(nrow (fish), size=2); ran
> 
> [1]  9 12
> 
> > newfish <- fish [ran,];  newfish
> 
>     fam born spawn
> 103 136   46    50 
> 106 142   46    85 
> 
> In this example I got two individuals from different families (good) but I
> will repeat the process many times and there's a chance that I get two fish
> from the same family (bad):
> 
> > ran<-sample (nrow(fish), size=2);ran
> 
> [1] 26 25
> 
> > newfish <-fish [ran,]; newfish
> 
>     fam born spawn
> 127 150   46    85
> 126 150   46    78
> 
> I need a conditional but I have no clue on how to include it in the code.

Hi.

Try the following.

  ran1 <- sample(nrow(fish), 1)
  ind <- which(fish$fam !=  fish$fam[ran1])
  ran2 <- ind[sample(length(ind), 1)]
  fish[c(ran1, ran2), ]

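This avoids the loop of the rejection method suggested earlier.
Note that it matches the distribution of the rejection method
only if all families have the same number of rows. In general,
with n = nrow(fish), a pair whose first fish comes from a family
with m rows is drawn with probability 1/(n*(n - m)), so pairs
involving larger families are slightly more likely than under
rejection, where all valid pairs are equally likely.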
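
For anyone who wants to try this on the example data, here is a
minimal, self-contained sketch. The data frame is retyped from the
table in the question. The function names pick_pair_two_step and
pick_pair_rejection are illustrative only, and the rejection version
is an assumption about what such a loop might look like, not the code
posted earlier in the thread.

```r
# Example data: the 32 fish born on day 46, retyped from the question.
fish <- data.frame(
  fam = c(25, 25, 26, 43, 131, 133, 136, 136, 136, 136, 137, 142,
          144, 144, 144, 144, 145, 146, 147, 148, 149, 149, 149,
          150, 150, 150, 151, 152, 153, 156, 157, 158),
  born = 46,
  spawn = c(43, 56, 50, 43, 43, 64, 43, 42, 50, 85, 64, 85,
            56, 64, 78, 85, 64, 64, 64, 78, 43, 98, 85,
            64, 78, 85, 43, 78, 43, 43, 91, 42)
)

# Two-step draw, wrapping the four lines from the reply in a function
# (the function name is hypothetical).
pick_pair_two_step <- function(fish) {
  ran1 <- sample(nrow(fish), 1)               # first fish: any row
  ind <- which(fish$fam != fish$fam[ran1])    # rows from other families
  stopifnot(length(ind) > 0)                  # need at least two families
  ran2 <- ind[sample(length(ind), 1)]         # second fish: one of those rows
  fish[c(ran1, ran2), ]
}

# Rejection sampling, for comparison: draw two distinct rows and
# repeat until they come from different families (assumed form).
pick_pair_rejection <- function(fish) {
  stopifnot(length(unique(fish$fam)) >= 2)    # otherwise the loop never ends
  repeat {
    ran <- sample(nrow(fish), 2)
    if (fish$fam[ran[1]] != fish$fam[ran[2]]) return(fish[ran, ])
  }
}

# Repeat the draw many times; each column holds the two families of one pair.
set.seed(1)
pairs <- replicate(1000, pick_pair_two_step(fish)$fam)
all(pairs[1, ] != pairs[2, ])   # no pair shares a family
```

Indexing ind with sample(length(ind), 1), rather than calling
sample(ind, 1), avoids the case where ind has length one and sample()
would treat it as the range 1:ind.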

Hope this helps.

Petr Savicky.


