[R] Randomly select elements based on criteria

Peter Ehlers ehlers at ucalgary.ca
Fri Mar 23 00:58:08 CET 2012


Here's another way:
With d1 as your data frame,

  library(plyr)
  d2 <- ddply(d1, .(fam), function(x) x[sample(nrow(x), 1), ])
  d2[sample(nrow(d2), 2), ]
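
If plyr is not at hand, roughly the same two steps can be written in base R. This is only a minimal sketch: the toy d1 below copies the first few rows of the posted table, and the names one_per_fam and pair are made up for illustration.

```r
# Toy data: first rows of the posted table (columns fam, born, spawn)
d1 <- data.frame(fam   = c(25, 25, 26, 43, 131, 133),
                 born  = 46,
                 spawn = c(43, 56, 50, 43, 43, 64))

# Step 1: one randomly chosen row per family
one_per_fam <- lapply(split(d1, d1$fam),
                      function(x) x[sample(nrow(x), 1), ])
d2 <- do.call(rbind, one_per_fam)

# Step 2: two of those rows, which therefore come from different families
pair <- d2[sample(nrow(d2), 2), ]
pair
```

Because d2 holds exactly one row per family, any two rows drawn from it cannot share a family.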

If you have to take account of the 'only one family' case, you
can wrap this in a function with an appropriate check:

   fish <- function(d){
     if(length(unique(d[,'fam'])) < 2) stop('only one family')
     d2 <- ddply(d,.(fam),function(x)x[sample(nrow(x), 1), ])
     d2[sample(nrow(d2), 2), ]
   }
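
Since the draw is meant to be repeated many times to look at the correlation in spawn day, here is a rough sketch of how that could go. It assumes a base-R stand-in for the function above (the name pick_pair is hypothetical), a toy data frame copied from the first rows of the posted table, and an arbitrary B = 200.

```r
# Toy data: first rows of the posted table (columns fam, born, spawn)
d <- data.frame(fam   = c(25, 25, 26, 43, 131, 133, 136, 136),
                born  = 46,
                spawn = c(43, 56, 50, 43, 43, 64, 43, 42))

# Hypothetical helper: pick two distinct families, then one fish from each
pick_pair <- function(d) {
  fams <- unique(d$fam)
  if (length(fams) < 2) stop("only one family")
  chosen <- fams[sample(length(fams), 2)]
  rows <- vapply(chosen, function(f) {
    idx <- which(d$fam == f)
    idx[sample(length(idx), 1)]   # index-based, safe when a family has one fish
  }, integer(1))
  d[rows, ]
}

B <- 200                                            # number of pairs (arbitrary)
spawn_pairs <- t(replicate(B, pick_pair(d)$spawn))  # B x 2 matrix of spawn days
cor(spawn_pairs[, 1], spawn_pairs[, 2])
```

Each row of spawn_pairs holds the spawn days of one unrelated pair, so the two columns can go straight into cor().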

Peter Ehlers

On 2012-03-22 16:03, Jorge I Velez wrote:
> You could keep the loop from running forever by adding a stop() check.
> Here is an example using Dr. Savicky's code:
>
> # function to sample B pairs of
> # fishes from different families
> # -- d has columns fam, born, spawn
> foo <- function(d, B){
>
>     # internal function
>     foo <- function(d){
>         if (length(unique(d[, 'fam'])) < 2) stop('only one family!')
>         while (1) {
>             ran <- sample(NROW(d), size = 2)
>             if (d[ran[1], 1] != d[ran[2], 1]) break
>         }
>         d[ran, ]
>     }
>
>     # sampling B pairs of fishes
>     lapply(1:B, function(i) foo(d))
> }
>
> # example:  2 pairs of fishes from different families
> foo(fish, 2)
>
> #  data with only one family
> ff<- fish[1,]
> foo(ff, 2)  # Error in foo(d) : only one family!
>
> HTH,
> Jorge.-
>
>
> On Thu, Mar 22, 2012 at 5:27 PM, Petr Savicky wrote:
>
>> On Thu, Mar 22, 2012 at 11:42:53AM -0700, aly wrote:
>>> Hi,
>>>
>>> I want to randomly pick 2 fish born the same day but I need those
>>> individuals to be from different families. My table includes 1787 fish
>>> distributed in 948 families. An example of a subset of fish born in one
>>> specific day would look like:
>>>
>>>> fish
>>>
>>> fam   born  spawn
>>> 25    46      43
>>> 25    46      56
>>> 26    46      50
>>> 43    46      43
>>> 131   46      43
>>> 133   46      64
>>> 136   46      43
>>> 136   46      42
>>> 136   46      50
>>> 136   46      85
>>> 137   46      64
>>> 142   46      85
>>> 144   46      56
>>> 144   46      64
>>> 144   46      78
>>> 144   46      85
>>> 145   46      64
>>> 146   46      64
>>> 147   46      64
>>> 148   46      78
>>> 149   46      43
>>> 149   46      98
>>> 149   46      85
>>> 150   46      64
>>> 150   46      78
>>> 150   46      85
>>> 151   46      43
>>> 152   46      78
>>> 153   46      43
>>> 156   46      43
>>> 157   46      91
>>> 158   46      42
>>>
>>> Where "fam" is the family that fish belongs to, "born" is the day it was
>>> born (in this case day 46), and "spawn" is the day it was spawned. I want to
>>> know if there is a correlation in the day of spawn between fish born the
>>> same day but that are unrelated (not from the same family).
>>> I want to randomly select two rows but they have to be from different fam.
>>> The first part (random selection), I got it by doing:
>>>
>>>> ran<- sample(nrow (fish), size=2); ran
>>>
>>> [1]  9 12
>>>
>>>> newfish<- fish [ran,];  newfish
>>>
>>>      fam born spawn
>>> 103 136   46    50
>>> 106 142   46    85
>>>
>>> In this example I got two individuals from different families (good) but I
>>> will repeat the process many times and there's a chance that I get two fish
>>> from the same family (bad):
>>>
>>>> ran<-sample (nrow(fish), size=2);ran
>>>
>>> [1] 26 25
>>>
>>>> newfish<-fish [ran,]; newfish
>>>
>>>      fam born spawn
>>> 127 150   46    85
>>> 126 150   46    78
>>>
>>> I need a conditional but I have no clue on how to include it in the code.
>>
>> Hi.
>>
>> Try the following.
>>
>>   while (1) {
>>     ran <- sample(nrow(fish), size = 2)
>>     if (fish[ran[1], 1] != fish[ran[2], 1]) break
>>   }
>>   fish[ran, ]
>>
>> This will generate only pairs from different families. However,
>> note that the loop will run forever if the data contain only
>> fish from one family.
>>
>> Hope this helps.
>>
>> Petr Savicky.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>


