[R] Select Random Rows from a dataframe

Joshua Wiley jwiley.psych at gmail.com
Thu Jul 21 22:29:42 CEST 2011


Hi Sean,

Here is one option that I believe does what you want.  The logic is simple:

1) instantiate a test value that does not meet your criterion,
2) use a while loop that keeps looping until your criterion is met
(be sure you set a criterion that can be met, or this will continue ad
nauseam!),
3) inside the loop, randomly select a starting point from the rows that
still leave room for a full block (if it started too close to the end,
you could not have a consecutive block of 40, unless you allow wrapping
to the beginning, in which case you need to say so).


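## yourdata should hold the rows for a single fish
n <- nrow(yourdata)
test <- 0

while (test < 5) {
  ## valid starting rows are 1 to n - 39, so a 40-row block always fits
  i <- sample.int(n - 39, 1)
  x <- yourdata[i:(i + 39), ]
  ## the column is "d2p"; na.rm = TRUE because d2p is NA in the first row
  test <- sum(x[, "d2p"], na.rm = TRUE)
}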
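
Since you want one block per fish, the same idea can be wrapped in a
function and applied to each fish's rows.  This is only a sketch under
some assumptions: sample_block and its max_tries argument are names I
made up, the toy data frame stands in for your real one (only the fish,
pid and d2p column names come from your head(d) output), and I assume NA
values in d2p should be ignored when summing.  The max_tries cap is there
so that a fish with no qualifying block gives NULL instead of looping
forever.

```r
## Hypothetical helper: draw random blocks of `size` consecutive rows until
## one has a d2p sum of at least `min_sum`; give up after `max_tries` draws.
sample_block <- function(dat, size = 40, min_sum = 5, max_tries = 1000) {
  n <- nrow(dat)
  if (n < size) return(NULL)               # not enough rows for one block
  for (k in seq_len(max_tries)) {
    i <- sample.int(n - size + 1, 1)       # random valid starting row
    x <- dat[i:(i + size - 1), , drop = FALSE]
    if (sum(x[, "d2p"], na.rm = TRUE) >= min_sum) return(x)
  }
  NULL                                     # nothing qualified within max_tries
}

## Toy data standing in for the real data frame (values are made up)
set.seed(1)
d <- data.frame(fish = rep(1:3, each = 241),
                pid  = rep(1:241, times = 3),
                d2p  = runif(3 * 241, 0, 0.5))

## One block (or NULL) per fish
blocks <- lapply(split(d, d$fish), sample_block)
```

blocks is then a list with one 40-row data frame (or NULL) per fish;
do.call(rbind, blocks) would stack them back into a single data frame.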

Hope this helps,

Josh

On Thu, Jul 21, 2011 at 1:16 PM, Sean Bignami <bignami83 at gmail.com> wrote:
> Hi all,
> I have a dataframe of behavioral observations from 360 fish, each with 241 observation points(rows), which looks like this:
>
>> head(d)
>   fish treatment tank trial video tid pid   ang.chg    abs.ac   t   len   vel   d2p     x     y
> 1    1         3    1     1     1   1   1        NA        NA 0.0 0.000    NA    NA 5.169 9.617
> 2    1         3    1     1     1   1   2        NA        NA 0.5 0.203 0.405 0.203 5.254 9.433
> 3    1         3    1     1     1   1   3 -78.69660  78.69660 1.0 0.321 0.238 0.119 5.184 9.337
> 4    1         3    1     1     1   1   4 -29.58347  29.58347 1.5 0.648 0.653 0.327 5.147 9.013
> 5    1         3    1     1     1   1   5 140.96235 140.96235 2.0 0.988 0.680 0.340 5.434 8.830
> 6    1         3    1     1     1   1   6  11.52867  11.52867 2.5 1.463 0.949 0.474 5.877 8.660
>
> I have divided it into subsets of "treatment" and "video" types (i.e. dsub.r1 has all treatment 1 and video 1, dsub.s2 all treatment 2, video 2, etc for treatments 1:3 and video1:2)
>
> First: I want to randomly sample a block of 40 consecutive rows from each fish (1-360),
> BUT: I can only use that sample if the sum of the "d2p" column for that set of 40 rows is at least 5, otherwise I need to re-sample for another 40-row block, until I get one with a d2p sum of at least 5, if available.
>
> I know how to sample one row, or multiple random rows, but can't figure out how to sample a block of consecutive rows...let alone how to check if the sum of the d2p column is at least 5 and then re-sample if not.
>
>  Sadly this is beyond my current knowledge level in R. Please help!!
>
> Thanks in advance!
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/


