[R] Conditional Random selection

Bert Gunter bgunter.4567 at gmail.com
Sat Nov 21 20:58:04 CET 2015


Time to do your own homework by working through an R tutorial or two.
There are many on the web -- or see the Intro to R tutorial that ships
with R.

?tapply
?unique

is one of many answers to your query.
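Here is a minimal sketch of how `tapply()` and `unique()` answer the follow-up question quoted below, using the `tab` data from that message. `aggregate()` is shown as an alternative that returns the requested two-column layout directly. The object names `n_loc` and `res` are illustrative only.

```r
# Data from the follow-up question: sampling locations (S1) within each time period
tab <- read.table(text = "
time S1 rep
1    1  1
1    2  1
1    2  2
2    1  1
2    1  2
2    1  3
2    1  4
3    1  1
3    2  1
3    3  1", header = TRUE)

# tapply() splits S1 by time and applies the function to each group;
# length(unique(x)) counts the distinct locations in that group.
n_loc <- tapply(tab$S1, tab$time, function(x) length(unique(x)))
print(n_loc)

# aggregate() with a formula gives the same counts as a two-column data frame
res <- aggregate(S1 ~ time, data = tab, FUN = function(x) length(unique(x)))
print(res)
```

Both give 2, 1 and 3 locations for times 1, 2 and 3, which matches the table requested in the question.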

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sat, Nov 21, 2015 at 11:52 AM, Ashta <sewashm at gmail.com> wrote:
> Hi Bert and all,
> I have a related question. In each time period there were different
> locations where the samples were collected (S1). I want to count the
> number of unique locations (S1) for each unique time period. So in
> time 1 the samples were collected from two locations, in time 2 from only
> one location, and in time 3 from three locations.
>
> tab  <- read.table(textConnection(" time   S1  rep
> 1      1       1
> 1      2       1
> 1      2       2
> 2      1       1
> 2      1       2
> 2      1       3
> 2      1       4
> 3      1       1
> 3      2       1
> 3      3       1   "),header = TRUE)
>
> what I want is
>
> time  S1
>     1    2
>     2    1
>     3    3
>
> Thank you again.
>
>
>
> On Sat, Nov 21, 2015 at 1:30 PM, Ashta <sewashm at gmail.com> wrote:
>>  Thank you Bert!
>>
>> What I want is at least 500 samples based on random sampling of time
>> periods. This allows samples collected in the same time period to be
>> included together.
>>
>> Your script is doing what I wanted to do!!
>>
>> Many thanks
>>
>>
>>
>>
>> On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> David's "solution" is incorrect. It can also fail to select time
>>> periods whose X2 values total 500 or more.
>>>
>>> It is not entirely clear what you want. The solution below gives you a
>>> random sample of time periods in which X1 > 0 and the total number of
>>> samples among them is >= 500. It does not give you the smallest number
>>> of periods that can do this. Is this what you want?
>>>
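>>> tab[with(tab, {
>>>   ok <- which(X1 > 0)                      # eligible rows (X1 > 0)
>>>   rownums <- ok[sample.int(length(ok))]    # shuffle them; safe when length(ok) == 1
>>>   sz <- cumsum(X2[rownums])                # running total of samples
>>>   rownums[c(TRUE, head(sz, -1) < 500)]     # keep rows until the total reaches 500
>>> }), ]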
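The stopping rule in Bert's code can be wrapped as a small function and checked over many random draws. This is a minimal sketch under a few assumptions. `pick_periods()` is a hypothetical name, and the data are rebuilt from the original question with `data.frame()`. `head(sz, -1)` keeps the logical index the same length as the shuffled rows. Using `sz` itself would make the index one element too long, which adds an `NA` row whenever the eligible periods total less than the target. `sample.int(length(ok))` is used because `sample(x)` on a single number `x` samples from `1:x` instead.

```r
# Data from the original question (time period, indicator X1, sample count X2)
tab <- data.frame(
  time = 1:8,
  X1   = c(0, 5, 1, 0, 2, 3, 1, 4),
  X2   = c(251, 230, 300, 25, 10, 101, 300, 185)
)

# Hypothetical helper: shuffle the eligible periods (X1 > 0) and keep taking
# them until the running total of X2 reaches `target`.
pick_periods <- function(d, target = 500) {
  ok  <- which(d$X1 > 0)              # eligible row positions
  idx <- ok[sample.int(length(ok))]   # random order of the eligible rows
  sz  <- cumsum(d$X2[idx])            # running total after each row
  # A row is kept if the total *before* it was still below target,
  # so the last kept row is the one that crosses the threshold.
  d[idx[c(TRUE, head(sz, -1) < target)], ]
}

set.seed(1)
picked <- pick_periods(tab)
print(picked)
print(sum(picked$X2))
```

Every result contains only periods with X1 > 0, reaches at least 500 samples, and would fall below 500 without its last period. As Bert says, though, it is not guaranteed to use the fewest periods.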
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewashm at gmail.com> wrote:
>>>> Thank you  David!
>>>>
>>>> I reran your script and it gives me the first three time periods.
>>>> Is it doing random sampling?
>>>>
>>>>       tab.fan
>>>>   time X1  X2
>>>> 2    2  5 230
>>>> 3    3  1 300
>>>> 5    5  2  10
>>>>
>>>>
>>>>
>>>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>>>>> Use dput() to send data to the list as it is more compact:
>>>>>
>>>>>> dput(tab)
>>>>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
>>>>> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = c("time",
>>>>> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>>>>>
>>>>> You can just remove the lines with X1 = 0 since you don't want to use them.
>>>>>
>>>>>> tab.sub <- tab[tab$X1>0, ]
>>>>>
>>>>> Then the following gives you a sample:
>>>>>
>>>>>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>>>>
>>>>> Note that your "solution" of times 6, 7, and 8 will never appear, because their X2 values sum to 586, which exceeds 500.
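This explains why the one-liner above does not select periods at random. `sample()` shuffles the X2 values, but the resulting logical vector is then used to index the rows of `tab.sub` in their original order. Because every X2 is positive, `cumsum()` only increases, so the vector is always a run of `TRUE` followed by `FALSE`, and the first rows of `tab.sub` are returned. The returned rows need not even respect the limit. In the output Ashta reports, times 2, 3 and 5 total 230 + 300 + 10 = 540. The minimal sketch below reproduces the effect and then shuffles row positions instead. The names `shuffled`, `keep`, `idx` and `picked` are illustrative, and the fix keeps David's `<= 500` criterion.

```r
tab <- data.frame(
  time = 1:8,
  X1   = c(0, 5, 1, 0, 2, 3, 1, 4),
  X2   = c(251, 230, 300, 25, 10, 101, 300, 185)
)
tab.sub <- tab[tab$X1 > 0, ]

set.seed(42)
shuffled <- sample(tab.sub$X2)    # the values are shuffled...
keep <- cumsum(shuffled) <= 500   # ...but X2 > 0, so this is TRUE...TRUE FALSE...FALSE
print(keep)
print(tab.sub[keep, ])            # ...and it is applied to rows in their original order

# Shuffle row positions instead, so the kept rows are the randomly drawn ones
idx <- sample.int(nrow(tab.sub))
picked <- tab.sub[idx[cumsum(tab.sub$X2[idx]) <= 500], ]
print(picked)
```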
>>>>>
>>>>>
>>>>> David L. Carlson
>>>>> Department of Anthropology
>>>>> Texas A&M University
>>>>>
>>>>> -----Original Message-----
>>>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
>>>>> Sent: Saturday, November 21, 2015 11:53 AM
>>>>> To: R help <r-help at r-project.org>
>>>>> Subject: [R] Conditional Random selection
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a data set that contains samples collected over time. In
>>>>> each time period the total number of samples is given (X2). The goal
>>>>> is to select 500 random samples. The selection should be based on
>>>>> time (select time periods until I reach 500 samples). Also, the time
>>>>> period must have X1 greater than 0; X1 is an indicator
>>>>> variable.
>>>>>
>>>>> Select "time" periods with X1 > 0 until the sum of X2 is > 500.
>>>>>
>>>>> tab  <- read.table(textConnection(" time   X1 X2
>>>>> 1      0        251
>>>>> 2      5        230
>>>>> 3      1        300
>>>>> 4      0         25
>>>>> 5      2         10
>>>>> 6      3         101
>>>>> 7      1         300
>>>>>  8     4         185   "),header = TRUE)
>>>>>
>>>>> In the above example, samples from times 1 and 4 will not be selected
>>>>> (X1 is zero). So I could reach my target by selecting times 6, 7, and 8,
>>>>> or times 2 and 3, and so on.
>>>>>
>>>>> Can anyone help with that?
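Bert's reply notes that his random selection does not give the smallest number of periods. If that is wanted instead, it is enough to sort the eligible periods by X2 and take the largest first, because for any k the k largest X2 values give the largest possible total. This is a minimal sketch. `fewest_periods()` is a hypothetical helper, and the data are rebuilt from the table in the question.

```r
tab <- data.frame(
  time = 1:8,
  X1   = c(0, 5, 1, 0, 2, 3, 1, 4),
  X2   = c(251, 230, 300, 25, 10, 101, 300, 185)
)

# Hypothetical helper: fewest eligible periods (X1 > 0) whose X2 total reaches `target`.
# Taking periods in decreasing order of X2 is optimal: the k largest values
# give the largest possible total for any k periods.
fewest_periods <- function(d, target = 500) {
  ok <- d[d$X1 > 0, ]
  if (sum(ok$X2) < target) stop("eligible periods total less than target")
  ok <- ok[order(ok$X2, decreasing = TRUE), ]
  sz <- cumsum(ok$X2)
  ok[seq_len(which(sz >= target)[1]), ]   # shortest prefix reaching the target
}

print(fewest_periods(tab))
```

For these data it returns two periods, times 3 and 7 (300 + 300 = 600). Unlike the random selection, this answer is deterministic apart from the order of tied X2 values.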
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.