[R] Sampling of non-overlapping intervals of variable length

David Winsemius dwinsemius at comcast.net
Sun Jul 19 21:56:45 CEST 2009


On Jul 19, 2009, at 3:11 PM, Hadassa Brunschwig wrote:

> Hi
>
> I am not sure what you mean by sampling an index of a group of
> intervals. I will try to give an example:

If you had a dataframe of the following sort:

dfint
start  stop
    3     7
   12    20
   40    45
   60    72

And you wanted to generate a set of 100 samples with equal probability  
of occurring in any one of those intervals, you might sample first on  
the index of the intervals:

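idx <- sample(1:4, 100, replace=TRUE)
  ### and then sample one value within each chosen interval
  ### (sapply stands in for the iterative construct)
smp <- sapply(idx, function(i) {
    rng <- dfint$start[i]:dfint$stop[i]
    rng[sample.int(length(rng), 1)]   # safe even if the interval holds one integer
})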
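
The two-stage idea above can also be written without an explicit loop. What follows is only a sketch under assumptions: it rebuilds dfint from the table above, treats both endpoints as included, and the names width, smp, idx_w and smp_w are made up for illustration. The second half shows the alternative of weighting intervals by their width, so that every integer (rather than every interval) is equally likely.

```r
# Assumed reconstruction of the example data frame shown above
dfint <- data.frame(start = c(3, 12, 40, 60), stop = c(7, 20, 45, 72))
n <- 100

# Number of integers in each interval (endpoints assumed inclusive)
width <- dfint$stop - dfint$start + 1

# Stage 1: pick an interval index, each interval equally likely
idx <- sample(nrow(dfint), n, replace = TRUE)
# Stage 2: pick an integer offset 0..(width - 1) inside the chosen interval
smp <- dfint$start[idx] + floor(runif(n) * width[idx])

# Variant: weight intervals by width, so each covered integer is equally likely
idx_w <- sample(nrow(dfint), n, replace = TRUE, prob = width)
smp_w <- dfint$start[idx_w] + floor(runif(n) * width[idx_w])
```

Which of the two variants is appropriate depends on the probability model one has in mind.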


> Let's assume I have a vector 1:1000000. Let's say I have 10 intervals
> of different but known length, say,
> c(4,6,11,2,8,14,7,2,18,32). For simulation purposes I have to sample
> those 10 intervals 1000 times.
> The requirement is, however, that they should be of those lengths and
> should not be overlapping.
> In short, I would like to obtain a 10x1000 matrix with sampled  
> intervals.

I am trying to understand how the vector 1:1000000 relates to the
intervals. What do you mean by "sample the 10 intervals"? For one
thing, what you have offered are not intervals at all, only lengths.
Are you saying (in part at least) that you want equal probabilities of
a sampled element coming from each of the intervals, however they
might be defined? Or do you want the sampling probabilities to vary
from interval to interval?

I say again, ... a concrete example could do marvels in communicating  
your goals.
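
To make the discussion concrete, here is one possible reading of the quoted request, offered as a sketch and not as the poster's intended model. It assumes that every non-overlapping placement of the 10 intervals in 1:1000000 should be equally likely, that intervals may sit directly next to each other as long as they share no integer, and that the 10x1000 matrix holds start points (the lengths being known). The function name sample_intervals is hypothetical. The trick is to treat each interval as a single token mixed in with the uncovered integers, choose the token positions, and then expand each token back to its full length.

```r
# Hypothetical helper: place intervals of the given lengths in 1:N without overlap.
# Assumption: all valid placements (in any left-to-right order) are equally likely.
sample_intervals <- function(lengths, N) {
    k <- length(lengths)
    free <- N - sum(lengths)          # integers not covered by any interval
    stopifnot(free >= 0)
    ord <- sample.int(k)              # random left-to-right order of the intervals
    len <- lengths[ord]
    # Each interval is one token among (free + k) slots; pick the token slots.
    pos <- sort(sample.int(free + k, k))
    # Free integers before token j are pos[j] - j; add the lengths already placed.
    start <- pos - seq_len(k) + 1 + cumsum(c(0, len[-k]))
    out <- data.frame(id = ord, start = start, stop = start + len - 1)
    out <- out[order(out$id), ]       # report in the original order of 'lengths'
    rownames(out) <- NULL
    out
}

L <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
set.seed(1)
# 10 x 1000 matrix: row i holds the sampled start points of interval i
starts <- replicate(1000, sample_intervals(L, 1e6)$start)
```

If the intended probability model is different (for example, a required minimum gap between intervals), the token step would need to be adjusted accordingly.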

Do


>
> Thanks
> Hadassa
>
> On Sun, Jul 19, 2009 at 9:48 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Jul 19, 2009, at 1:05 PM, Hadassa Brunschwig wrote:
>>
>>> Hi,
>>>
>>> I hope I am not repeating a question which has been posed already.
>>> I am trying to do the following in the most efficient way:
>>> I would like to sample from a finite (large) set of integers n
>>> non-overlapping intervals, where each interval i has a different, set
>>> length L_i (which is the number of integers in the interval).
>>> I had the idea to sample recursively on a vector with the already
>>> chosen intervals discarded but that seems to be too complicated.
>>
>> It might be ridiculously easy if you sampled on an index of a group of
>> intervals.
>> Why not pose the question in the form of example data.frames or other
>> classes of R objects? Specification of the desired output would be
>> essential. I think further specification of the sampling strategy would
>> also help because I am unable to understand what sort of probability
>> model you are hoping to apply.
>>
>>> Any suggestions on that?
>>>
>>> Thanks a lot.
>>>
>>> Hadassa
>>>
>>>
>>> --
>>> Hadassa Brunschwig
>>> PhD Student
>>> Department of Statistics
>>
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>
>
>
> -- 
> Hadassa Brunschwig
> PhD Student
> Department of Statistics
> The Hebrew University of Jerusalem
> http://www.stat.huji.ac.il

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



