[R] Sampling of non-overlapping intervals of variable length

Moshe Olshansky m_olshansky at yahoo.com
Mon Jul 20 07:23:45 CEST 2009


Another possibility, if the total length of your intervals is small compared with the length of the "big interval", is to choose the starting points of all your intervals at random and to discard the entire set if any of the intervals overlap. Most likely you will not get many such cases (assuming, as stated above, that the total length of all the intervals is a small proportion of the length of the "big interval").
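A minimal sketch of this rejection approach in R follows. It assumes the "big interval" is 1:N, that an interval of length L starting at s covers s, ..., s + L - 1, and that touching intervals do not count as overlapping. The function name sample_by_rejection and the max_tries cap are hypothetical, not from the thread. Because every start point is drawn uniformly and independently and only the overlap event is rejected, each accepted set is uniform over all non-overlapping placements.

```r
# Hypothetical helper (not from the thread): rejection sampling of start points.
# Assumes the "big interval" is 1:N and that touching intervals do not overlap.
sample_by_rejection <- function(lens, N, max_tries = 10000) {
  k <- length(lens)
  for (attempt in seq_len(max_tries)) {
    # Interval i may start anywhere in 1:(N - lens[i] + 1), so it fits in 1:N.
    starts <- vapply(lens, function(L) sample.int(N - L + 1, 1), integer(1))
    ord <- order(starts)
    s <- starts[ord]
    e <- s + lens[ord] - 1  # end points, in left-to-right order
    # Accept only if every interval starts after its left neighbour ends.
    if (k < 2 || all(s[-1] > e[-k])) {
      return(starts)
    }
  }
  stop("no non-overlapping set found within max_tries draws")
}

set.seed(1)
lens <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
starts <- sample_by_rejection(lens, N = 1e6)  # starts[i] belongs to lens[i]
```

With these lengths and N = 1e6 only a tiny fraction of draws should be rejected; as the intervals fill more of 1:N, the expected number of attempts grows quickly.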

--- On Mon, 20/7/09, Hadassa Brunschwig <hadassa.brunschwig at mail.huji.ac.il> wrote:

> From: Hadassa Brunschwig <hadassa.brunschwig at mail.huji.ac.il>
> Subject: Re: [R] Sampling of non-overlapping intervals of variable length
> To: "Charles C. Berry" <cberry at tajo.ucsd.edu>
> Cc: r-help at r-project.org
> Received: Monday, 20 July, 2009, 3:08 PM
> Thanks Chuck.
> 
> Oops, I did not think of the problem in that way.
> That did exactly what I needed. I have another complication to this problem:
> I do not have only one vector 1:1e6 but several vectors of different
> lengths, say 5.
> Initially, my intervals are distributed over those 5 vectors and the
> ranges of those 5 vectors in a specific way (and you might have guessed by
> now that I would like to do something like a permutation test). Because I
> have this additional level, I guess I could do something like:
> 
> 1) Sample the 5 vectors with probabilities proportional to the
> frequencies of the initial intervals on these vectors.
> 2) For each sampled vector, apply Chuck's solution.
> 
> Would that work?
> Thanks a lot.
> Hadassa
> 
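One reading of the two-step plan above can be sketched in R as follows. It assumes that each interval is assigned independently to one of the vectors with probability proportional to freq (the number of original intervals on that vector), and that the intervals landing on a vector are then placed uniformly and without overlap inside 1:length of that vector. All names (place_intervals, sample_two_level, vec_lens, freq) and the example vector lengths and counts are hypothetical. place_intervals uses a slot-and-shift construction in the spirit of Chuck's solution: draw k distinct slots from 1:(N - S + k), where S is the total length, then shift the j-th slot right by the lengths of the intervals before it, minus j - 1.

```r
# Hypothetical sketch of the two-step plan; names and inputs are illustrative.
# place_intervals(): uniform non-overlapping placement of intervals in 1:N,
# in random left-to-right order (touching intervals allowed).
place_intervals <- function(lens, N) {
  k <- length(lens)
  S <- sum(lens)
  stopifnot(k >= 1, S <= N)
  perm <- sample.int(k)                # random left-to-right order
  L <- lens[perm]
  u <- sort(sample.int(N - S + k, k))  # k distinct slots, increasing
  s <- u + cumsum(c(0, head(L, -1))) - (seq_len(k) - 1)
  starts <- numeric(k)
  starts[perm] <- s                    # starts[i] belongs to lens[i]
  starts
}

# Step 1: assign each interval to a vector with probability proportional to freq.
# Step 2: place the intervals assigned to each vector within that vector.
sample_two_level <- function(lens, vec_lens, freq) {
  vec <- sample.int(length(vec_lens), length(lens), replace = TRUE, prob = freq)
  starts <- numeric(length(lens))
  for (v in unique(vec)) {
    idx <- which(vec == v)
    starts[idx] <- place_intervals(lens[idx], vec_lens[v])
  }
  data.frame(vector = vec, start = starts, length = lens)
}

set.seed(42)
lens <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
vec_lens <- c(2e5, 1.5e5, 1e5, 8e4, 5e4)  # hypothetical lengths of the 5 vectors
freq <- c(4, 2, 2, 1, 1)                  # hypothetical interval counts per vector
res <- sample_two_level(lens, vec_lens, freq)
```

If the permutation test should instead keep the number of intervals on each vector as observed, step 1 can be dropped and place_intervals() called once per vector with that vector's own intervals. Either way, place_intervals() stops if the intervals assigned to a vector are longer in total than the vector itself.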
> On Sun, Jul 19, 2009 at 11:13 PM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
> > On Sun, 19 Jul 2009, Hadassa Brunschwig wrote:
> >
> >> Hi
> >>
> >> I am not sure what you mean by sampling an index of a group of
> >> intervals. I will try to give an example:
> >> Let's assume I have a vector 1:1000000. Let's say I have 10 intervals
> >> of different but known lengths, say c(4,6,11,2,8,14,7,2,18,32). For
> >> simulation purposes I have to sample those 10 intervals 1000 times.
> >> The requirement, however, is that they have those lengths and do
> >> not overlap.
> >> In short, I would like to obtain a 10x1000 matrix of sampled intervals.
> >
> > Something like this:
> >
> >
> >> lens <- c(4,6,11,2,8,14,7,2,18,32)
> >> perm.lens <- sample(lens)
> >>
> >> sort(sample(1e06-sum(lens)+length(lens),length(lens)))+cumsum(c(0,head(perm.lens,-1)))
> >
> >  [1]  15424 261927 430276 445976 451069 546578 656123 890494 939714 969643
> >>
> >
> > The vector above gives the starting points for the intervals whose
> > lengths are perm.lens.
> >
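A minimal sketch of the same slot-and-shift idea as a reusable R function, assuming the big interval is 1:N and that intervals may touch but not share integers. As far as I can tell, the one-liner above adds only the lengths of the preceding intervals to the sorted slots; that leaves at least one unused integer between neighbours and lets the last interval end up to length(lens) - 1 positions past N. Subtracting j - 1 from the j-th start, as below, removes both effects. The function name place_intervals and the use of replicate() to build the 10x1000 matrix asked for earlier are illustrative choices, not from the thread.

```r
# Slot-and-shift placement: choose k distinct slots out of N - S + k, then
# shift the j-th slot right by the total length of the intervals before it,
# minus (j - 1), so neighbours may touch and nothing runs past N.
place_intervals <- function(lens, N = 1e6) {
  k <- length(lens)
  S <- sum(lens)
  stopifnot(k >= 1, S <= N)
  perm <- sample.int(k)                  # random left-to-right order
  L <- lens[perm]
  u <- sort(sample.int(N - S + k, k))    # k distinct slots, increasing
  s <- u + cumsum(c(0, head(L, -1))) - (seq_len(k) - 1)
  starts <- numeric(k)
  starts[perm] <- s                      # starts[i] belongs to lens[i]
  starts
}

lens <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
set.seed(1)
# Column r holds the start points of replicate r; row i belongs to lens[i].
starts <- replicate(1000, place_intervals(lens))
dim(starts)  # 10 1000
```

If at least one free integer between neighbouring intervals is actually wanted, keeping the original shift and drawing the slots from 1:(N - S + 1) instead should give exactly those placements, again without running past N.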
> > I'd bet every introductory combinatorics book has a problem or example in
> > which the expression for the number of ways in which K ordered objects can
> > be assigned to I groups consisting of n_i adjacent objects each is
> > constructed. The construction is along the lines of the calculation above.
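The counting argument behind this kind of construction can be written out as a bijection, assuming touching intervals are allowed (the thread does not say). Take k intervals in a fixed left-to-right order with lengths L_1, ..., L_k, total S = L_1 + ... + L_k, and start points s_1 < ... < s_k inside {1, ..., N}:

```latex
% Non-overlap inside {1,...,N}:  s_1 \ge 1,\; s_{j+1} \ge s_j + L_j,\; s_k + L_k - 1 \le N.
\[
  u_j = s_j - \sum_{i<j} L_i + (j-1)
  \quad\Longleftrightarrow\quad
  s_j = u_j + \sum_{i<j} L_i - (j-1),
\]
\[
  u_{j+1} - u_j = s_{j+1} - s_j - L_j + 1 \ge 1,
  \qquad
  1 \le u_1 < u_2 < \cdots < u_k \le N - S + k,
\]
\[
  \#\{\text{placements in a fixed order}\} = \binom{N - S + k}{k}.
\]
```

Every left-to-right order has the same count, so drawing the order uniformly and then a uniform k-subset of {1, ..., N - S + k} gives every labelled placement the same probability.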
> >
> > HTH,
> >
> > Chuck
> >
> >
> >>
> >> Thanks
> >> Hadassa
> >>
> >> On Sun, Jul 19, 2009 at 9:48 PM, David Winsemius <dwinsemius at comcast.net>
> >> wrote:
> >>>
> >>> On Jul 19, 2009, at 1:05 PM, Hadassa Brunschwig wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I hope I am not repeating a question that has already been posed.
> >>>> I am trying to do the following in the most efficient way:
> >>>> I would like to sample n non-overlapping intervals from a finite
> >>>> (large) set of integers, where each interval i has its own fixed length
> >>>> L_i (the number of integers in the interval).
> >>>> I had the idea of sampling recursively on a vector from which the
> >>>> already chosen intervals are discarded, but that seems too complicated.
> >>>
> >>> It might be ridiculously easy if you sampled on an index of a group of
> >>> intervals.
> >>> Why not pose the question in the form of example data.frames or other
> >>> classes of R objects? Specification of the desired output would be
> >>> essential. I think further specification of the sampling strategy would
> >>> also help because I am unable to understand what sort of probability
> >>> model you are hoping to apply.
> >>>
> >>>> Any suggestions on that?
> >>>>
> >>>> Thanks a lot.
> >>>>
> >>>> Hadassa
> >>>>
> >>>>
> >>>> --
> >>>> Hadassa Brunschwig
> >>>> PhD Student
> >>>> Department of Statistics
> >>>
> >>>
> >>> David Winsemius, MD
> >>> Heritage Laboratories
> >>> West Hartford, CT
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Hadassa Brunschwig
> >> PhD Student
> >> Department of Statistics
> >> The Hebrew University of Jerusalem
> >> http://www.stat.huji.ac.il
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > Charles C. Berry                            (858) 534-2098
> >                                             Dept of Family/Preventive Medicine
> > E mailto:cberry at tajo.ucsd.edu             UC San Diego
> > http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
> >
> >
> >
> 
> 
> 
> -- 
> Hadassa Brunschwig
> PhD Student
> Department of Statistics
> The Hebrew University of Jerusalem
> http://www.stat.huji.ac.il
>



