[R] Random sampling while keeping distribution of nearest neighbor distances constant.

Thu Aug 13 01:48:22 CEST 2009

Dear Daniel,

Thanks a lot for your suggestion. It is helpful and got me thinking
more about the problem, so I can now rephrase it:

Given a vector V containing X values, each between 1 and N, I'd
like to sample values so that the *distribution* of distances between
the X values is similar.

There are several distributions: the 1st order distribution is given by
diff(V).
The 2nd order distribution is given by
diff(V[seq(1,length(V),by=2)]) and diff(V[seq(2,length(V),by=2)]).
The 3rd order distribution is given by diff(V[seq(1,length(V),by=3)]),
diff(V[seq(2,length(V),by=3)]) and diff(V[seq(3,length(V),by=3)]).
The 4th order ....
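
To make these definitions concrete, here is a minimal sketch. It assumes V
is sorted, and the helper name order_gaps is made up for illustration. It
relies on the fact that the interleaved diff() calls for order k give,
taken together, the same set of values as diff(V, lag = k).

```r
# Hypothetical helper (not from any package): the order-k distances of V,
# i.e. the gaps between points that are k positions apart once V is sorted.
order_gaps <- function(V, k = 1) {
  V <- sort(V)
  if (length(V) <= k) return(numeric(0))  # not enough points for this order
  diff(V, lag = k)
}

V <- c(3, 7, 8, 15, 21, 22, 30)  # made-up example values

# The 2nd order distribution written with interleaved subsequences
interleaved <- c(diff(V[seq(1, length(V), by = 2)]),
                 diff(V[seq(2, length(V), by = 2)]))

order_gaps(V, 1)  # 1st order, same as diff(V)
order_gaps(V, 2)  # 2nd order, same values as 'interleaved' in another order
```

The two forms differ only in the order of the values, which does not matter
when comparing distributions.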

I would like to produce different samples in which the first, or the
first and second, or the first, second and third, or up to, say, the
first five orders of the distance distributions are reproduced.
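
As a starting point only, the first-order case can be sketched by shuffling
the first-order gaps and placing the result at a random offset within 1..N.
This is an assumed approach, not an established formalism: it reproduces
the 1st order distribution exactly but gives no guarantee for the 2nd and
higher orders. The name sample_first_order is made up for illustration.

```r
# Hypothetical helper (not from any package): returns a new set of points in
# 1..N whose first-order gaps are a random permutation of those of V.
sample_first_order <- function(V, N) {
  V <- sort(V)
  stopifnot(length(V) >= 2, V[1] >= 1, V[length(V)] <= N)
  gaps <- diff(V)
  span <- sum(gaps)  # distance from the smallest to the largest value
  # Permute by index: sample(gaps) would treat a single gap g as 1:g
  shuffled <- gaps[sample.int(length(gaps))]
  # Valid starting positions are 1 .. N - span
  start <- sample.int(N - span, 1)
  start + c(0, cumsum(shuffled))
}

set.seed(1)
V <- c(3, 7, 8, 15, 21, 22, 30)  # made-up example values
s <- sample_first_order(V, N = 100)
sort(diff(s))  # same values as sort(diff(V))
```

Shuffling whole blocks of consecutive gaps instead of single gaps would keep
more of the higher-order structure, but only approximately.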

Is anybody aware of a formalism that is explained in a book and that
could help me deal with this problem? Or, even better, of a package?

Thanks for your help,

Emmanuel

2009/8/12 Nordlund, Dan (DSHS/RDA) <NordlDJ at dshs.wa.gov>:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Emmanuel Levy
>> Sent: Wednesday, August 12, 2009 3:05 PM
>> To: r-help at stat.math.ethz.ch
>> Cc: dev djomson
>> Subject: [R] Random sampling while keeping distribution of nearest neighbor
>> distances constant.
>>
>> Dear All,
>>
>> I cannot find a solution to the following problem although I imagine
>> that it is a classic, hence my email.
>>
>> I have a vector V of X values comprised between 1 and N.
>>
>> I would like to get random samples of X values also comprised between
>> 1 and N, but the important point is:
>> * I would like to keep the same distribution of distances between the X values *
>>
>> For example let's say N=10 and I have V = c(3,4,5,6)
>> then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6, or 4,5,6,7 etc..
>> so that the distribution of distances (3 <-> 4, 3 <->5, 3 <-> 6, 4 <->
>> 5, 4 <-> 6 etc ...) is kept constant.
>>
>> I couldn't find a package that helps me with this, but it looks like it
>> should be a classic problem so there should be something!
>>
>> Many thanks in advance for any help or hint you could provide,
>>
>> All the best,
>>
>> Emmanuel
>>
>
> Emmanuel,
>
> I don't know if this is a classic problem or not.  But given your description, you could write your own function, something like this:
>
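> sample.dist <- function(vec, Min=1, Max=10){
>  diffs <- c(0,diff(vec))   # offsets between consecutive values
>  sum_d <- sum(diffs)       # total span of vec
>  # pick the start by index, so a single candidate start is not read as 1:n
>  starts <- Min:(Max-sum_d)
>  starts[sample.int(length(starts),1)]+cumsum(diffs)
>  }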
>
> Here Min and Max are the minimum and maximum values that you are sampling from (Min=1 and Max=10 in your example), and vec is the vector whose distances you want to keep.  This assumes that your vector is sorted smallest to largest, as in your example.  The function could be changed to accommodate a vector that isn't sorted.
>
> > V <- sort(sample(1:100,4))
> > V
> #[1] 46 78 82 95
> > sample.dist(V, Min=1, Max=100)
> #[1] 36 68 72 85
> > sample.dist(V, Min=1, Max=100)
> #[1] 12 44 48 61
>
> This should get you started at least.  Hope this is helpful,
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA  98504-5204