[R] mutlidimensional in.convex.hull (was multidimensionalpoint.in.polygon??)
Keith Jewell
k.jewell at campden.co.uk
Fri Dec 11 13:02:07 CET 2009
>"baptiste auguie" <baptiste.auguie at googlemail.com> wrote in message
>news:de4e29f50912110200g7e43551kef5e8053fbf6ee15 at mail.gmail.com...
>2009/12/10 Charles C. Berry <cberry at tajo.ucsd.edu>:
>[snipped]
>> Many?
>>
>>
<snip> Charles' elegant coding of the algorithm he summarises as:
>> Any point that is in the convex hull of rbind(ps,xs) is either in or
>> outside
>> the convex hull of ps. Right? So, just recursively eliminate points that
>> are
>> in the convex hull of the larger set.
>>
Baptiste commented:
>
>If I'm not mistaken this method is efficient only because the two
>point distributions are very similar (drawn from rnorm, so they look
>like two concentric balls). If one of the convex hulls is very
>distorted along one axis, say, I believe the method will involve many
>more iterations and in the limit will require computing a convex hull
>for each test point as Duncan suggested.
>
>Such a pathological of test points example might be,
>
> xs <- matrix(0,ncol=4,nrow=100)
> xs[,1] <- seq(1,100)
>
>Or did I completely miss something? (quite possible)
>
>----
Until now I thought the same as Baptiste and (mea culpe) had rejected that
algorithm without testing it. Now I've tried it and it works!
Here's the result on my real test data; sorry it's long, but it shows some
important features:
a) ps is quite "pathological"?; some substantial correlations
b) xs <- expand.grid(lapply(ps, unique));
this is the reason I'm doing it in the first place. I want to
expand.grid without 'extrapolating' beyound the (convex hull of)
the original data
c) xs has lost the correlation structure of ps
d) 1170 'outside' points removed in 23 iterations
e) xs[-(outside)] has regained (some of) the correlation structure of ps
### <begin example>
> source(.trPaths[5], echo=TRUE, max.deparse.length=150)
> describe(ps)
ps
5 Variables 2637 Observations
-----------------------------------------------------------------------------------------------------------------------------
t
n missing unique Mean .05 .10 .25 .50 .75
.90 .95
2637 0 35 136.2 0 0 30 120 194
312 360
lowest : 0 2 3 4 24, highest: 312 336 360 384 504
-----------------------------------------------------------------------------------------------------------------------------
pH
n missing unique Mean
2637 0 4 5.707
4.6 (727, 28%), 5.4 (729, 28%), 6.2 (624, 24%), 7 (557, 21%)
-----------------------------------------------------------------------------------------------------------------------------
T
n missing unique Mean
2637 0 5 10.75
2 5 8 15 22
Frequency 447 510 593 537 550
% 17 19 22 20 21
-----------------------------------------------------------------------------------------------------------------------------
S
n missing unique Mean
2637 0 4 3.097
0 (631, 24%), 2 (648, 25%), 4 (638, 24%), 6 (720, 27%)
-----------------------------------------------------------------------------------------------------------------------------
N
n missing unique Mean
2637 0 4 118.6
0 (701, 27%), 80 (636, 24%), 160 (628, 24%), 240 (672, 25%)
-----------------------------------------------------------------------------------------------------------------------------
> cor(ps)
t pH T S N
t 1.000000000 0.02255541 -0.425911455 0.05541686 0.004447023
pH 0.022555414 1.00000000 -0.029277466 0.05630345 -0.031032641
T -0.425911455 -0.02927747 1.000000000 0.05948337 0.003595186
S 0.055416859 0.05630345 0.059483366 1.00000000 0.014074045
N 0.004447023 -0.03103264 0.003595186 0.01407404 1.000000000
> xs <- expand.grid(lapply(ps, unique))
> describe(xs)
xs
5 Variables 11200 Observations
-----------------------------------------------------------------------------------------------------------------------------
t
n missing unique Mean .05 .10 .25 .50 .75
.90 .95
11200 0 35 141.7 2 4 48 123 194
336 384
lowest : 0 2 3 4 24, highest: 312 336 360 384 504
-----------------------------------------------------------------------------------------------------------------------------
pH
n missing unique Mean
11200 0 4 5.8
4.6 (2800, 25%), 5.4 (2800, 25%), 6.2 (2800, 25%), 7 (2800, 25%)
-----------------------------------------------------------------------------------------------------------------------------
T
n missing unique Mean
11200 0 5 10.4
2 5 8 15 22
Frequency 2240 2240 2240 2240 2240
% 20 20 20 20 20
-----------------------------------------------------------------------------------------------------------------------------
S
n missing unique Mean
11200 0 4 3
0 (2800, 25%), 2 (2800, 25%), 4 (2800, 25%), 6 (2800, 25%)
-----------------------------------------------------------------------------------------------------------------------------
N
n missing unique Mean
11200 0 4 120
0 (2800, 25%), 80 (2800, 25%), 160 (2800, 25%), 240 (2800, 25%)
-----------------------------------------------------------------------------------------------------------------------------
> cor(xs)
t pH T S N
t 1.000000e+00 4.276263e-23 2.346103e-20 0 0.000000e+00
pH 4.276263e-23 1.000000e+00 -1.716171e-19 0 0.000000e+00
T 2.346103e-20 -1.716171e-19 1.000000e+00 0 -3.925415e-21
S 0.000000e+00 0.000000e+00 0.000000e+00 1 0.000000e+00
N 0.000000e+00 0.000000e+00 -3.925415e-21 0 1.000000e+00
> phull <- convhulln(ps)
> phull2 <- convhulln(rbind(ps,xs))
> nrp <- nrow(ps)
> nrx <- nrow(xs)
> outside <- unique(phull2[phull2>nrp])-nrp
> done <- FALSE
> print(length(outside))
[1] 20
> while(!done){
phull3 <- convhulln(rbind(ps,xs[-(outside),]))
also.outside <- (1:nrx)[-outside][unique(phull3[phull3>nrp])-nrp]
outside <- c(outside,also.outside)
print(length(outside))
done <- length(also.outside)==0
}
[1] 97
[1] 243
[1] 395
[1] 501
[1] 571
[1] 642
[1] 718
[1] 790
[1] 836
[1] 876
[1] 927
[1] 973
[1] 1012
[1] 1038
[1] 1060
[1] 1088
[1] 1120
[1] 1148
[1] 1162
[1] 1168
[1] 1170
[1] 1170
> describe(xs[-(outside),])
xs[-(outside), ]
5 Variables 10030 Observations
-----------------------------------------------------------------------------------------------------------------------------
t
n missing unique Mean .05 .10 .25 .50 .75
.90 .95
10030 0 35 120.5 2 4 30 120 172
240 336
lowest : 0 2 3 4 24, highest: 312 336 360 384 504
-----------------------------------------------------------------------------------------------------------------------------
pH
n missing unique Mean
10030 0 4 5.781
4.6 (2548, 25%), 5.4 (2559, 26%), 6.2 (2520, 25%), 7 (2403, 24%)
-----------------------------------------------------------------------------------------------------------------------------
T
n missing unique Mean
10030 0 5 9.584
2 5 8 15 22
Frequency 2099 2232 2174 2024 1501
% 21 22 22 20 15
-----------------------------------------------------------------------------------------------------------------------------
S
n missing unique Mean
10030 0 4 3.067
0 (2371, 24%), 2 (2512, 25%), 4 (2572, 26%), 6 (2575, 26%)
-----------------------------------------------------------------------------------------------------------------------------
N
n missing unique Mean
10030 0 4 120.0
0 (2502, 25%), 80 (2514, 25%), 160 (2515, 25%), 240 (2499, 25%)
-----------------------------------------------------------------------------------------------------------------------------
> cor(xs[-(outside),])
t pH T S N
t 1.0000000000 -0.0252190240 -0.1599673922 0.0260853161 -0.0001862074
pH -0.0252190240 1.0000000000 -0.0308956477 -0.0109761096 0.0006346428
T -0.1599673922 -0.0308956477 1.0000000000 0.0506125967 0.0005164211
S 0.0260853161 -0.0109761096 0.0506125967 1.0000000000 0.0006132354
N -0.0001862074 0.0006346428 0.0005164211 0.0006132354 1.0000000000
### <end example>
Baptiste said:
>
>Regarding the "inhull" Matlab code, I came to the opposite conclusion:
>it should be easily ported to R. 1) it is a very short piece of code
>(even more so if one disregards the various checks and handling of
>special cases), with no Matlab-specific objects (only integers,
>booleans, matrices and vectors). 2) The core of the program relies on
>the qhull library, and the same applies to R I think. 3) Matlab and R
>use very similar indexing for matrices and similar linear algebra in
>general.
>
>That said, I'm a bit short on time to give it a go myself. I think the
>open-source Octave could run this code too, so it might help in
>checking the code step-by-step.
>
>
>All the best,
>
>baptiste
I agree the Matlab code didn't look very complicated. The main reason I
thought it non-trival (for me) to code in R was my ignorance; lines like...
nullsp = null(xyz(tess(i,2:end),:) - repmat(xyz(tess(i,1),:),p-1,1))';
...seem important but I don't know what 'null' and 'repmat' do. I don't even
understand the algebra being implemented and I'm wary of coding in those
circumstances.
If I had time, I'd like to understand and code the Matlab algorithm, but
this is a small part of a much bigger task, on which I must progress. So
I'll thank Baptiste, Duncan and Charles for their invaluable help and move
on.
Best regards,
Keith Jewell
More information about the R-help
mailing list