[R] mutlidimensional in.convex.hull (was multidimensionalpoint.in.polygon??)

Fri Dec 11 13:02:07 CET 2009

>"baptiste auguie" <baptiste.auguie at googlemail.com> wrote in message 
>news:de4e29f50912110200g7e43551kef5e8053fbf6ee15 at mail.gmail.com...
>2009/12/10 Charles C. Berry <cberry at tajo.ucsd.edu>:
>[snipped]
>> Many?
>>
>>
<snip> Charles' elegant coding of the algorithm he summarises as:
>> Any point that is in the convex hull of rbind(ps,xs) is either in or 
>> outside
>> the convex hull of ps. Right? So, just recursively eliminate points that 
>> are
>> in the convex hull of the larger set.
>>

Baptiste commented:
>
>If I'm not mistaken this method is efficient only because the two
>point distributions are very similar (drawn from rnorm, so they look
>like two concentric balls). If one of the convex hulls is very
>distorted along one axis, say, I believe the method will involve many
>more iterations and in the limit will require computing a convex hull
>for each test point as Duncan suggested.
>
>Such a pathological of test points example might be,
>
>    xs <- matrix(0,ncol=4,nrow=100)
>    xs[,1] <- seq(1,100)
>
>Or did I completely miss something? (quite possible)
>
>----

Until now I thought the same as Baptiste and (mea culpe) had rejected that 
algorithm without testing it. Now I've tried it and it works!

Here's the result on my real test data; sorry it's long, but it shows some 
important features:
 a) ps is quite "pathological"?; some substantial correlations
 b) xs <- expand.grid(lapply(ps, unique));
     this is the reason I'm doing it in the first place. I want to
    expand.grid without 'extrapolating' beyound the (convex hull of)
    the original data
 c) xs has lost the correlation structure of ps
 d) 1170 'outside' points removed in 23 iterations
 e) xs[-(outside)] has regained (some of) the correlation structure of ps

### <begin example>
> source(.trPaths[5], echo=TRUE, max.deparse.length=150)

> describe(ps)
ps
 5  Variables      2637  Observations
-----------------------------------------------------------------------------------------------------------------------------
t
      n missing  unique    Mean     .05     .10     .25     .50     .75 
.90     .95
   2637       0      35   136.2       0       0      30     120     194 
312     360
lowest :   0   2   3   4  24, highest: 312 336 360 384 504
-----------------------------------------------------------------------------------------------------------------------------
pH
      n missing  unique    Mean
   2637       0       4   5.707
4.6 (727, 28%), 5.4 (729, 28%), 6.2 (624, 24%), 7 (557, 21%)
-----------------------------------------------------------------------------------------------------------------------------
T
      n missing  unique    Mean
   2637       0       5   10.75
            2   5   8  15  22
Frequency 447 510 593 537 550
%          17  19  22  20  21
-----------------------------------------------------------------------------------------------------------------------------
S
      n missing  unique    Mean
   2637       0       4   3.097
0 (631, 24%), 2 (648, 25%), 4 (638, 24%), 6 (720, 27%)
-----------------------------------------------------------------------------------------------------------------------------
N
      n missing  unique    Mean
   2637       0       4   118.6
0 (701, 27%), 80 (636, 24%), 160 (628, 24%), 240 (672, 25%)
-----------------------------------------------------------------------------------------------------------------------------
> cor(ps)
              t          pH            T          S            N
t   1.000000000  0.02255541 -0.425911455 0.05541686  0.004447023
pH  0.022555414  1.00000000 -0.029277466 0.05630345 -0.031032641
T  -0.425911455 -0.02927747  1.000000000 0.05948337  0.003595186
S   0.055416859  0.05630345  0.059483366 1.00000000  0.014074045
N   0.004447023 -0.03103264  0.003595186 0.01407404  1.000000000
> xs <- expand.grid(lapply(ps, unique))
> describe(xs)
xs
 5  Variables      11200  Observations
-----------------------------------------------------------------------------------------------------------------------------
t
      n missing  unique    Mean     .05     .10     .25     .50     .75 
.90     .95
  11200       0      35   141.7       2       4      48     123     194 
336     384
lowest :   0   2   3   4  24, highest: 312 336 360 384 504
-----------------------------------------------------------------------------------------------------------------------------
pH
      n missing  unique    Mean
  11200       0       4     5.8
4.6 (2800, 25%), 5.4 (2800, 25%), 6.2 (2800, 25%), 7 (2800, 25%)
-----------------------------------------------------------------------------------------------------------------------------
T
      n missing  unique    Mean
  11200       0       5    10.4
             2    5    8   15   22
Frequency 2240 2240 2240 2240 2240
%           20   20   20   20   20
-----------------------------------------------------------------------------------------------------------------------------
S
      n missing  unique    Mean
  11200       0       4       3
0 (2800, 25%), 2 (2800, 25%), 4 (2800, 25%), 6 (2800, 25%)
-----------------------------------------------------------------------------------------------------------------------------
N
      n missing  unique    Mean
  11200       0       4     120
0 (2800, 25%), 80 (2800, 25%), 160 (2800, 25%), 240 (2800, 25%)
-----------------------------------------------------------------------------------------------------------------------------
> cor(xs)
              t            pH             T S             N
t  1.000000e+00  4.276263e-23  2.346103e-20 0  0.000000e+00
pH 4.276263e-23  1.000000e+00 -1.716171e-19 0  0.000000e+00
T  2.346103e-20 -1.716171e-19  1.000000e+00 0 -3.925415e-21
S  0.000000e+00  0.000000e+00  0.000000e+00 1  0.000000e+00
N  0.000000e+00  0.000000e+00 -3.925415e-21 0  1.000000e+00
> phull <-  convhulln(ps)
> phull2 <- convhulln(rbind(ps,xs))
> nrp <- nrow(ps)
> nrx <- nrow(xs)
> outside <- unique(phull2[phull2>nrp])-nrp
> done <- FALSE
> print(length(outside))
[1] 20
> while(!done){
    phull3 <- convhulln(rbind(ps,xs[-(outside),]))
    also.outside <- (1:nrx)[-outside][unique(phull3[phull3>nrp])-nrp]
    outside <- c(outside,also.outside)
    print(length(outside))
    done <- length(also.outside)==0
}
[1] 97
[1] 243
[1] 395
[1] 501
[1] 571
[1] 642
[1] 718
[1] 790
[1] 836
[1] 876
[1] 927
[1] 973
[1] 1012
[1] 1038
[1] 1060
[1] 1088
[1] 1120
[1] 1148
[1] 1162
[1] 1168
[1] 1170
[1] 1170
> describe(xs[-(outside),])
xs[-(outside), ]
 5  Variables      10030  Observations
-----------------------------------------------------------------------------------------------------------------------------
t
      n missing  unique    Mean     .05     .10     .25     .50     .75 
.90     .95
  10030       0      35   120.5       2       4      30     120     172 
240     336
lowest :   0   2   3   4  24, highest: 312 336 360 384 504
-----------------------------------------------------------------------------------------------------------------------------
pH
      n missing  unique    Mean
  10030       0       4   5.781
4.6 (2548, 25%), 5.4 (2559, 26%), 6.2 (2520, 25%), 7 (2403, 24%)
-----------------------------------------------------------------------------------------------------------------------------
T
      n missing  unique    Mean
  10030       0       5   9.584

             2    5    8   15   22
Frequency 2099 2232 2174 2024 1501
%           21   22   22   20   15
-----------------------------------------------------------------------------------------------------------------------------
S
      n missing  unique    Mean
  10030       0       4   3.067
0 (2371, 24%), 2 (2512, 25%), 4 (2572, 26%), 6 (2575, 26%)
-----------------------------------------------------------------------------------------------------------------------------
N
      n missing  unique    Mean
  10030       0       4   120.0
0 (2502, 25%), 80 (2514, 25%), 160 (2515, 25%), 240 (2499, 25%)
-----------------------------------------------------------------------------------------------------------------------------
> cor(xs[-(outside),])
               t            pH             T             S             N
t   1.0000000000 -0.0252190240 -0.1599673922  0.0260853161 -0.0001862074
pH -0.0252190240  1.0000000000 -0.0308956477 -0.0109761096  0.0006346428
T  -0.1599673922 -0.0308956477  1.0000000000  0.0506125967  0.0005164211
S   0.0260853161 -0.0109761096  0.0506125967  1.0000000000  0.0006132354
N  -0.0001862074  0.0006346428  0.0005164211  0.0006132354  1.0000000000
### <end example>

Baptiste said:
>
>Regarding the "inhull" Matlab code, I came to the opposite conclusion:
>it should be easily ported to R. 1) it is a very short piece of code
>(even more so if one disregards the various checks and handling of
>special cases), with no Matlab-specific objects (only integers,
>booleans, matrices and vectors). 2) The core of the program relies on
>the qhull library, and the same applies to R I think. 3) Matlab and R
>use very similar indexing for matrices and similar linear algebra in
>general.
>
>That said, I'm a bit short on time to give it a go myself. I think the
>open-source Octave could run this code too, so it might help in
>checking the code step-by-step.
>
>
>All the best,
>
>baptiste

I agree the Matlab code didn't look very complicated. The main reason I 
thought it non-trival (for me) to code in R was my ignorance; lines like...
nullsp = null(xyz(tess(i,2:end),:) - repmat(xyz(tess(i,1),:),p-1,1))';
...seem important but I don't know what 'null' and 'repmat' do. I don't even 
understand the algebra being implemented and I'm wary of coding in those 
circumstances.

If I had time, I'd like to understand and code the Matlab algorithm, but 
this is a small part of a much bigger task, on which I must progress. So 
I'll thank Baptiste, Duncan and Charles for their invaluable help and move 
on.

Best regards,

Keith Jewell