[R] cpquery problem

Thu Aug 4 10:37:50 CEST 2016

Hi Marco

Thank you very much for your helpful advice.

I have tried your suggestion of using method = "lw" with cpquery() and can
now obtain conditional probabilities.

However, I am still puzzled by the outputs from the predict() and
cpquery() functions.

The network that I am working on has the following coefficients for the
node that I am interested in (ABW):

  Parameters of node ABW (conditional Gaussian distribution)

Conditional density: ABW | EST + TR + FFB + RF
Coefficients:
                        0             1             2
(Intercept)  -0.480612729  -5.834617332   0.809011487
TR            1.857271045   1.584331230   1.964198638
FFB           0.182533645   0.066891147   0.028620951
RF           -0.002822838   0.002155205  -0.001608243

Standard deviation of the residuals:
        0          1          2
1.5140402  1.1764351  0.9675918
Discrete parents' configurations:
     EST
0     K1
1     M1
2     M2

If I run predict() using this fitted network I get ABW results very close
to those expected.  For example, for test case 1, I get a predicted ABW of
15.022, which is very close to the actual ABW value for this case
(14.871).
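
To make the link between the printout and the prediction explicit, here is a
minimal sketch of how a conditional Gaussian mean can be computed by hand
from the coefficients above. I am assuming that predict() returns this
conditional mean given the parents. The helper cg_mean() and the input
values (EST = "M2", TR = 8, FFB = 20, RF = 2500) are hypothetical and are
not the values of test case 1.

```r
# Coefficients copied from the printout above, one column per EST level.
coefs <- cbind(
  K1 = c(-0.480612729, 1.857271045, 0.182533645, -0.002822838),
  M1 = c(-5.834617332, 1.584331230, 0.066891147,  0.002155205),
  M2 = c( 0.809011487, 1.964198638, 0.028620951, -0.001608243)
)
rownames(coefs) <- c("(Intercept)", "TR", "FFB", "RF")

# Hypothetical helper: linear predictor for one configuration of EST.
cg_mean <- function(est, TR, FFB, RF) {
  sum(coefs[, est] * c(1, TR, FFB, RF))
}

# Hypothetical inputs, for illustration only.
mu <- cg_mean("M2", TR = 8, FFB = 20, RF = 2500)
print(mu)
```

The discrete parent EST only selects which column of coefficients (and which
residual standard deviation) applies; the continuous parents enter linearly.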

However, running cpquery() with the values for this test case as evidence
returns a conditional probability of 0 for every range of ABW observed in
the training data. For example, the conditional probability returned for
the event ABW < 15 is 0; the conditional probability for ABW lying between
the minimum and maximum values observed in the data is also 0; while the
conditional probability for the event ABW > 24 (which exceeds all observed
values) is 1.

Why does cpquery not return a high conditional probability for an event
which is predicted from the same coefficients?
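
As a rough check of what I would expect instead, here is a sketch that
evaluates the two events directly from a normal distribution. It assumes
that the evidence fixes all parents to the test case values, so that ABW is
a single Gaussian with mean 15.022 (the predict() output above) and one of
the three residual standard deviations from the printout. I have not stated
which EST level applies to test case 1, so all three are shown.

```r
# Predicted mean for test case 1 and the residual standard deviations
# from the printout above (one per EST level).
mu  <- 15.022
sds <- c(K1 = 1.5140402, M1 = 1.1764351, M2 = 0.9675918)

# P(ABW < 15) and P(ABW > 24) under each candidate standard deviation.
p_below_15 <- sapply(sds, function(s) pnorm(15, mean = mu, sd = s))
p_above_24 <- sapply(sds, function(s) pnorm(24, mean = mu, sd = s,
                                            lower.tail = FALSE))

print(rbind(p_below_15, p_above_24))
```

Under these assumptions P(ABW < 15) comes out just under 0.5 and
P(ABW > 24) is practically zero, which is the opposite of what cpquery()
returns for me.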

Many thanks for your assistance with these queries.

Regards

Ross

On Mon, August 1, 2016 7:35 pm, Marco Scutari wrote:
> Hi Ross,
>
>
> On 31 July 2016 at 09:11, Ross Chapman <ross.chapman at ecogeonomix.com>
> wrote:
>
>> I have tried running the cpquery in the debug mode, and found that it
>> typically returns the following for instances where the conditional
>> probability is returned as 0:
>>
>>> event matches 0 samples out of 0 (p = 0)
>>
>> Am I right in understanding that the Monte Carlo sampling has been
>> unable to create any cases that match the query?  If so, why would this
>> be if the evidence used is very typical of an average case in the data
>> used to train the network?
>
> Yes, that is what is happening. As to the reason why, I guess that the
> dependencies in the data may not be adequately represented in the fitted
> Bayesian network for some reason. What is apparent is that
> (EST=='y' & TR>9 & BU>15819 &  RF>2989) has an associated probability
> low enough that you do not observe any such sample in rejection sampling.
> Now, that being the case, you have three options:
>
> 1) use a much larger "n" with a smaller "batch = 10^6" to generate a
> lot more particles;
>
> 2) switch to likelihood weighting, i.e.
>
> cpquery(fitted, event = (ABW <= 11),
>         evidence = list(EST = 'y', TR = c(9, max(data$TR)),
>                         BU = c(15819, max(data$BU)),
>                         RF = c(2989, max(data$RF))),
>         n = 10^6, method = "lw")
>
> 3) look at the parameters in your fitted network and diagnose why this
> is happening.
>
> Cheers,
> Marco
>
>
> --
> Marco Scutari, Ph.D.
> Lecturer in Statistics, Department of Statistics
> University of Oxford, United Kingdom
>
>