[R-sig-ME] Modeling precision and recall with GLMMs
Ramon Diaz-Uriarte
rdiaz02 at gmail.com
Tue Mar 11 11:48:57 CET 2014
Dear All,
I am examining the performance of a couple of classification-like methods
under different scenarios. Two of the metrics I am using are precision and
recall (TP/(TP + FP) and TP/(TP + FN), where TP, FP, and FN are "true
positives", "false positives", and "false negatives" in a simple two-way
confusion matrix). Some of the combinations of methods have been used on
exactly the same data sets. So it is easy to set up a binomial model (or
multinomial2 if using MCMCglmm) such as
cbind(TP, FP) ~ fixed effects + (1|dataset)
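For concreteness, here is a minimal sketch of how such a model might be fitted with lme4. The data are simulated only to make the code runnable; the column names (dataset, method, TP, FP), the effect sizes, and the use of a single factor `method` as a stand-in for the fixed effects are all assumptions for illustration.

```r
library(lme4)

set.seed(1)
# Hypothetical layout: one row per (dataset, method) combination.
d <- expand.grid(dataset = factor(1:30), method = factor(c("A", "B")))

# Simulated counts (assumed values, not real results).
# The number of returned items differs by method on purpose.
n_returned <- rpois(nrow(d), lambda = ifelse(d$method == "A", 40, 80)) + 1
u   <- rnorm(30, sd = 0.5)                       # dataset-level random intercepts
eta <- 0.5 + 0.7 * (d$method == "B") + u[as.integer(d$dataset)]
d$TP <- rbinom(nrow(d), size = n_returned, prob = plogis(eta))
d$FP <- n_returned - d$TP

# Precision modelled as TP successes out of (TP + FP) trials,
# with a random intercept per data set.
fit_precision <- glmer(cbind(TP, FP) ~ method + (1 | dataset),
                       family = binomial, data = d)
summary(fit_precision)
```

Recall would be handled analogously, with cbind(TP, FN) on the left-hand side.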
However, the left-hand side seems questionable, especially for precision:
the expression TP/(TP + FP) has, in the denominator, a (TP + FP) [the
number of results returned, or retrieved instances, etc] that, itself, can
be highly method-dependent (i.e., affected by the fixed effects). So rather
than a true proportion, this seems more like a ratio, where TP and FP each
have their own variance, a covariance, etc., and thus the error
distribution is a mess (not the tidy thing of a binomial).
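A small simulation might help probe this concern. The sketch below rests on a strong assumption: that TP and FP are independent Poisson counts with method-specific means (the means and the names `lambda`, `sim_method`, and `check` are hypothetical). Under that assumption TP given TP + FP is exactly binomial, even though the denominator depends on the method; correlated or overdispersed counts would not behave this way.

```r
set.seed(42)
n_sim <- 20000

# Hypothetical method-specific Poisson means for TP and FP.
lambda <- list(A = c(TP = 30, FP = 10), B = c(TP = 60, FP = 40))

# Draw independent Poisson TP and FP counts for one method.
sim_method <- function(l) {
  TP <- rpois(n_sim, l["TP"])
  FP <- rpois(n_sim, l["FP"])
  data.frame(TP = TP, FP = FP, n = TP + FP)
}
sims <- lapply(lambda, sim_method)

# The denominator (number of returned items) depends on the method.
sapply(sims, function(s) mean(s$n))

# Conditional on a fixed denominator n0, compare the empirical variance of TP
# with the binomial variance n0 * p * (1 - p), where p = E[TP] / (E[TP] + E[FP]).
check <- function(s, l, n0) {
  p  <- unname(l["TP"] / sum(l))
  tp <- s$TP[s$n == n0]
  c(empirical = var(tp), binomial = n0 * p * (1 - p))
}
check(sims$A, lambda$A, n0 = 40)
check(sims$B, lambda$B, n0 = 100)
```

Replacing the independent draws with correlated or overdispersed ones would show how far the conditional distribution then departs from the binomial.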
I've looked around in the literature and have not found much (maybe the
problem is my searching skills :-). Most people use rankings of methods,
not directly modeling precision or recall in the left-hand side of a
(generalized) linear model. A couple of papers use a linear model on the
log-transformed response (which I think is even worse than the above
binomial model, especially with lots of 0s or 1s). Some other people use a
single measure, such as the F-measure or Matthews correlation coefficient,
and I am using something similar too, but I specifically wanted to also
model precision and recall.
An option would be a multi-response model with MCMCglmm, but I am not sure
if this is appropriate either (dependence of the sum of FP and TP on the
fixed effects).
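For reference, a multi-response model of this kind might look roughly like the sketch below, which treats TP and FP as two Poisson responses with correlated data-set effects and correlated residuals. The simulated data, the column names, the Poisson families, and the priors are all assumptions for illustration, not a recommendation.

```r
library(MCMCglmm)

set.seed(1)
# Hypothetical data: one row per (dataset, method) combination; simulated counts.
d <- expand.grid(dataset = factor(1:30), method = factor(c("A", "B")))
u <- rnorm(30, sd = 0.3)[as.integer(d$dataset)]   # shared dataset effect
d$TP <- rpois(nrow(d), exp(3.0 + 0.5 * (d$method == "B") + u))
d$FP <- rpois(nrow(d), exp(2.0 + 1.0 * (d$method == "B") + u))

# Inverse-Wishart priors for the 2 x 2 residual (R) and dataset (G) covariances;
# chosen only so that the sketch runs.
prior <- list(R = list(V = diag(2), nu = 2),
              G = list(G1 = list(V = diag(2), nu = 2)))

# 'trait' indexes the two responses (TP, FP); 'units' indexes the rows.
fit_mr <- MCMCglmm(cbind(TP, FP) ~ trait - 1 + trait:method,
                   random = ~ us(trait):dataset,
                   rcov   = ~ us(trait):units,
                   family = c("poisson", "poisson"),
                   prior  = prior, data = d,
                   nitt = 13000, burnin = 3000, thin = 10, verbose = FALSE)
summary(fit_mr)
```

Here the method effect on each count is estimated separately (trait:method), so the dependence of TP + FP on the fixed effects is modelled explicitly rather than conditioned on.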
Best,
R.
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: rdiaz02 at gmail.com
ramon.diaz at iib.uam.es
http://ligarto.org/rdiaz