[R-sig-ME] Modeling precision and recall with GLMMs

Wed Mar 12 11:58:50 CET 2014

On Wed, 12-03-2014, at 05:04, jake987722 at hotmail.com wrote:

> A little detail I forgot to mention in my last email is that in the
> probit/SDT model, the intercept = criterion if the actual classifications
> predictor is contrast-coded (-1 vs. +1), not if dummy coded. The idea is
> basically that it estimates response bias...

Thanks for the clarification. 
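
A minimal sketch of how the implied precision and recall might be worked out from such a probit fit, assuming the actual class is contrast-coded (-1 = "N", +1 = "P"). The function name implied_metrics and the inputs b0 (intercept), b1 (slope), and prevalence (proportion of actual positives) are hypothetical, not taken from this thread. Precision also depends on the prevalence, which has to be supplied separately. With this coding, and under the common convention c = -(z(H) + z(F))/2, the intercept would equal the criterion up to sign, and the slope would be d'/2 rather than d'.

```r
# Sketch: implied recall and precision from a probit (equal-variance SDT) fit.
# Assumes the actual class is contrast-coded: -1 = "N", +1 = "P".
# b0, b1 and prevalence are hypothetical inputs, not values from this thread.
implied_metrics <- function(b0, b1, prevalence) {
  hit_rate <- pnorm(b0 + b1)          # P(decide "P" | actual "P")
  false_alarm_rate <- pnorm(b0 - b1)  # P(decide "P" | actual "N")
  recall <- hit_rate                  # TP / (TP + FN)
  precision <- prevalence * hit_rate /
    (prevalence * hit_rate + (1 - prevalence) * false_alarm_rate)
  c(recall = recall,
    precision = precision,
    d_prime = 2 * b1,    # z(H) - z(F)
    criterion = -b0)     # -(z(H) + z(F)) / 2, one common sign convention
}

# Unbiased classifier (b0 = 0) with balanced classes
implied_metrics(b0 = 0, b1 = 1, prevalence = 0.5)
```

In a mixed model, b0 and b1 would presumably be the fixed-effect estimates (or posterior draws) for a given method, so the same function could be applied per method or per draw.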

R.

> Jake

> From: jake987722 at hotmail.com
> To: r-sig-mixed-models at r-project.org
> Date: Tue, 11 Mar 2014 21:54:00 -0600
> Subject: Re: [R-sig-ME] Modeling precision and recall with GLMMs
>
> Hi Ramon,
>
> I'm not sure that I fully understand the details of what you want to
> accomplish. But I do want to ask: you jump right into your email
> assuming that of course you want to model precision and recall, but
> what about modelling the data directly (i.e., individual
> classification decisions) rather than summaries of the data? Then you
> could work backward (forward?) from the model results to compute what
> the implied precision and recall would be.
>
> If you decided that modelling the data directly would work for your
> purposes, then one way of doing this would be to regress
> classification decisions ("P" or "N") on actual classifications ("P"
> or "N"). If this is done in a probit model, it is equivalent to the
> equal-variance signal detection model studied at length in psychology,
> with the intercept being the "criterion" in signal detection language
> (denoted c), and the slope being "sensitivity" (denoted d' or
> d-prime). It should definitely be possible to compute precision and
> recall from c and d'. This might be simpler with a logit rather than
> probit link function.
>
> Let me know if I have misunderstood what you are trying to accomplish.
>
> Jake
>
> > From: rdiaz02 at gmail.com
> > To: r-sig-mixed-models at r-project.org
> > Date: Tue, 11 Mar 2014 11:48:57 +0100
> > CC: ramon.diaz at iib.uam.es
> > Subject: [R-sig-ME] Modeling precision and recall with GLMMs
> >
> > Dear All,
> >
> > I am examining the performance of a couple of classification-like
> > methods under different scenarios. Two of the metrics I am using are
> > precision and recall (TP/(TP + FP) and TP/(TP + FN), where TP, FP,
> > and FN are "true positives", "false positives", and "false
> > negatives" in a simple two-way confusion matrix). Some of the
> > combinations of methods have been used on exactly the same data
> > sets. So it is easy to set up a binomial model (or multinomial2 if
> > using MCMCglmm) such as
> >
> >     cbind(TP, FP) ~ fixed effects + (1|dataset)
> >
> > However, the left hand side sounds questionable, specially with
> > precision: the expression TP/(TP + FP) has, in the denominator, a
> > (TP + FP) [the number of results returned, or retrieved instances,
> > etc] that, itself, can be highly method-dependent (i.e., affected by
> > the fixed effects). So rather than a true proportion, this seems
> > more like a ratio, where each of TP and FP have their own variance,
> > a covariance, etc, and thus the error distribution is a mess (not
> > the tidy thing of a binomial).
> >
> > I've looked around in the literature and have not found much (maybe
> > the problem are my searching skills :-). Most people use rankings of
> > methods, not directly modeling precision or recall in the left-hand
> > side of a (generalized) linear model. A couple of papers use a
> > linear model on the log-transformed response (which I think is even
> > worse than the above binomial model, specially with lots of 0s or
> > 1s). Some other people use a single measure, such as the F-measure
> > or Matthews correlation coefficient, and I am using something
> > similar too, but I specifically wanted to also model precision and
> > recall.
> >
> > An option would be a multi-response model with MCMCglmm, but I am
> > not sure if this is appropriate either (dependence of the sum of FP
> > and TP on the fixed effects).
> >
> > Best,
> >
> > R.
> >
> > --
> > Ramon Diaz-Uriarte
> > Department of Biochemistry, Lab B-25
> > Facultad de Medicina
> > Universidad Autónoma de Madrid
> > Arzobispo Morcillo, 4
> > 28029 Madrid
> > Spain
> >
> > Phone: +34-91-497-2412
> >
> > Email: rdiaz02 at gmail.com
> >        ramon.diaz at iib.uam.es
> >
> > http://ligarto.org/rdiaz
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina 
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz