[R-sig-ME] Modeling precision and recall with GLMMs

Daniel Wright Daniel.Wright at act.org
Wed Mar 12 15:26:28 CET 2014


The "how good or bad" each method is, is what will come out of the method Jake is suggesting.

Using multilevel models for this has been common in the memory-recognition literature in psychology for the last decade or so, but it is also relevant in lots of other areas, like medical diagnostics. If the variable IS_ij is whether person i saw stimulus j (0 not seen, 1 seen), and SAY_ij is whether the person says she saw the stimulus, then a multilevel probit or logit regression of SAY_ij on IS_ij, with careful coding of the variables, can mimic the standard SDT models. The critical quantity for saying whether people are accurate is the coefficient in front of IS_ij. If you have different conditions, COND_j, then interactions between COND_j (or COND_ij if varied within subject) and IS_ij examine whether accuracy varies among them. An important plus of the multilevel models is that the coefficients can vary by person and/or stimulus.
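For concreteness, a minimal lme4 sketch of such a model; the data frame trials and the variable names are hypothetical stand-ins, with one row per person-by-stimulus trial:

    library(lme4)
    # IS  = whether the stimulus was actually seen (0/1)
    # SAY = whether the person says she saw it (0/1)
    # The coefficient on IS plays the role of d' (sensitivity); the
    # IS:COND interaction tests whether sensitivity differs by condition,
    # and the random slopes let sensitivity vary by person and stimulus.
    sdt <- glmer(SAY ~ IS * COND + (IS | person) + (IS | stimulus),
                 family = binomial(link = "probit"), data = trials)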


> Hi Ramon,

> I'm not sure that I fully understand the details of what you want to
> accomplish. But I do want to ask: you jump right into your email
> assuming that of course you want to model precision and recall, but
> what about modelling the data directly (i.e., individual
> classification decisions) rather than summaries of the data? Then you
> could work backward (forward?) from the model results to compute what
> the implied precision and recall would be.

Sorry, I did not provide enough details. I am comparing some methods for reconstructing networks; the true positives and false positives, for instance, refer to the number of correctly inferred edges and to the number of edges that a procedure recovers that are not in the original network, respectively.

So the network reconstruction methods model the data directly, and what I want to model is how good or bad their output is, as a function of several other variables (related to several dimensions of the difficulty of the problem, etc.).


> If you decided that modelling the data directly would work for your 
> purposes, then one way of doing this would be to regress 
> classification decisions ("P" or "N") on actual classifications ("P" or "N").

I am not sure that would work. For each data set, each method returns a bunch of "P"s and "N"s. But what I want to do is model not the relationship between truth and prediction, but rather how good or bad each method is (at trying to reconstruct the truth).

> If this is done in a probit model, it is equivalent to the 
> equal-variance signal detection model studied at length in psychology, 
> with the intercept being the "criterion" in signal detection language 
> (denoted c), and the slope being "sensitivity" (denoted d' or 
> d-prime). It should definitely be possible to compute precision and 
> recall from c and d'.
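To make that last step concrete, a sketch under the usual parameterisation (a probit fit of predicted class on actual class, slope = d', intercept = -c): the hit and false-alarm rates follow directly from the coefficients, and precision additionally needs the base rate of actual positives, assumed known here. All values below are illustrative, not from the thread.

    b0     <- -0.5                # intercept = -c (illustrative)
    b1     <-  1.5                # slope = d' (illustrative)
    pi_pos <-  0.3                # assumed base rate of actual positives

    hit <- pnorm(b0 + b1)         # P(say "P" | actual "P")
    fa  <- pnorm(b0)              # P(say "P" | actual "N")

    recall    <- hit
    precision <- pi_pos * hit / (pi_pos * hit + (1 - pi_pos) * fa)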

I am not familiar with this approach in psychology. As I say above, I am not sure it addresses the problem I want to address, but do you have a pointer to the literature where I can read more about it?


Best,


R.

> This might be simpler with a logit rather than probit link function.
>
> Let me know if I have misunderstood what you are trying to accomplish.

> Jake

>> From: rdiaz02 at gmail.com
>> To: r-sig-mixed-models at r-project.org
>> Date: Tue, 11 Mar 2014 11:48:57 +0100
>> CC: ramon.diaz at iib.uam.es
>> Subject: [R-sig-ME] Modeling precision and recall with GLMMs

>> Dear All,

>> I am examining the performance of a couple of classification-like
>> methods under different scenarios. Two of the metrics I am using are
>> precision and recall (TP/(TP + FP) and TP/(TP + FN), where TP, FP,
>> and FN are "true positives", "false positives", and "false negatives"
>> in a simple two-way confusion matrix). Some of the combinations of
>> methods have been used on exactly the same data sets. So it is easy
>> to set up a binomial model (or multinomial2 if using MCMCglmm) such
>> as

>> cbind(TP, FP) ~ fixed effects + (1|dataset)
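For reference, a minimal lme4 version of that model, as a sketch only: the data frame perf and the method factor are hypothetical stand-ins for the real fixed effects, with one row per method-by-dataset run and integer counts TP and FP.

    library(lme4)
    # cbind(TP, FP): "successes" = true positives, "failures" = false
    # positives, so the modelled proportion is TP/(TP + FP), i.e., precision.
    fit <- glmer(cbind(TP, FP) ~ method + (1 | dataset),
                 family = binomial, data = perf)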

>> However, the left-hand side sounds questionable, especially with
>> precision: the expression TP/(TP + FP) has, in the denominator, a
>> (TP + FP) [the number of results returned, or retrieved instances,
>> etc.] that can itself be highly method-dependent (i.e., affected by
>> the fixed effects). So rather than a true proportion, this seems more
>> like a ratio, where TP and FP each have their own variance, a
>> covariance, etc., and thus the error distribution is a mess (not the
>> tidy thing of a binomial).


>> I've looked around in the literature and have not found much (maybe
>> the problem is my searching skills :-). Most people use rankings of
>> methods, not directly modeling precision or recall on the left-hand
>> side of a (generalized) linear model. A couple of papers use a linear
>> model on the log-transformed response (which I think is even worse
>> than the above binomial model, especially with lots of 0s or 1s).
>> Some other people use a single measure, such as the F-measure or
>> Matthews correlation coefficient, and I am using something similar
>> too, but I specifically wanted to also model precision and recall.

>> An option would be a multi-response model with MCMCglmm, but I am
>> not sure if this is appropriate either (dependence of the sum of FP
>> and TP on the fixed effects).
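One way such a multi-response model might look, as a rough sketch only: TP and FP treated as jointly Poisson with a correlated dataset effect. The data frame perf, the method factor, and the prior below are illustrative assumptions, not from the thread.

    library(MCMCglmm)
    # Weakly informative priors for the 2x2 (co)variance matrices of the
    # dataset effects (G) and the residuals (R); illustrative values only.
    prior <- list(R = list(V = diag(2), nu = 2),
                  G = list(G1 = list(V = diag(2), nu = 2)))
    m <- MCMCglmm(cbind(TP, FP) ~ trait:method - 1,  # one mean per response x method
                  random = ~ us(trait):dataset,      # correlated dataset effects
                  rcov   = ~ us(trait):units,        # correlated residuals
                  family = c("poisson", "poisson"),
                  prior  = prior, data = perf)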


>> Best,

 

--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

http://ligarto.org/rdiaz

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

