[R] Survival::coxph (clogit), survConcordance vs. summary(fit) concordance

Thu Jan 21 16:29:04 CET 2016

Thanks Terry!

I thought that since I was providing survConcordance with the model object,
the same formula (including the strata) would be applied, but I was
obviously wrong. I just ran survConcordance with the strata term added, as
you suggested, and got the same answer as summary(fit), with the same scary
SE.
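
For reference, the unstratified numbers in the original post (quoted below)
can be reproduced by hand, which shows why leaving out the strata changes
the answer. This is only a sketch: the counts are copied from that output,
and the two formulas are inferred from how the printed values relate to the
counts, not taken from the survival source.

```r
# Counts copied from the unstratified survConcordance output quoted below
concordant <- 1315
discordant <- 129
tied_risk  <- 0
tied_time  <- 703
std_c_d    <- 270.4626          # printed as std(c-d)

n_obs   <- 76
n_pairs <- n_obs / 2            # 38 strata, each with 1 case and 1 control

# Without strata, every case is compared with every control: 38 * 38 pairs
comparable <- concordant + discordant + tied_risk
stopifnot(comparable == n_pairs^2)

# All cases share the same "event time" of 1, so every case-case pair is a
# tie in time: choose(38, 2) pairs
stopifnot(tied_time == choose(n_pairs, 2))

# Assumed relationships between the printed values
conc <- (concordant + tied_risk / 2) / comparable
se   <- std_c_d / (2 * comparable)

print(round(c(concordance = conc, se = se), 4))
```

With strata(ID) in the formula, only the 38 within-stratum case-control
pairs are compared, instead of all 1444 cross-stratum pairs, so the two
estimates measure different things.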

This is a wildlife habitat selection analysis. Each individual animal has
habitat features that they used (1) and habitat that was available but that
they did not use (0). The habitat that is available is different for each
individual, hence the need for strata(ID of individual). However, all the
habitat data are collected from multiple discrete sites and each site has
multiple individuals on it. For all analyses of these data, I've
assumed that individuals within a site may be more correlated than
individuals between sites, hence the addition of cluster(site).

I was able to recalculate the same concordance estimate as summary(fit) by
estimating predicted probabilities using:
risk <- predict(fit, type='risk')
prob <- risk / (1 + risk)
I then used a probability cut-off of 0.5 to decide whether an observed point
was correctly classified, which returned the same 0.76 as the concordance
estimate.
So, can I just think of this concordance as a classification table (or
confusion matrix) with a 0.5 threshold, so that the classification error
would be 1 - 0.76 = 0.24?
Was I mistaken in thinking concordance was more akin to AUC in
unconditional logistic regression?
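
A minimal base-R sketch of why the two numbers can coincide for 1-1 matched
data. Everything here is hypothetical: the data are simulated, the
coefficients in beta are made up, and the within-stratum centering of the
linear predictor is an assumption about what predict(fit, type='risk')
does for a stratified model. Under that assumption, risk / (1 + risk)
exceeds 0.5 exactly when an observation has the larger linear predictor in
its pair, so the 0.5 cut-off counts the same pairs as the within-stratum
concordance.

```r
set.seed(1)

# Simulated 1-1 matched data: each stratum has one case (1) and one control (0)
n_pairs <- 38
id   <- rep(seq_len(n_pairs), each = 2)
resp <- rep(c(1, 0), times = n_pairs)
x1   <- rnorm(2 * n_pairs) + 0.8 * resp
x2   <- rnorm(2 * n_pairs) + 0.5 * resp

# Hypothetical coefficients standing in for coef(fit)
beta <- c(x1 = 1.0, x2 = 0.6)
lp   <- beta[["x1"]] * x1 + beta[["x2"]] * x2

# Assumption: the risk score is centered within each stratum
lp_centered <- lp - ave(lp, id)
risk <- exp(lp_centered)
prob <- risk / (1 + risk)

# Classification at a 0.5 cut-off: cases should be above, controls below
accuracy <- mean((prob > 0.5) == (resp == 1))

# Within-pair concordance: fraction of pairs where the case outranks its control
pair_concordance <- mean(lp[resp == 1] > lp[resp == 0])

stopifnot(isTRUE(all.equal(accuracy, pair_concordance)))
print(c(accuracy = accuracy, pair_concordance = pair_concordance))
```

In this setup the agreement comes from the pairing, not from the 0.5
threshold being special in general: the statistic is still a rank
comparison of case against control, just restricted to each matched pair.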

Thanks.
Joe

On Thu, Jan 21, 2016 at 8:01 AM, Therneau, Terry M., Ph.D. <
therneau at mayo.edu> wrote:

> I read the digest form which puts me behind, plus the last 2 days have
> been solid meetings with an external advisory group so I missed the initial
> query.   Three responses.
>
> 1. The clogit routine sets the data up properly and then calls a
> stratified Cox model.  If you want the survConcordance routine to give the
> same answer, it also needs to know about the strata
>     survConcordance (Surv(rep(1, 76L), resp) ~ predict(fit) + strata(ID),
> data=dat)
> I'm not surprised that you get a very different answer with/without strata.
>
> 2. I've never thought of using a robust variance for the matched
> case/control model.  I'm having a hard time wrapping my head around what
> you would expect that to accomplish (statistically).  Subjects are already
> matched on someone from the same site, so where does a per-site effect
> creep in?  Assuming there is a good reason and I just don't see it (not an
> unwarranted assumption), I'm not aware of any work on what an appropriate
> variance would be for the concordance in that case.
>
> 3. I need to think about the large variance issue.
>
> Terry Therneau
>
>
>
> On 01/20/2016 08:09 PM, r-help-request at r-project.org wrote:
>
>> Hi,
>>
>> I'm running conditional logistic regression with survival::clogit. I have
>> "1-1 case-control" data, i.e., there is 1 case and 1 control in each
>> stratum.
>>
>> Model:
>> fit <- clogit(resp ~ x1 + x2 + strata(ID) + cluster(site),
>>               method = "efron", data = dat)
>> Where resp is 1's and 0's, and x1 and x2 are both continuous.
>>
>> Predictors are both significant. A snippet of summary(fit):
>> Concordance= 0.763  (se = 0.5 )
>> Rsquare= 0.304   (max possible= 0.5 )
>> Likelihood ratio test= 27.54  on 2 df,   p=1.047e-06
>> Wald test            = 17.19  on 2 df,   p=0.0001853
>> Score (logrank) test = 17.43  on 2 df,   p=0.0001644,   Robust = 6.66  p=0.03574
>>
>> The concordance estimate seems good but the SE is HUGE.
>>
>> I get a very different estimate from the survConcordance function, which I
>> know is documented as computing concordance for a "single continuous
>> covariate", but it runs on my model with 2 continuous covariates:
>>
>> survConcordance(Surv(rep(1, 76L), resp) ~ predict(fit), dat)
>> n= 76
>> Concordance= 0.9106648 se= 0.09365047
>> concordant  discordant   tied.risk   tied.time    std(c-d)
>>   1315.0000   129.0000     0.0000   703.0000   270.4626
>>
>> Are both of these concordance estimates valid but providing different
>> information?
>> Is one more appropriate for measuring "performance" (in the AUC sense) of
>> conditional logistic models?
>> Is it possible that the HUGE SE estimate represents a convergence problem
>> (no warnings were thrown when fitting the model), or is this model just
>> useless?
>>
>> Thanks!
>>
>

-- 
Cooperative Fish and Wildlife Research Unit
Zoology and Physiology Dept.
University of Wyoming
JoeCeradini at gmail.com / 914.707.8506
wyocoopunit.org
