[R-sig-ME] Can AIC be approximated by -2ln(L) i.e. the deviance, at very large sample size?

Mon Mar 4 01:37:23 CET 2013

On 13-03-03 07:04 PM, Chris Howden wrote:
> Thanks for the responses everyone,
> 
> I agree that it's changes in 'goodness of fit' likelihood-based measures
> such as AIC and deviance that matter, not their absolute size.
> 
> However I think the impact of sample size may be something we need to
> consider, particularly when analysing "Big Data" sets.
> 
> I recently did some analysis on "Big Data", the number of rows was over
> 300 000. What I found was that the Full Model was always selected using
> AIC, Deviance and LRT. However when I had a look at the effects of the
> predictors I found some of them were negligible, to the point of not
> really being worth including in the model. Despite what the AIC and LRT
> say.

  Well, what do you mean by "not really worth including in the model"?
The AIC is telling you that they improve the expected predictive
accuracy.  "Too small to be interesting" is certainly possible, but it's
impossible for us to know (without the context of the question and
without knowing what question you're trying to answer with the model)
whether the effects are or aren't.

> 
> This seems to be the same sample size issue faced with simple univariate
> tests such as ANOVA, i.e. large sample sizes give so much power that
> statistically significant results may be of little or no practical value.
> 
> The reason I asked about the convergence of deviance and AIC at large
> sample sizes was thus.
> 
> The LRT tests between the Full model and 1 less predictor all had
> exceptionally small p-values, which meant that the difference in Ln(L) was
> very large. 

  (I would put this the other way around: deviance/log-likelihood
difference is more fundamental than p-value.)
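
  To make that link concrete, here is a minimal sketch (in Python rather
than R, purely for illustration) of how the LRT p-value follows from the
deviance difference when two nested models differ by one parameter. The
function name and the example deviance differences are invented for this
sketch; it uses the identity that the upper tail of a chi-squared
distribution with 1 df at x equals erfc(sqrt(x/2)).

```python
import math


def lrt_p_value_1df(deviance_diff):
    """Upper-tail chi-squared(1 df) probability for a deviance difference.

    Assumes two nested models that differ by exactly one parameter, so the
    deviance difference is referred to a chi-squared with 1 df.
    """
    if deviance_diff < 0:
        raise ValueError("deviance difference must be non-negative")
    # If X = Z**2 with Z standard normal, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(deviance_diff / 2.0))


# Hypothetical deviance differences, from borderline to very large.
for diff in (3.84, 10.0, 120.0):
    print(f"deviance diff = {diff:6.2f}   p = {lrt_p_value_1df(diff):.3g}")
```

  The p-value is just a monotone transformation of the deviance
difference, so it carries no extra information once the difference and
the df are known.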

> So large that it appears that the difference in deviance and
> AIC was essentially the same.

  Yes, it's true that for a fixed range of model sizes, model complexity
matters less and less for large samples.
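
  A toy calculation may help here (a sketch in Python, under assumptions
that are mine and not from the thread): take y_i ~ Normal(mu, 1) with
known variance, and compare a model that fixes the mean at 0 with one
that estimates it. The drop in deviance is n * ybar^2, whose expected
value is 1 + n * mu^2, while the AIC penalty stays at 2 per extra
parameter. The function names and the effect size mu = 0.02 are
hypothetical.

```python
def expected_deviance_drop(n, mu):
    """Expected drop in deviance (-2 ln L) from estimating the mean.

    Toy setting (an assumption): y_i ~ Normal(mu, 1), known variance.
    Model 0 fixes the mean at 0; model 1 estimates it by ybar.
    The drop is n * ybar**2, and E[n * ybar**2] = 1 + n * mu**2.
    """
    return 1.0 + n * mu ** 2


def expected_delta_aic(n, mu, extra_params=1):
    """Expected AIC(model 1) - AIC(model 0); negative favours model 1."""
    return -expected_deviance_drop(n, mu) + 2 * extra_params


# A tiny true effect (mu = 0.02 standard deviations) at growing n.
for n in (100, 10_000, 300_000):
    drop = expected_deviance_drop(n, mu=0.02)
    d_aic = expected_delta_aic(n, mu=0.02)
    print(f"n = {n:>7}   deviance drop = {drop:7.2f}   delta AIC = {d_aic:8.2f}")
```

  The penalty is the same 2 units at every n, but the deviance drop grows
roughly linearly in n for any nonzero effect, so at n = 300 000 the two
differences are nearly identical in relative terms and even a very small
effect is retained.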

> So although it’s the difference that matters, if they converge at large
> sample sizes then a large difference in deviance means there will also be
> a large difference in AIC and they will come to the same conclusion??
> 
> However as they don't converge at small sample sizes this effect is not as
> relevant.

  It's fairly well known, I think, that "everything is significant" for
sufficiently large sample sizes.  Arguably (e.g. according to Andrew
Gelman) we should be using hierarchical models to include more and more
structure in our models, so that we are always extracting as much
information as is in the data ...

  I'm not really clear on what your question is any more.  (And, these
are really general stats/modelling questions, not so much mixed modeling
questions ...)

  cheers
    Ben Bolker

> 
> Chris Howden B.Sc. (Hons) GStat.
> Founding Partner
> Evidence Based Strategic Development, IP Commercialisation and Innovation,
> Data Analysis, Modelling and Training
> (mobile) 0410 689 945
> (fax) +612 4782 9023
> chris at trickysolutions.com.au
> 
> -----Original Message-----
> From: Steve Taylor [mailto:steve.taylor at aut.ac.nz]
> Sent: Monday, 4 March 2013 10:10 AM
> To: Emmanuel Curis; Chris Howden
> Cc: r-sig-mixed-models
> Subject: RE: [R-sig-ME] Can AIC be approximated by -2ln(L) i.e. the
> deviance, at very large sample size?
> 
> I agree that it is changes in AIC that matter, not its absolute value.
> 
> My understanding is that AIC is only useful for comparing two models
> fitted on the same data set, i.e. with the same sample size.  So the
> question of how AIC changes with sample size is of little use beyond
> curiosity.
> 
> The change in AIC caused by adding a term to the model formula would be of
> interest.  But the change in AIC caused by adding cases to the sample
> is pretty meaningless.
> 
> The 2K part is important because it provides a penalty for the change in
> the number of parameters between a simpler model and a more complex model.
> 
> I would advise against making any approximations when calculating AIC,
> especially considering its main use is in taking the difference between
> two close large numbers.
> 
> cheers,
>     Steve
> 
> 
> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org
> [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Emmanuel
> Curis
> Sent: Friday, 1 March 2013 9:18p
> To: Chris Howden
> Cc: r-sig-mixed-models
> Subject: Re: [R-sig-ME] Can AIC be approximated by -2ln(L) i.e. the
> deviance, at very large sample size?
> 
> Hi,
> 
> I may be wrong, but I understood that AIC in itself is not as important as
> changes in AIC between models, and some authors say that changes in AIC
> on the order of more than 10 are enough to favor one model over another.
> 
> And changes in the 2*k term should be in this order of magnitude when
> comparing different models.
> 
> So my guess would be that it remains important.
> 
> On the other hand, if a set of parameters will remain in all models, it
> probably can be safely ignored in the 2*k term for all models.
> 
> Hope this helps,
> 
> On Fri, Mar 01, 2013 at 06:30:53PM +1100, Chris Howden wrote:
> < Hi everyone,
> <
> < Although not strictly an R issue there often seem to be discussions along
> < these lines on this list, so I hope no one minds me posting this. If you
> < do please let me know. (and just for the record I am applying this in R)
> <
> < I'm trying to get my head around AIC and sample size.
> <
> < Now if AIC = -2ln(L) + 2K = Deviance + 2K
> <
> < Am I right in thinking that as the Likelihood is the product of
> < probabilities then (all else being equal) the larger the sample size the
> < smaller the Likelihood?
> < Which means that if we have very large sample sizes we expect the -2ln(L)
> < term to be a very large number?
> < Which would reduce the effect of the parameter correction term 2K?
> <
> <
> < Chris Howden B.Sc. (Hons) GStat.
> < Founding Partner
> < Evidence Based Strategic Development, IP Commercialisation and Innovation,
> < Data Analysis, Modelling and Training
> < (mobile) 0410 689 945
> < (fax) +612 4782 9023
> < chris at trickysolutions.com.au
> <
> < _______________________________________________
> < R-sig-mixed-models at r-project.org mailing list
> < https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> --
>                                 Emmanuel CURIS
>                                 emmanuel.curis at parisdescartes.fr
> 
> Page WWW: http://emmanuel.curis.online.fr/index.html
>