[R-sig-ME] Weighting - varPower

Mon Feb 28 01:21:18 CET 2011

Dear Bharat,

I think that the answer is probably the same for each of the
questions, at least in the broadest sense.  Your selection of the
varPower() function anticipates that the residual variation is a
function of the predictor variables.  Therefore, any summary
statistics that rely on the residual variance, for example, bias
correction, RMSE, and proportion of variation explained, may all
change as a function of whatever the predictor variables are that are
contributing to the varPower() function.  If the best-fitting model
includes such a weighting structure, then that is the right model to
use.  However, it means that you need to calculate the residual
variation at each value of x (or a representative set of values of x)
and compute the quantities you need conditional on those values.  For
example, in the case of m.0w below, each of the quantities will be a
smooth function of the fitted values.

I hope that this helps,

Andrew

ps please give my best wishes to Robert, Erin, and the boys. 

On Sun, Feb 27, 2011 at 06:17:22PM -0500, Bharat Pokharel wrote:
> Dear mixed-modellers;
> 
> I have couple of questions regarding mixed-model specification in R (nlme) and (lme4).
> 
> My data have hierarchical structure.  They come from standard forest inventory plots (i.e. all trees are measured in a fixed area circular plot of size 400 m2) and some plots are re-measured more than two cycles i.e. more than one measurement periods (e.g. 1999-2004, 2000-2005, 2001-2006, 2002-2007, 2003-2008, 2004-2009).  Most of the plots were measured only one measurement period, some plots that were established in 1999 were measured in 2004 and 2009 (three cycles or two measurement periods). I have nested measurement periods (MP) within Plot and trees are nested in the Plot.
> 
> My approach:
> a) First, I was debating whether should I use nonlinear or log-linearized model.  I chose to use log-linearized model as one of the predictor variable is dummy variable so that I can fit and test its (dummy variable) effects in the model. 
> b) Despite its slow convergence, I chose nlme package as it allows me to specify covariance structure (AR(1)) and specify weighting in order to stabilize non-constant variance if there exists any.
> 
> I have specified mixed-effects model as follow;
> 
> m.0 <- lme(dds.log ~ dbh.log+dbh+BAL+BA+SITES,
>             random=~1|PLot/MP, # measured periods (MP) are nested within Plots
>             data=tree, method="ML")
> m.AR1 <- update(m.0, correlation = corAR1()) 
> m.0d <-  update(m.AR1, weights = varPower(form = ~dbh))
> m.0w <-  update(m.AR1, weights = varPower())
> m.0w.REML <-  update(m.0w, method="REML")
> m.0w.NOSE <-  m.0w fitted without SITES covariate
> 
> where, response variable is log linearized individual tree increment, predictors are dbh and BAL (individual tree variables), whereas BA is plot level variable varies by MP and Plot, and SITES is categorical Plot level variable (assumed to be constant for next few decades).  MP is measured period e.g. 1999-2004 or 2004-2009 and Plot is plot ID. 
> 
> My questions are:
> a) residual variance after weighting: when I used weighted model (m.0w) - residual variance modelled as a power function of mean fitted value, then residual variance increases as expected (weight is power of negative delta), but variance due to plot and MP remained more or less constant.
>     (i) Since this is log-linearized model, I would like to apply bias correction factor as dds = exp(predicted_dds.log+(s^2(Plot)+S^2(MP)+S^2(res))/2).   Any effects from weighting to my log-linearized bias correction factor?  or S^2(residual variance changes as expected when applying weight), should I be using the residual variance after weighting to correct the log-linearized bias? Any suggestions?
> 
> > anova(m.0, m.AR1, m.0d, m.0w)
>       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
> m.0       1 31 48602.89 48857.54 -24270.44                        
> m.AR1     2 32 48600.60 48863.47 -24268.30 1 vs 2  4.28633  0.0384
> m.0d      3 33 48569.17 48840.25 -24251.58 2 vs 3 33.43440  <.0001
> m.0w      4 33 46205.77 46476.85 -23069.88              
> 
> The data fits m.0w model better than other null models.  Finally, I chose m.0w and used REML to parameterize and estimate variance components.
> 
> >print(m.0w) # final model with varPower weighting as a function of mean fitted value
> Random effects:
>  Formula: ~1 | Plot
>         (Intercept)
> StdDev:   0.3974489
> 
>  Formula: ~1 | MP %in% Plot
>         (Intercept) Residual
> StdDev:   0.2081098 1.283800
> 
> Correlation Structure: AR(1)
>  Formula: ~1 | Plot/MP 
>  Parameter estimate(s):
>         Phi 
> 0.007990434 
> Variance function:
>  Structure: Power of variance covariate
>  Formula: ~fitted(.) 
>  Parameter estimates:
>      power 
> -0.8423614 
> Number of Observations: 27298
> Number of Groups: 
>         Plot MP %in% Plot 
>         1182         1210 
> 
> 
> >print(m.AR1) # model without weighting
> Random effects:
>  Formula: ~1 | Plot
>         (Intercept)
> StdDev:   0.3928492
> 
>  Formula: ~1 | MP %in% Plot
>         (Intercept)  Residual
> StdDev:   0.2008393 0.5616311
> 
> Correlation Structure: AR(1)
>  Formula: ~1 | Plot/MP 
>  Parameter estimate(s):
>        Phi 
> 0.01321357 
> Number of Observations: 27298
> Number of Groups: 
>         Plot MP %in% Plot 
>         1182         1210 
> 
> 
>    (ii) I am trying to calculate RMSE to estimate what proportion of variance in basal area increment is explained by Plot and/or MP?, should I be using variance estimated after weighting?
> 
> b) I would like to calculate what proportion of variation in basal area increment is explained by SITES?  I fit the model without SITES (m.0w.NOSE) and Likelihood ratio test suggests that SITES is highly significant predictor in the model. I still would like to quantify how much variation in basal area increment is explained by the SITES as a predictor variable (for this I calculated RMSE and compare model with and without SITES).  Mostly the model fitted with SITES reduces Plot level variance (as expected since this is plot level variable).  If I calculate the percent of RMSE reduced by SITES before and after weighting, they are different because only residual variance is increased due to weighting (negative delta value). Any suggestions?
> 
> > anova(m.0w, m0w.NOSE)
>             Model df      AIC      BIC    logLik   Test  L.Ratio p-value
> m.0w.NOSE     1 10 46532.47 46614.61 -23256.23                        
> m.0w            2 33 46205.77 46476.85 -23069.88 1 vs 2 372.7003  <.0001
> 
> > print(m.0w.NOSE)
> Random effects:
>  Formula: ~1 | Plot
>         (Intercept)
> StdDev:    0.502633
> 
>  Formula: ~1 | MP %in% Plot
>         (Intercept) Residual
> StdDev:   0.2096941 1.292326
> 
> Correlation Structure: AR(1)
>  Formula: ~1 | Plot/MP 
>  Parameter estimate(s):
>         Phi 
> 0.007753674 
> Variance function:
>  Structure: Power of variance covariate
>  Formula: ~fitted(.) 
>  Parameter estimates:
>      power 
> -0.8486992 
> Number of Observations: 27298
> Number of Groups: 
>         Plot MP %in% Plot 
>         1182         1210 
> 
> 
> Sorry for a long email. Any pointer you could provide me on these questons would be very very helpful. Thank you and look forward to hearing from you!
> 
> Regards,
> Bharat
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Andrew Robinson  
Program Manager, ACERA 
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011) 
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): 
http://www.ms.unimelb.edu.au/spuRs/