[R] understanding the output from gls

Wed Sep 2 22:39:20 CEST 2009

Kingsford,

Thanks for the information. As you suggest, if I don't hear from anyone
else about the degrees of freedom issue in a couple days, I'll try r-devel.

Also, while I appreciate your explanation of the correlation matrix
produced by summary.gls, I'm afriad I don't have the statistical background
to fully understand it. If it wouldn't impose too much on your time, could
you suggest a more intuitive way to interpret these numbers, or perhaps a
reference with a more intuitive explanation?

To me, the idea that fixed effects estimates could be correlated just
sounds odd. I understand that the effects themselves could be correlated.
While I have good biological reasons for thinking that my two fixed effects
are independent, the data may prove me wrong. However, the fixed effects
estimates are each just a single number for the data/model as a whole. How
can one pair of two single numbers be correlated?

Tim Handley
Fire Effects Monitor
Santa Monica Mountains National Recreation Area
401 W. Hillcrest Dr.
Thousand Oaks, CA 91360
805-370-2347

             Kingsford Jones                                               
             <kingsfordjones at g                                             
             mail.com>                                                  To 
                                       Timothy_Handley at nps.gov             
             09/01/2009 05:53                                           cc 
             PM                        R-help at r-project.org                
                                                                   Subject 
                                       Re: [R] understanding the output    
                                       from gls                            

Hi Tim,

On Tue, Sep 1, 2009 at 2:00 PM, <Timothy_Handley at nps.gov> wrote:
>
> I'd like to compare two models which were fitted using gls, however I'm
> having trouble interpreting the results of gls. If any of you could offer
> me some advice, I'd greatly appreciate it.
>
> Short explanation of models: These two models have the same fixed-effects
> structure (two independent, linear effects),

Known to be independent?

> and differ only in that the
> second model includes a corExp structure for spatial autocorrelation.
(more
> detailed explanation of the models at end).
>
> Specific questions:
>
> 1. The second model estimates two additional parameters in the process of
> fitting the corSpatial object - the range and nugget of the spatial
> autocorrelation. Based on this, I would expect the second model to have
two
> fewer residual degrees of freedom. However, the summary function reports
> that both models have the same number of residual degrees of
freedom.  Why
> is this? (Interestingly, the difference in AIC between the two models
> reflects this difference in the number of model parameters)
>

That's a good question.  It does seem degrees of freedom would be
spent in estimating the spatial parameters.

The reason the functions using logLik get the number of parameters
"right" is because logLik.gls includes:

attr(val, "df") <- p + length(coef(object[["modelStruct"]])) + 1

where modelStruct holds the estimated parameters that structure
residual variance and correlation, and I believe p is the number of
columns in the design matrix, but I couldn't easily follow the code
within gls that produces it.

If nobody offers an explanation for the current result from
print.summary.gls you could ask on r-devel if the results are
intended.

> 2. In the model summary, what is the meaning of the small correlation
> matrix under the heading "Correlation:"? At first, I thought that this
was
> describing possible correlations among the predictor variables, but then
I
> saw that it also included the model intercept. What do these correlation
> value mean?

The GLS covariance of estimated fixed effects (including the
intercept) is (X' \hat{\Sigma}^-1 X)^-1, where X is the model matrix
and \Sigma is the response/error covariance matrix.  This will
generally have non-zero off-diagonals, implying the fixed effects
estimates are correlated.  The values you're inquiring about are the
scaled estimates of those off-diagonals.

hth,

Kingsford Jones

>
> ##More detailed information
> ##function calls:
>  sppl.i.xx = gls(all.all.rch~l10area+newx, data = gtemp, method="ML")
>  sppl.i.ex = gls(all.all.rch~l10area+newx, data = gtemp, method="ML",
>              correlation = corExp(c(20,.8), form=~x+y|area, nugget=TRUE))
>
> ##model summaries
>
>> summary(sppl.i.xx)
> Generalized least squares fit by maximum likelihood
>  Model: all.all.rch ~ l10area + newx
>  Data: gtemp
>       AIC     BIC    logLik
>  567.4893 578.572 -279.7446
>
> Coefficients:
>               Value Std.Error   t-value p-value
> (Intercept) 6.891867 0.3295097 20.915522   0e+00
> l10area     6.586182 0.3048870 21.602046   0e+00
> newx        0.047901 0.0117281  4.084307   1e-04
>
>  Correlation:
>        (Intr) l10are
> l10area -0.364
> newx     0.577 -0.007
>
> Standardized residuals:
>        Min          Q1         Med          Q3         Max
> -3.34307266 -0.57949890 -0.07214605  0.64309760  2.66409931
>
> Residual standard error: 2.590313
> Degrees of freedom: 118 total; 115 residual
>
> summary(sppl.i.ex)
> Generalized least squares fit by maximum likelihood
>  Model: all.all.rch ~ l10area + newx
>  Data: gtemp
>      AIC      BIC    logLik
>  559.167 575.7911 -273.5835
>
> Correlation Structure: Exponential spatial correlation
>  Formula: ~x + y | area
>  Parameter estimate(s):
>     range     nugget
> 15.4448835  0.3741476
>
> Coefficients:
>               Value Std.Error   t-value p-value
> (Intercept) 7.621306 0.7648135  9.964921  0.0000
> l10area     6.400442 0.5588160 11.453576  0.0000
> newx        0.066535 0.0204417  3.254857  0.0015
>
>  Correlation:
>        (Intr) l10are
> l10area -0.592
> newx     0.358  0.014
>
> Standardized residuals:
>       Min         Q1        Med         Q3        Max
> -3.0035983 -0.5990432 -0.2226852  0.5113270  2.4444263
>
> Residual standard error: 2.820337
> Degrees of freedom: 118 total; 115 residual
>
>
>
>
> Tim Handley
> Fire Effects Monitor
> Santa Monica Mountains National Recreation Area
> 401 W. Hillcrest Dr.
> Thousand Oaks, CA 91360
> 805-370-2347
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>