[R] understanding the output from gls

Thu Sep 3 01:18:25 CEST 2009

On Wed, Sep 2, 2009 at 2:39 PM, <Timothy_Handley at nps.gov> wrote:
> Kingsford,
>
> Thanks for the information. As you suggest, if I don't hear from anyone
> else about the degrees of freedom issue in a couple days, I'll try r-devel.

Determining denominator degrees of freedom for F tests in mixed models
is an area of current research (and some contention) and it may be
that the same holds true for df in a linear model w/ estimated
parameters structuring the error covariance.  So, these may be some
deep waters that nobody is eager to wade into....or maybe there's a
simple explanation...

>
> Also, while I appreciate your explanation of the correlation matrix
> produced by summary.gls, I'm afriad I don't have the statistical background
> to fully understand it. If it wouldn't impose too much on your time, could
> you suggest a more intuitive way to interpret these numbers, or perhaps a
> reference with a more intuitive explanation?
>
> To me, the idea that fixed effects estimates could be correlated just
> sounds odd. I understand that the effects themselves could be correlated.
> While I have good biological reasons for thinking that my two fixed effects
> are independent, the data may prove me wrong. However, the fixed effects
> estimates are each just a single number for the data/model as a whole. How
> can one pair of two single numbers be correlated?
>

Well the issue is that you're assuming a model.  Here's a simple idea
that may help you intuition:

Visualize a scatterplot and place a point on that plot that marks
(mean x, mean y).  The way the math works out, a regression line fit
to the data must go through that point. Now think about adjusting the
intercept of that line, while keeping it going through the (mean x,
mean y) point -- clearly the slope will change too.  So, x and y are
not independent.

The reason the covariance matrix of the fixed effects coefficients is
usually of interest is for inference.  For example a joint confidence
region for the slope and intercept would be an ellipse with elongation
determined by the covariance.

hth,

Kingsford

> Tim Handley
> Fire Effects Monitor
> Santa Monica Mountains National Recreation Area
> 401 W. Hillcrest Dr.
> Thousand Oaks, CA 91360
> 805-370-2347
>
>
>
>
>
>             Kingsford Jones
>             <kingsfordjones at g
>             mail.com>                                                  To
>                                       Timothy_Handley at nps.gov
>             09/01/2009 05:53                                           cc
>             PM                        R-help at r-project.org
>                                                                   Subject
>                                       Re: [R] understanding the output
>                                       from gls
>
>
>
>
>
>
>
>
>
>
> Hi Tim,
>
> On Tue, Sep 1, 2009 at 2:00 PM, <Timothy_Handley at nps.gov> wrote:
>>
>> I'd like to compare two models which were fitted using gls, however I'm
>> having trouble interpreting the results of gls. If any of you could offer
>> me some advice, I'd greatly appreciate it.
>>
>> Short explanation of models: These two models have the same fixed-effects
>> structure (two independent, linear effects),
>
> Known to be independent?
>
>> and differ only in that the
>> second model includes a corExp structure for spatial autocorrelation.
> (more
>> detailed explanation of the models at end).
>>
>> Specific questions:
>>
>> 1. The second model estimates two additional parameters in the process of
>> fitting the corSpatial object - the range and nugget of the spatial
>> autocorrelation. Based on this, I would expect the second model to have
> two
>> fewer residual degrees of freedom. However, the summary function reports
>> that both models have the same number of residual degrees of
> freedom.  Why
>> is this? (Interestingly, the difference in AIC between the two models
>> reflects this difference in the number of model parameters)
>>
>
> That's a good question.  It does seem degrees of freedom would be
> spent in estimating the spatial parameters.
>
> The reason the functions using logLik get the number of parameters
> "right" is because logLik.gls includes:
>
> attr(val, "df") <- p + length(coef(object[["modelStruct"]])) + 1
>
> where modelStruct holds the estimated parameters that structure
> residual variance and correlation, and I believe p is the number of
> columns in the design matrix, but I couldn't easily follow the code
> within gls that produces it.
>
> If nobody offers an explanation for the current result from
> print.summary.gls you could ask on r-devel if the results are
> intended.
>
>> 2. In the model summary, what is the meaning of the small correlation
>> matrix under the heading "Correlation:"? At first, I thought that this
> was
>> describing possible correlations among the predictor variables, but then
> I
>> saw that it also included the model intercept. What do these correlation
>> value mean?
>
> The GLS covariance of estimated fixed effects (including the
> intercept) is (X' \hat{\Sigma}^-1 X)^-1, where X is the model matrix
> and \Sigma is the response/error covariance matrix.  This will
> generally have non-zero off-diagonals, implying the fixed effects
> estimates are correlated.  The values you're inquiring about are the
> scaled estimates of those off-diagonals.
>
> hth,
>
> Kingsford Jones
>
>>
>> ##More detailed information
>> ##function calls:
>>  sppl.i.xx = gls(all.all.rch~l10area+newx, data = gtemp, method="ML")
>>  sppl.i.ex = gls(all.all.rch~l10area+newx, data = gtemp, method="ML",
>>              correlation = corExp(c(20,.8), form=~x+y|area, nugget=TRUE))
>>
>> ##model summaries
>>
>>> summary(sppl.i.xx)
>> Generalized least squares fit by maximum likelihood
>>  Model: all.all.rch ~ l10area + newx
>>  Data: gtemp
>>       AIC     BIC    logLik
>>  567.4893 578.572 -279.7446
>>
>> Coefficients:
>>               Value Std.Error   t-value p-value
>> (Intercept) 6.891867 0.3295097 20.915522   0e+00
>> l10area     6.586182 0.3048870 21.602046   0e+00
>> newx        0.047901 0.0117281  4.084307   1e-04
>>
>>  Correlation:
>>        (Intr) l10are
>> l10area -0.364
>> newx     0.577 -0.007
>>
>> Standardized residuals:
>>        Min          Q1         Med          Q3         Max
>> -3.34307266 -0.57949890 -0.07214605  0.64309760  2.66409931
>>
>> Residual standard error: 2.590313
>> Degrees of freedom: 118 total; 115 residual
>>
>> summary(sppl.i.ex)
>> Generalized least squares fit by maximum likelihood
>>  Model: all.all.rch ~ l10area + newx
>>  Data: gtemp
>>      AIC      BIC    logLik
>>  559.167 575.7911 -273.5835
>>
>> Correlation Structure: Exponential spatial correlation
>>  Formula: ~x + y | area
>>  Parameter estimate(s):
>>     range     nugget
>> 15.4448835  0.3741476
>>
>> Coefficients:
>>               Value Std.Error   t-value p-value
>> (Intercept) 7.621306 0.7648135  9.964921  0.0000
>> l10area     6.400442 0.5588160 11.453576  0.0000
>> newx        0.066535 0.0204417  3.254857  0.0015
>>
>>  Correlation:
>>        (Intr) l10are
>> l10area -0.592
>> newx     0.358  0.014
>>
>> Standardized residuals:
>>       Min         Q1        Med         Q3        Max
>> -3.0035983 -0.5990432 -0.2226852  0.5113270  2.4444263
>>
>> Residual standard error: 2.820337
>> Degrees of freedom: 118 total; 115 residual
>>
>>
>>
>>
>> Tim Handley
>> Fire Effects Monitor
>> Santa Monica Mountains National Recreation Area
>> 401 W. Hillcrest Dr.
>> Thousand Oaks, CA 91360
>> 805-370-2347
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>