[R-sig-phylo] Multiple regressions with continuous and categorical data
s.blomberg1 at uq.edu.au
Mon Apr 7 09:34:35 CEST 2008
On Sun, 2008-04-06 at 21:17 -0700, tgarland at ucr.edu wrote:
[ good stuff snipped ]
> With PGLS and phylogenetic regression transform models, you don't have
> comparable data points to examine, or at least that is not what people
> typically do. Rather, a bunch of matrices are slammed together (ouch
> - no pun intended!) and estimates of slopes and so forth are examined
> pretty much without reference to a plot of any type. Even residuals
> can be quite tricky (see Lavin et al., 2008 in press; see also Grafen,
> 1989). Tony Ives could jump in here!
Yes, it is bad practice not to plot your data, preferably in as many
ways as possible. Of course you are right: there is insight to be gained
from contrast plots, especially if you can identify outlier contrasts.
In R (using the gls function in the nlme package), you can plot
residuals vs. fitted values for 3 types of residuals: raw residuals,
standardized "pearson" residuals, and "normalized" standardized
residuals. You are right that you can be misled by plots of the raw
residuals, although they will often indicate really obvious problems
with the model. The normalized residuals are obtained by pre-multiplying
the standardized residuals by the inverse of the Cholesky factor of the
estimated error correlation matrix. This "corrects" the residuals for
their (phylogenetic) correlation. People using gls should look at that plot
too, perhaps more than the raw residual plots. Also, a Q-Q plot of the
normalized residuals (but not necessarily the raw residuals) should show
normality. If it doesn't, you need to do more work. :-) If you fit the
gls model using lm.gls in the MASS package, you can use lm.influence to
get some leave-one-out diagnostics that may also be useful.
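
Here is a minimal sketch of the residual checks described above, using
gls from nlme. Everything in it is made up for illustration: the block
correlation matrix C (two hypothetical "clades") merely stands in for a
phylogenetic correlation matrix, and the data are simulated. With a real
tree you would presumably supply a phylogenetic correlation structure
instead (for example corBrownian from the ape package).

```r
library(nlme)

set.seed(1)
n <- 12

# Hypothetical correlation matrix standing in for a phylogeny:
# two "clades" of 6 species, r = 0.6 within a clade, 0.2 between clades.
clade <- rep(1:2, each = n / 2)
C <- ifelse(outer(clade, clade, "=="), 0.6, 0.2)
diag(C) <- 1

# Simulate a predictor and a response whose errors have correlation C.
L <- t(chol(C))                      # lower-triangular factor, C = L %*% t(L)
x <- rnorm(n)
y <- as.numeric(1 + 0.5 * x + L %*% rnorm(n))
dat <- data.frame(x = x, y = y)

# GLS fit with the correlation matrix held fixed at C (an assumption of
# this sketch; nothing about the correlation is estimated).
fit <- gls(y ~ x, data = dat,
           correlation = corSymm(value = C[lower.tri(C)],
                                 form = ~ 1, fixed = TRUE))

# The three kinds of residuals mentioned above.
r_raw  <- residuals(fit, type = "response")    # raw
r_pear <- residuals(fit, type = "pearson")     # raw / estimated sigma
r_norm <- residuals(fit, type = "normalized")  # corrected for correlation

# By-hand version of the normalization as described in the text:
# pre-multiply the standardized residuals by the inverse Cholesky factor.
r_chol <- forwardsolve(L, as.numeric(r_pear))

# Diagnostic plots (only drawn in an interactive session).
if (interactive()) {
  op <- par(mfrow = c(1, 3))
  plot(fitted(fit), r_raw,  xlab = "Fitted", ylab = "Raw residuals")
  plot(fitted(fit), r_norm, xlab = "Fitted", ylab = "Normalized residuals")
  qqnorm(r_norm); qqline(r_norm)
  par(op)
}
```

The residuals vs. fitted plot and the Q-Q plot of r_norm are the ones to
inspect for structure and non-normality; the raw-residual plot is there
only for comparison.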
> All for now and cheers,
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Room 320 Goddard Building (8)
T: +61 7 3365 2506
1. I will NOT analyse your data for you.
2. Your deadline is your problem.
The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can
be extracted from a given body of data. - John Tukey.