[R] ANCOVA when you don't know factor levels
Rob Knell
r.knell at qmul.ac.uk
Wed Mar 17 12:53:39 CET 2004
Hello people
I am doing some thinking about how to analyse data on dimorphic animals
- where different individuals of the same species have rather different
morphology. An example of this is that some male beetles have large
horns and small wings, and rely on beating the other guys up to get
access to mates, whereas others have smaller horns and larger wings,
and rely on mobility to find mates before the guys with the big horns
turn up and beat them up. Normally what we do to see if this is
happening is to plot a scatter plot of horn length vs body size. If
it's a simple straight line relationship then no dimorphism, but if
it's either a line with an obvious change of slope or two separate
lines then you've got a dimorphic animal.
If you've just got a change of slope then it's relatively easy to test
for significance by breakpoint regression, which will also tell you the
body size at which the beetles switch from 'big wings, small horns'
strategy to the other. What is more problematic, however, is if you
have a dataset that is essentially two linear regressions that overlap
in both X and Y. Here you can't do a breakpoint regression. The
solution that I have come up with is to put a straight line between the
two clouds of data and to use that line to define a two-level factor
and fit a model with body size, the factor and the interaction to the
horn size data. The line can then be moved up and down, and the slope
varied, and the fit of the model for each intercept/slope combination
compared. The best fit model can then be compared with a straight
linear model with an F-test, which gives you a significance test, and
the line allows you to decide which morph a particular beetle conforms
to. I've written some R code to do this (see below). What i would like
to know is a) if this problem has been dealt with before elsewhere, b)
if there's a better way to do it and c) if my R function could be
improved at all - it's my first attempt at such a thing, so any input
would be really helpful. If anyone wants a sample data set then I can
email you one.
Thanks for any input
Rob Knell
School of Biological Sciences
Queen Mary, University of London
'Phone +44 (0)20 7882 7720
http://www.qmw.ac.uk/~ugbt794
http://www.mopane.org
Here's the code:
switchline<-
function(X,Y,Intmin,Intmax,Intgap,Slopemin,Slopemax,Slopegap)
{
# X = unimodal variable (e.g. filename$bodysize)
# Y = bimodal variable (e.g. filename$hornsize)
# Intmin = Minimum switchline intercept
# Slopemin = Minimum switchline slope
# Intmax = Maximum switchline intercept
# Slopemax = Maximum switchline slope
# Intgap = Interval between each intercept
# Slopegap = Interval between each slope
# This function written by Rob Knell 2003. It is an attempt to
produce an
# objective analysis for datasets of apparently dimorphic characters
in which
# there is overlap in both the X and Y variable. The routine divides
the
# individuals into one of two classes based on whether they are
above or below
# a line, and fits a linear model to the Y variable with the X
variable as a
# continuous explanatory variable and a single factor, "Morph",
which is based
# on whether they were above the line or below it, plus an
interaction term.
# The line is then adjusted, the individuals reclassified and the
process
# repeated. The output is of the form "Int" - intercept of the line
dividing
# one morph from another, "Slp" - the slope and "Rsqr", which gives
the R-
# squared value for the fitted model when the division into factors
is based on
# that particular line.
# An error message probably means that one of your lines has missed
the data
# altogether - in other words, all of your data points are
classified into a
# single factor.
#
Lines.grid<-
expand.grid(Intercept=seq(Intmin,Intmax,Intgap),Slope=seq(Slopemin,Slope
max,Slopegap))
bandwidth<-length(Lines.grid$Intercept)
Lines<-matrix(nrow=bandwidth,ncol=3,data=NA)
dimnames(Lines)<-list(seq(1,bandwidth),c("Int","Slp","Rsqr"))
Lines[,1]<-Lines.grid$Intercept
Lines[,2]<-Lines.grid$Slope
for (i in 1:bandwidth)
{
temp.data<-matrix(nrow=length(X),ncol=4,data=NA)
colnames(temp.data)<-c("X1","Y1","switchvalue","Morph")
temp.data[,1]<-X
temp.data[,2]<-Y
for(j in 1:length(X))
{
temp.data[j,3]<-(Lines[i,1]+Lines[i,2]*temp.data[j,1])
ifelse(temp.data[j,2]>=temp.data[j,3], temp.data[j,4]<-1,
temp.data[j,4]<-2)
}
temp.data<-data.frame(temp.data)
temp.data$Morph<-factor(temp.data$Morph)
model<-lm(temp.data$Y1~temp.data$X1*temp.data$Morph)
Lines[i,3]<-round(summary(model)$r.squared,6)
}
print(Lines)
Lines<-data.frame(Lines)
Max<-max(Lines$Rsqr)
MaxI<-Lines$Int[Lines$Rsqr==Max]
MaxS<-Lines$Slp[Lines$Rsqr==Max]
Maxmat<-cbind(MaxI,MaxS)
print(Maxmat)
}
More information about the R-help
mailing list