[R] ANCOVA when you don't know factor levels

Rob Knell r.knell at qmul.ac.uk
Wed Mar 17 12:53:39 CET 2004


Hello people

I am doing some thinking about how to analyse data on dimorphic animals  
- where different individuals of the same species have rather different  
morphology. An example of this is that some male beetles have large  
horns and small wings, and rely on beating the other guys up to get  
access to mates, whereas others have smaller horns and larger wings,  
and rely on mobility to find mates before the guys with the big horns  
turn up and beat them up. Normally what we do to see if this is  
happening is to plot a scatter plot of horn length vs body size. If  
it's a simple straight line relationship then no dimorphism, but if  
it's either a line with an obvious change of slope or two separate  
lines then you've got a dimorphic animal.

If you've just got a change of slope then it's relatively easy to test  
for significance by breakpoint regression, which will also tell you the  
body size at which the beetles switch from 'big wings, small horns'  
strategy to the other. What is more problematic, however, is if you  
have a dataset that is essentially two linear regressions that overlap  
in both X and Y. Here you can't do a breakpoint regression. The  
solution that I have come up with is to put a straight line between the  
two clouds of data and to use that line to define a two-level factor  
and fit a model with body size, the factor and the interaction to the  
horn size data. The line can then be moved up and down, and the slope  
varied, and the fit of the model for each intercept/slope combination  
compared. The best fit model can then be compared with a straight  
linear model with an F-test, which gives you a significance test, and  
the line allows you to decide which morph a particular beetle conforms  
to. I've written some R code to do this (see below). What i would like  
to know is a) if this problem has been dealt with before elsewhere, b)  
if there's a better way to do it and c) if my R function could be  
improved at all - it's my first attempt at such a thing, so any input  
would be really helpful. If anyone wants a sample data set then I can  
email you one.

Thanks for any input

Rob Knell

School of Biological Sciences
Queen Mary, University of London

'Phone +44 (0)20 7882 7720
http://www.qmw.ac.uk/~ugbt794
http://www.mopane.org

Here's the code:

switchline<- 
function(X,Y,Intmin,Intmax,Intgap,Slopemin,Slopemax,Slopegap)

 

{

 

#   X       = unimodal variable (e.g. filename$bodysize)

 

#   Y       = bimodal variable (e.g. filename$hornsize)

 

#    Intmin       = Minimum switchline intercept

 

#    Slopemin = Minimum switchline slope

#    Intmax       = Maximum switchline intercept

#    Slopemax = Maximum switchline slope

#    Intgap       = Interval between each intercept

#    Slopegap = Interval between each slope

 

 

 

#   This function written by Rob Knell 2003. It is an attempt to  
produce an

#    objective analysis for datasets of apparently dimorphic characters  
in which

#   there is overlap in both the X and Y variable. The routine divides  
the

#    individuals into one of two classes based on whether they are  
above or below

#   a line, and fits a linear model to the Y variable with the X  
variable as a

#    continuous explanatory variable and a single factor, "Morph",  
which is based

#   on whether they were above the line or below it, plus an  
interaction term.

#   The line is then adjusted, the individuals reclassified and the  
process

#    repeated. The output is of the form "Int" - intercept of the line  
dividing

#   one morph from another, "Slp" - the slope and "Rsqr", which gives  
the R-

#    squared value for the fitted model when the division into factors  
is based on

#   that particular line.

 

#   An error message probably means that one of your lines has missed  
the data

#    altogether - in other words, all of your data points are  
classified into a

#    single factor.

#

 

 

 

Lines.grid<-  
expand.grid(Intercept=seq(Intmin,Intmax,Intgap),Slope=seq(Slopemin,Slope 
max,Slopegap))

 

bandwidth<-length(Lines.grid$Intercept)

 

 

Lines<-matrix(nrow=bandwidth,ncol=3,data=NA)

dimnames(Lines)<-list(seq(1,bandwidth),c("Int","Slp","Rsqr"))

 

Lines[,1]<-Lines.grid$Intercept

Lines[,2]<-Lines.grid$Slope

 

for (i in 1:bandwidth)

 

    {

    temp.data<-matrix(nrow=length(X),ncol=4,data=NA)

    colnames(temp.data)<-c("X1","Y1","switchvalue","Morph")

    temp.data[,1]<-X

    temp.data[,2]<-Y

        for(j in 1:length(X))

        {

        temp.data[j,3]<-(Lines[i,1]+Lines[i,2]*temp.data[j,1])

        ifelse(temp.data[j,2]>=temp.data[j,3], temp.data[j,4]<-1,  
        temp.data[j,4]<-2)

        }

    

    temp.data<-data.frame(temp.data)

    temp.data$Morph<-factor(temp.data$Morph)

    model<-lm(temp.data$Y1~temp.data$X1*temp.data$Morph)

    Lines[i,3]<-round(summary(model)$r.squared,6)

    }

 

print(Lines)

 

Lines<-data.frame(Lines)

Max<-max(Lines$Rsqr)

MaxI<-Lines$Int[Lines$Rsqr==Max]

MaxS<-Lines$Slp[Lines$Rsqr==Max]

Maxmat<-cbind(MaxI,MaxS)

 

print(Maxmat)

}

                       




More information about the R-help mailing list