[R] ANCOVA when you don't know factor levels
Liaw, Andy
andy_liaw at merck.com
Wed Mar 17 14:06:11 CET 2004
This sounds like a good case for mixture regression, for which there's Fritz
Leisch's `flexmix' package. However, I don't think flexmix has facility for
testing whether the mixture has one vs. two components. Others on the list
surely would know more than I do.
HTH,
Andy
> From: Rob Knell
>
> Hello people
>
> I am doing some thinking about how to analyse data on
> dimorphic animals
> - where different individuals of the same species have rather
> different
> morphology. An example of this is that some male beetles have large
> horns and small wings, and rely on beating the other guys up to get
> access to mates, whereas others have smaller horns and larger wings,
> and rely on mobility to find mates before the guys with the
> big horns
> turn up and beat them up. Normally what we do to see if this is
> happening is to plot a scatter plot of horn length vs body size. If
> it's a simple straight line relationship then no dimorphism, but if
> it's either a line with an obvious change of slope or two separate
> lines then you've got a dimorphic animal.
>
> If you've just got a change of slope then it's relatively
> easy to test
> for significance by breakpoint regression, which will also
> tell you the
> body size at which the beetles switch from 'big wings, small horns'
> strategy to the other. What is more problematic, however, is if you
> have a dataset that is essentially two linear regressions
> that overlap
> in both X and Y. Here you can't do a breakpoint regression. The
> solution that I have come up with is to put a straight line
> between the
> two clouds of data and to use that line to define a two-level factor
> and fit a model with body size, the factor and the
> interaction to the
> horn size data. The line can then be moved up and down, and
> the slope
> varied, and the fit of the model for each intercept/slope
> combination
> compared. The best fit model can then be compared with a straight
> linear model with an F-test, which gives you a significance
> test, and
> the line allows you to decide which morph a particular beetle
> conforms
> to. I've written some R code to do this (see below). What i
> would like
> to know is a) if this problem has been dealt with before
> elsewhere, b)
> if there's a better way to do it and c) if my R function could be
> improved at all - it's my first attempt at such a thing, so
> any input
> would be really helpful. If anyone wants a sample data set
> then I can
> email you one.
>
> Thanks for any input
>
> Rob Knell
>
> School of Biological Sciences
> Queen Mary, University of London
>
> 'Phone +44 (0)20 7882 7720
> http://www.qmw.ac.uk/~ugbt794
> http://www.mopane.org
>
> Here's the code:
>
> switchline<-
> function(X,Y,Intmin,Intmax,Intgap,Slopemin,Slopemax,Slopegap)
>
>
>
> {
>
>
>
> # X = unimodal variable (e.g. filename$bodysize)
>
>
>
> # Y = bimodal variable (e.g. filename$hornsize)
>
>
>
> # Intmin = Minimum switchline intercept
>
>
>
> # Slopemin = Minimum switchline slope
>
> # Intmax = Maximum switchline intercept
>
> # Slopemax = Maximum switchline slope
>
> # Intgap = Interval between each intercept
>
> # Slopegap = Interval between each slope
>
>
>
>
>
>
>
> # This function written by Rob Knell 2003. It is an attempt to
> produce an
>
> # objective analysis for datasets of apparently dimorphic
> characters
> in which
>
> # there is overlap in both the X and Y variable. The
> routine divides
> the
>
> # individuals into one of two classes based on whether they are
> above or below
>
> # a line, and fits a linear model to the Y variable with the X
> variable as a
>
> # continuous explanatory variable and a single factor, "Morph",
> which is based
>
> # on whether they were above the line or below it, plus an
> interaction term.
>
> # The line is then adjusted, the individuals reclassified and the
> process
>
> # repeated. The output is of the form "Int" - intercept of
> the line
> dividing
>
> # one morph from another, "Slp" - the slope and "Rsqr",
> which gives
> the R-
>
> # squared value for the fitted model when the division
> into factors
> is based on
>
> # that particular line.
>
>
>
> # An error message probably means that one of your lines
> has missed
> the data
>
> # altogether - in other words, all of your data points are
> classified into a
>
> # single factor.
>
> #
>
>
>
>
>
>
>
> Lines.grid<-
> expand.grid(Intercept=seq(Intmin,Intmax,Intgap),Slope=seq(Slop
> emin,Slope
> max,Slopegap))
>
>
>
> bandwidth<-length(Lines.grid$Intercept)
>
>
>
>
>
> Lines<-matrix(nrow=bandwidth,ncol=3,data=NA)
>
> dimnames(Lines)<-list(seq(1,bandwidth),c("Int","Slp","Rsqr"))
>
>
>
> Lines[,1]<-Lines.grid$Intercept
>
> Lines[,2]<-Lines.grid$Slope
>
>
>
> for (i in 1:bandwidth)
>
>
>
> {
>
> temp.data<-matrix(nrow=length(X),ncol=4,data=NA)
>
> colnames(temp.data)<-c("X1","Y1","switchvalue","Morph")
>
> temp.data[,1]<-X
>
> temp.data[,2]<-Y
>
> for(j in 1:length(X))
>
> {
>
> temp.data[j,3]<-(Lines[i,1]+Lines[i,2]*temp.data[j,1])
>
> ifelse(temp.data[j,2]>=temp.data[j,3], temp.data[j,4]<-1,
> temp.data[j,4]<-2)
>
> }
>
>
>
> temp.data<-data.frame(temp.data)
>
> temp.data$Morph<-factor(temp.data$Morph)
>
> model<-lm(temp.data$Y1~temp.data$X1*temp.data$Morph)
>
> Lines[i,3]<-round(summary(model)$r.squared,6)
>
> }
>
>
>
> print(Lines)
>
>
>
> Lines<-data.frame(Lines)
>
> Max<-max(Lines$Rsqr)
>
> MaxI<-Lines$Int[Lines$Rsqr==Max]
>
> MaxS<-Lines$Slp[Lines$Rsqr==Max]
>
> Maxmat<-cbind(MaxI,MaxS)
>
>
>
> print(Maxmat)
>
> }
>
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>
------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments,...{{dropped}}
More information about the R-help
mailing list