[R] strucchange Fstats() example

Thu May 31 14:11:05 CEST 2012

Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations and I think I have a better idea of what I should use. My dataset contains binary data on survival of the calf depending on the mass of the mother. We know that the probability of survival of the calf should vary according to the mass of the mother: 3 groups of mass expected, lower survival of the calves for small and large females, best survival of calf for intermediate-sized females. I want to identify at which masses those changes in survival occur.

I think the code I need to use in order to test what I want is something of that type:
gmass<-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass)

The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below?
gmass2<-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass)

The second question, which is related I think to the first one is whether it makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a single parameter in the model (Mass), and this parameter is also the parameter we use as the order.by= parameter 
plot(gmass, functional=meanL2BB, aggregate=FALSE)
I think the whole point around questions 1 and 2  is that I don't understand the interpretation of the intercept in the gmass model???

Third question: how to choose the proper functional? I have seen that you discuss that in your CSDA(2006 & 2003) papers and, in the 2006 paper you say: "in situations where there is a shift in the parameters and then a second shift back, it is advantageous to aggregate over time using the range and ....". Which means, if I understand well that rangeBB would be adapted to the kind of test I want to perform. However, since I want to determine the timing of the peaks, I need my functional to produce a "time series plot", for example like meanL2BB does. Do you think I can use meanL2BB functional in my case or should I compute an "home-made functional" which would use the range of efp but with comp applied first and time after (is this possible???).

Fourth question: is it OK to only make a visual estimation of the "breakpoints"  from the peaks seen on the graph after plotting the efp or should I use the breakpoints() function to properly date the breakpoints??? I'm not sure this breakpoints() function can be applied to binary data?

Fifth question: I have noticed that the p-values I obtain after performing the sctest(gmass, functional=meanL2BB) for example are a bit different depending on if I introduce family=binomial as an argument in my gefp() call. Should I use this argument or is it used by default when you specify fit=glm??? 

Last question, you said in your previous message that I could look at the  maxstat_test() from package "coin" for an interesting nonparametric alternative but I think this package does not allow estimation of more than one breakpoint???

Thanks heaps if you can help again with those issues,
Best,
Geraldine

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Achim Zeileis
Sent: 30. mai 2012 08:23
To: R-help at r-project.org
Subject: Re: [R] strucchange Fstats() example

On Tue, 29 May 2012, Mabille, Geraldine wrote:

<snip>

> In the second example, the authors state the presence of "at least" 
> two breakpoints. When plotting the F-statistics using the following 
> code, we see indeed two peaks in the F-statistics, that coincides with 
> the dates given by the authors: c.a 1973 and 1983 but when trying to 
> add those breakpoints to the time series, only one is taken into 
> account

The breakpoints() method for "Fstats" objects can just extract a single breakpoint. The reason is that maximizing the F statistics is equivalent to minimizing the residual sum of squares of a model with a single breakpoint. If you want to estimate more than a single breakpoint, you need to minimize the corresponding segmented sums of squares. This can be done with the formula method of breakpoints(), see ?breakpoints.

More specificially: In your example with breakpoints(fs, breaks = 2), the breaks argument is simply ignored. The method just does not have a breaks argument and it goes through ...

> We see that even though the F-statistics seem to show the existence of 
> 2 breakpoints, only one is detected by the breakpoints() function. 
> Does anyone know how this is possible? I'm totally new to strucchange 
> so it might well be something obvious I'm missing here!

Please have a closer look at the package's documentation and the corresponding papers. See citation("strucchange") for the most important references and the corresponding manual pages for more details. For the breakpoints issue you should probably start reading the CSDA paper.

> OTHER SIDE QUESTION: can strucchange be used if the y variable is binary???

Testing for breakpoints can be done with the function gefp(). See its manual pages for references and details. The manual page just has a Poisson GLM example but the corresponding papers (in Stat Neerl and CSDA) also have binary response examples.

If you have a binary response and just want to test whether the proportion of successes changes across "time" (or some other variable of interest), then
maxstat_test() from package "coin" might be an interesting nonparametric alternative.

hth,
Z

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.