[R] Survival analysis
daniel at umd.edu
Thu Feb 18 01:48:51 CET 2010
This implies that you do indeed have a time series. You cannot run a survival model on a single unit of observation (i.e., one population) unless you can do the things (or similar things) that David suggested to create a larger dataset by disaggregating information. However, your initial approach may be feasible even though you cannot technically determine extinction. There are (at least) two reasons why you will not be able to determine extinction. First, the probability that the species is reported abundant may approach zero, but it will never reach zero. Second, you are working with anecdotal reports of abundance and not with actual counts of the species.
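To illustrate the first point, here is a minimal sketch in Python (the same arithmetic carries over directly to R). The intercept and slope are made-up values, not estimates from your data. It only shows that a fitted logistic curve can get very small while staying strictly positive.

```python
import math

def logistic(t, intercept, slope):
    """Predicted probability of an 'abundant' report at time t under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * t)))

# Hypothetical coefficients (assumptions for illustration only);
# t is measured in years since 1800.
intercept, slope = 3.0, -0.04

probs = {year: logistic(year - 1800, intercept, slope)
         for year in (1800, 1900, 2000, 2100)}

for year, p in probs.items():
    print(year, p)  # the probability shrinks over time but never equals zero
```

However far you extend the year range, the model assigns some positive probability to an "abundant" report, so it cannot by itself tell you that the species is gone.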
What you can safely say is that the species is unlikely to be reported abundant in the next year, for example, or (with less certainty) for the foreseeable future. You will never be able to determine its extinction because the species could recover in the unforeseeable future unless it is extinct already. What you could also do is take a count of the population at point t (provided you have one). Then you could multiply the count at point t by the ratio of the predicted proportion today to the predicted proportion at point t. This may give you an estimate of the number of individuals today. If this estimate is smaller than 1.5 (i.e., you expect that fewer than two individuals still exist), you could conclude that the species is extinct or will go extinct very soon.
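The scaling idea can be written out as a small calculation. This is a rough sketch in Python with invented numbers: the count, the two predicted proportions, and the function name `estimate_current_count` are all hypothetical and only show the arithmetic.

```python
def estimate_current_count(count_at_t, p_today, p_at_t):
    """Scale a known count at time t by the ratio of predicted proportions."""
    if p_at_t <= 0:
        raise ValueError("predicted proportion at time t must be positive")
    return count_at_t * (p_today / p_at_t)

# Hypothetical inputs (assumptions for illustration only)
count_at_t = 1200   # a census count available for time t
p_at_t = 0.80       # predicted proportion of 'abundant' reports at time t
p_today = 0.0008    # predicted proportion of 'abundant' reports today

estimate = estimate_current_count(count_at_t, p_today, p_at_t)

# Rule of thumb from the text: below 1.5 means fewer than two individuals expected
likely_extinct = estimate < 1.5
print(estimate, likely_extinct)
```

Note that this treats the predicted proportion as proportional to the true population size, which is a strong assumption given that the proportions come from anecdotal reports.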
However, there is another danger in this. Abundance reports may well be relative, and the question is relative to what. That is, whether the species is reported abundant may depend more on the recent context than on what was considered abundant 150 years ago. If the standard for what counts as abundant has varied over time, your analyses may or may not be salvageable.
The main question about the covariates is whether their observed values are effectively random, i.e., whether they just happened to be observed in these time periods and their distributions did not differ between the observed and the unobserved time periods. E.g., the covariates could have been recorded because there was a bush fire one year, with follow-up observations over several years. Certainly, the fire would have affected the ecosystem in unusual ways, so the values for the other periods would not be missing at random. If you have reason to believe that the values in the periods in which they were observed differed from the values in the periods that were not observed, then it is not "safe" to use them (certainly not without further scrutiny). However, the missing variable coding I suggested earlier may relieve some of these concerns, as it would capture unobserved heterogeneity between observed and unobserved time periods.
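For concreteness, one common form of missing-indicator coding looks like the Python sketch below. It may not match every detail of the coding suggested earlier; the oxygen values, the fill value of 0, and the function name `missing_indicator_coding` are assumptions for illustration. Each covariate becomes two columns: the value itself (with a constant filled in where it was not observed) and a 0/1 flag marking the unobserved periods.

```python
def missing_indicator_coding(values, fill=0.0):
    """Return (filled values, missing flags) for a covariate with gaps.

    Unobserved entries are given as None. They are replaced by `fill`,
    and the flag is 1 for unobserved entries and 0 for observed ones.
    """
    filled = [fill if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return filled, missing

# Hypothetical oxygen readings by year; None marks years without measurements
oxygen = {1922: None, 1923: 8.1, 1924: 7.9, 1939: None}

years = sorted(oxygen)
filled, missing = missing_indicator_coding([oxygen[y] for y in years])

for y, f, m in zip(years, filled, missing):
    print(y, f, m)
```

Both columns would then enter the regression together, so that the coefficient on the flag absorbs the average difference between observed and unobserved periods.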
cuncta stricte discussurus
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of FishR
Sent: Wednesday, February 17, 2010 5:33 PM
To: r-help at r-project.org
Subject: Re: [R] Survival analysis
We are looking at the extinction of a species of freshwater fish. The logistic
regression was derived by scoring the anecdotal descriptions of the species'
former population size (1 for a positive description of the population e.g.
abundant, and 0 for a negative description e.g. scarce) and plotting this
against time. Therefore it’s the population size relative to t=0. The
anecdotal evidence is not regular, which is why I used a derived measure
of the population.
We then have the predictor variables temperature, oxygen and river
modification for some of the 1800-2000 time period. Unfortunately the data
is collected in bursts, e.g. for the oxygen 1923-1938 and 1954-1972, so
the missing data will not be random.