[R] Fit continuous distribution to truncated empirical values
Michele Mazzucco
michelemazzucco at gmail.com
Mon Nov 14 17:20:46 CET 2011
David,
here is the smallest dataset
# Bid Price Survival Censored
0.03 0.029 1 1
0.03 0.029 11 1
0.03 0.029 10 1
0.03 0.029 9 1
0.03 0.029 8 1
0.03 0.029 7 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 1 1
0.03 0.029 1 1
0.03 0.029 9 1
0.03 0.029 8 1
0.03 0.029 7 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 16 1
0.03 0.029 15 1
0.03 0.029 14 1
0.03 0.029 13 1
0.03 0.029 12 1
0.03 0.029 11 1
0.03 0.029 10 1
0.03 0.029 9 1
0.03 0.029 8 1
0.03 0.029 7 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 7 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 30 1
0.03 0.029 29 1
0.03 0.029 28 1
0.03 0.029 27 1
0.03 0.029 26 1
0.03 0.029 25 1
0.03 0.029 24 1
0.03 0.029 23 1
0.03 0.029 22 1
0.03 0.029 21 1
0.03 0.029 20 1
0.03 0.029 19 1
0.03 0.029 18 1
0.03 0.029 17 1
0.03 0.029 16 1
0.03 0.029 15 1
0.03 0.029 14 1
0.03 0.029 13 1
0.03 0.029 12 1
0.03 0.029 11 1
0.03 0.029 10 1
0.03 0.029 9 1
0.03 0.029 8 1
0.03 0.029 7 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 8 1
0.03 0.029 7 1
0.03 0.029 6 1
0.03 0.029 5 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.029 4 1
0.03 0.029 3 1
0.03 0.029 2 1
0.03 0.029 1 1
0.03 0.027 71 0
0.03 0.027 70 0
0.03 0.027 69 0
0.03 0.027 68 0
0.03 0.027 67 0
0.03 0.027 66 0
0.03 0.027 65 0
0.03 0.027 64 0
0.03 0.027 63 0
0.03 0.027 62 0
0.03 0.027 61 0
0.03 0.027 60 0
0.03 0.027 59 0
0.03 0.027 58 0
0.03 0.027 57 0
0.03 0.027 56 0
0.03 0.027 55 0
0.03 0.027 54 0
0.03 0.027 53 0
0.03 0.027 52 0
0.03 0.027 51 0
0.03 0.027 50 0
0.03 0.027 49 0
0.03 0.027 48 0
0.03 0.027 47 0
0.03 0.027 46 0
0.03 0.027 45 0
0.03 0.027 44 0
0.03 0.027 43 0
0.03 0.027 42 0
0.03 0.027 41 0
0.03 0.027 40 0
0.03 0.027 39 0
0.03 0.027 38 0
0.03 0.027 37 0
0.03 0.027 36 0
0.03 0.027 35 0
0.03 0.027 34 0
0.03 0.027 33 0
0.03 0.027 32 0
0.03 0.027 31 0
0.03 0.027 30 0
0.03 0.027 29 0
0.03 0.027 28 0
0.03 0.027 27 0
0.03 0.027 26 0
0.03 0.027 25 0
0.03 0.027 24 0
0.03 0.027 23 0
0.03 0.027 22 0
0.03 0.027 21 0
0.03 0.027 20 0
0.03 0.027 19 0
0.03 0.027 18 0
0.03 0.027 17 0
0.03 0.027 16 0
0.03 0.027 15 0
0.03 0.027 14 0
0.03 0.027 13 0
0.03 0.027 12 0
0.03 0.027 11 0
0.03 0.027 10 0
0.03 0.027 9 0
0.03 0.027 8 0
0.03 0.027 7 0
0.03 0.027 6 0
0.03 0.027 5 0
0.03 0.027 4 0
0.03 0.027 3 0
0.03 0.027 2 0
0.03 0.027 1 0
This is column # 3
1 11 10 9 8 7 6 5 4 3 2 1 1 1 9 8 7 6 5 4 3 2 1 16 15 14
13 12 11 10 9 8 7 6 5 4 3 2 1 6 5 4 3 2 1 7 6 5 4 3 2 1
5 4 3 2 1 4 3 2 1 6 5 4 3 2 1 30 29 28 27 26 25 24 23 22 21 20
19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 8 7 6 5 4 3 2
1 4 3 2 1 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51
50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
This is the script I have used for fitting the dataset to a Weibull distribution
my_plot <- function() {
require(plotrix) # for plotCI
require(MASS) # for fitdistr
require(survival) # for survival analysis
require(fitdistrplus) # for fitdist
lifetimes=read.table("lifetime.txt", header=F, comment.char="#")
s = Surv(lifetimes$V3, lifetimes$V4)
mfit = survfit(s~V1, data=lifetimes, conf.type="plain")
cdf=ecdf(lifetimes$V3)
empirical = numeric(24)
T = seq(1,24)
for (i in T) empirical[i] = cdf(i)
fit = fitdist(lifetimes$V3, "weibull")
plot(T, 1- pweibull(T, fit$estimate[1], fit$estimate[2]), type="l", col="red", lwd=2, lty=1, xlim=c(1,24), xlab="Time [hours]", ylab="S(t)")
lines(T, 1 - empirical, col="blue", lwd=2, lty=2)
plotCI(T, mfit$surv[T], ui=mfit$upper[T], li=mfit$lower[T], col="green", lwd=1, lty=3, add=T)
legend("bottomleft", c("Weibull fit", "1 - Empirical CDF", "Empirical S(t)"), col=c("red", "blue", "green"), horiz=F, lty=1:3, lwd=2)
}
my_plot()
About your questions
1 - What do you mean?, I guess I don't know the answer -- that's why I have posted my question on this mailing list.
2 - As I wrote in my first email, I am interested only in the first portion of the survival times (24 hours). Hence, I'd prefer to have a "good" fit over the interval of interest rather than a "reasonable" fit over the whole data.
3 - I went though those tutorials, but couldn't find an answer. That's probably due to the fact that those tutorials are "general", as you pointed out, while my question is rather specific.
Cheers,
Michele
On Nov 14, 2011, at 5:59 PM, David Winsemius wrote:
>
> On Nov 14, 2011, at 6:11 AM, Michele Mazzucco wrote:
>
>> Hello David,
>>
>> thanks for your answer.
>> I have done as you told me, however the fit is very poor, much worse than that obtained from using the whole dataset (without upper bound).
>> Any idea?
>
> Counter questions in the absence of data:
>
> ??? Is there a reason from domain specific knowledge and theory to expect that the procedure you are attempting should be correct?
>
> ??? Why would you want to exclude data?
>
> ??? Why are you not looking for more general tutorials (such as the one by Ricci) rather than asking what is as yet an open-ended question on fitting methods that surely does not admit an answer easily provided on a mailing list?
>
> --
> David.
>
>
>>
>> Thanks,
>> Michele
>>
>> On Nov 4, 2011, at 8:56 PM, David Winsemius wrote:
>>
>>>
>>> On Nov 3, 2011, at 7:54 AM, Michele Mazzucco wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to fit a distribution to some data about survival times.
>>>> I am interested only in a specific interval, e.g., while the data lies in the interval (0,...., 600), I want the best for the interval (0,..., 24).
>>>>
>>>> I have tried both fitdistr (MASS package) and fitdist (from the fitdistrplus package), but I could not get them working, e.g.
>>>>
>>>> fitdistr(left, "weibull", upper=24)
>>>> Error in optim(x = c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L, :
>>>> L-BFGS-B needs finite values of 'fn'
>>>> In addition: Warning message:
>>>> In dweibull(x, shape, scale, log) : NaNs produced
>>>>
>>>> Am I doing something wrong?
>>>
>>> You didn't supply data to test, but shouldn't you supply a lower bound if you want to fit "weibull"? It is, after all, bounded at 0.
>>>
>>>> left <- c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L, 50*runif(100))
>>>> fitdistr(left, "weibull", upper=24)
>>> Error in optim(x = c(529, 528, 527, 526, 525, 524, 523, 522, 521, 18.3964251773432, :
>>> L-BFGS-B needs finite values of 'fn'
>>> In addition: Warning message:
>>> In dweibull(x, shape, scale, log) : NaNs produced
>>>
>>>> fitdistr(left, "weibull", upper=24, lower=0.5)
>>> shape scale
>>> 0.58195013 24.00000000
>>> ( 0.04046087) ( 3.38621367)
>>>
>>>>
>>>>
>>>> Thanks,
>>>> Michele
>>>>
>>>>
>>>> p.s. I have seen similar posts, e.g., http://tolstoy.newcastle.edu.au/R/help/05/02/11558.html, but I am not sure whether I can apply the same approach here.
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
More information about the R-help
mailing list