[R] Fit continuous distribution to truncated empirical values

Michele Mazzucco michelemazzucco at gmail.com
Mon Nov 14 17:20:46 CET 2011


David,

here is the smallest dataset

# Bid	Price	Survival	Censored
0.03	0.029	1	1
0.03	0.029	11	1
0.03	0.029	10	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	1	1
0.03	0.029	1	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	16	1
0.03	0.029	15	1
0.03	0.029	14	1
0.03	0.029	13	1
0.03	0.029	12	1
0.03	0.029	11	1
0.03	0.029	10	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	30	1
0.03	0.029	29	1
0.03	0.029	28	1
0.03	0.029	27	1
0.03	0.029	26	1
0.03	0.029	25	1
0.03	0.029	24	1
0.03	0.029	23	1
0.03	0.029	22	1
0.03	0.029	21	1
0.03	0.029	20	1
0.03	0.029	19	1
0.03	0.029	18	1
0.03	0.029	17	1
0.03	0.029	16	1
0.03	0.029	15	1
0.03	0.029	14	1
0.03	0.029	13	1
0.03	0.029	12	1
0.03	0.029	11	1
0.03	0.029	10	1
0.03	0.029	9	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	8	1
0.03	0.029	7	1
0.03	0.029	6	1
0.03	0.029	5	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.029	4	1
0.03	0.029	3	1
0.03	0.029	2	1
0.03	0.029	1	1
0.03	0.027	71	0
0.03	0.027	70	0
0.03	0.027	69	0
0.03	0.027	68	0
0.03	0.027	67	0
0.03	0.027	66	0
0.03	0.027	65	0
0.03	0.027	64	0
0.03	0.027	63	0
0.03	0.027	62	0
0.03	0.027	61	0
0.03	0.027	60	0
0.03	0.027	59	0
0.03	0.027	58	0
0.03	0.027	57	0
0.03	0.027	56	0
0.03	0.027	55	0
0.03	0.027	54	0
0.03	0.027	53	0
0.03	0.027	52	0
0.03	0.027	51	0
0.03	0.027	50	0
0.03	0.027	49	0
0.03	0.027	48	0
0.03	0.027	47	0
0.03	0.027	46	0
0.03	0.027	45	0
0.03	0.027	44	0
0.03	0.027	43	0
0.03	0.027	42	0
0.03	0.027	41	0
0.03	0.027	40	0
0.03	0.027	39	0
0.03	0.027	38	0
0.03	0.027	37	0
0.03	0.027	36	0
0.03	0.027	35	0
0.03	0.027	34	0
0.03	0.027	33	0
0.03	0.027	32	0
0.03	0.027	31	0
0.03	0.027	30	0
0.03	0.027	29	0
0.03	0.027	28	0
0.03	0.027	27	0
0.03	0.027	26	0
0.03	0.027	25	0
0.03	0.027	24	0
0.03	0.027	23	0
0.03	0.027	22	0
0.03	0.027	21	0
0.03	0.027	20	0
0.03	0.027	19	0
0.03	0.027	18	0
0.03	0.027	17	0
0.03	0.027	16	0
0.03	0.027	15	0
0.03	0.027	14	0
0.03	0.027	13	0
0.03	0.027	12	0
0.03	0.027	11	0
0.03	0.027	10	0
0.03	0.027	9	0
0.03	0.027	8	0
0.03	0.027	7	0
0.03	0.027	6	0
0.03	0.027	5	0
0.03	0.027	4	0
0.03	0.027	3	0
0.03	0.027	2	0
0.03	0.027	1	0


This is column # 3
1 11 10  9  8  7  6  5  4  3  2  1  1  1  9  8  7  6  5  4  3  2  1 16 15 14 
13 12 11 10  9  8  7  6  5  4  3  2  1  6  5  4  3  2  1  7  6  5  4  3  2  1
5  4  3  2  1  4  3  2  1  6  5  4  3  2  1 30 29 28 27 26 25 24 23 22 21 20
19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  8  7  6  5  4  3  2
1  4  3  2  1 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51
50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1


This is the script I have used for fitting the dataset to a Weibull distribution


my_plot <- function() {
	require(plotrix) # for plotCI
	require(MASS) # for fitdistr
	require(survival) # for survival analysis
	require(fitdistrplus) # for fitdist


	lifetimes=read.table("lifetime.txt", header=F, comment.char="#")
	s = Surv(lifetimes$V3, lifetimes$V4)
	mfit = survfit(s~V1, data=lifetimes, conf.type="plain")
	
	cdf=ecdf(lifetimes$V3)
	empirical = numeric(24)
	T = seq(1,24)
	for (i in T) empirical[i] = cdf(i)

	fit = fitdist(lifetimes$V3, "weibull")
	plot(T, 1- pweibull(T, fit$estimate[1], fit$estimate[2]), type="l", col="red", lwd=2, lty=1, xlim=c(1,24), xlab="Time [hours]", ylab="S(t)")
	lines(T, 1 - empirical, col="blue", lwd=2, lty=2)
	plotCI(T, mfit$surv[T], ui=mfit$upper[T], li=mfit$lower[T], col="green", lwd=1, lty=3, add=T)
	legend("bottomleft", c("Weibull fit", "1 - Empirical CDF", "Empirical S(t)"), col=c("red", "blue", "green"), horiz=F, lty=1:3, lwd=2)

}


my_plot()




About your questions
1 - What do you mean?, I guess I don't know the answer -- that's why I have posted my question on this mailing list.
2 - As I wrote in my first email, I am interested only in the first portion of the survival times (24 hours). Hence, I'd prefer to have a "good" fit over the interval of interest rather than a "reasonable" fit over the whole data.
3 - I went though those tutorials, but couldn't find an answer. That's probably due to the fact that those tutorials are "general", as you pointed out, while my question is rather specific.

Cheers,
Michele




On Nov 14, 2011, at 5:59 PM, David Winsemius wrote:

> 
> On Nov 14, 2011, at 6:11 AM, Michele Mazzucco wrote:
> 
>> Hello David,
>> 
>> thanks for your answer.
>> I have done as you told me, however the fit is very poor, much worse than that obtained from using the whole dataset (without upper bound).
>> Any idea?
> 
> Counter questions in the absence of data:
> 
> ??? Is there a reason from domain specific knowledge and theory to expect that the procedure you are attempting should be correct?
> 
> ??? Why would you want to exclude data?
> 
> ??? Why are you not looking for more general tutorials (such as the one by Ricci) rather than asking what is as yet an open-ended question on fitting methods that surely does not admit an answer easily provided on a mailing list?
> 
> -- 
> David.
> 
> 
>> 
>> Thanks,
>> Michele
>> 
>> On Nov 4, 2011, at 8:56 PM, David Winsemius wrote:
>> 
>>> 
>>> On Nov 3, 2011, at 7:54 AM, Michele Mazzucco wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I am trying to fit a distribution to some data about survival times.
>>>> I am interested only in a specific interval, e.g., while the data lies in the interval (0,...., 600), I want the best for the interval (0,..., 24).
>>>> 
>>>> I have tried both fitdistr (MASS package) and fitdist (from the fitdistrplus package), but I could not get them working, e.g.
>>>> 
>>>> fitdistr(left, "weibull", upper=24)
>>>> Error in optim(x = c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L,  :
>>>> L-BFGS-B needs finite values of 'fn'
>>>> In addition: Warning message:
>>>> In dweibull(x, shape, scale, log) : NaNs produced
>>>> 
>>>> Am I doing something wrong?
>>> 
>>> You didn't supply data to test,  but shouldn't you supply a lower bound if you want to fit "weibull"? It is, after all, bounded at 0.
>>> 
>>>> left <- c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L, 50*runif(100))
>>>> fitdistr(left, "weibull", upper=24)
>>> Error in optim(x = c(529, 528, 527, 526, 525, 524, 523, 522, 521, 18.3964251773432,  :
>>> L-BFGS-B needs finite values of 'fn'
>>> In addition: Warning message:
>>> In dweibull(x, shape, scale, log) : NaNs produced
>>> 
>>>> fitdistr(left, "weibull", upper=24, lower=0.5)
>>>    shape         scale
>>> 0.58195013   24.00000000
>>> ( 0.04046087) ( 3.38621367)
>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Michele
>>>> 
>>>> 
>>>> p.s. I have seen similar posts, e.g., http://tolstoy.newcastle.edu.au/R/help/05/02/11558.html, but I am not sure whether I can apply the same approach here.
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>> 
>> 
> 
> David Winsemius, MD
> West Hartford, CT
> 



More information about the R-help mailing list