[R] Fit continuous distribution to truncated empirical values
Michele Mazzucco
michelemazzucco at gmail.com
Mon Nov 14 18:17:49 CET 2011
Bert,
I think there is a misunderstanding here.
Some data is censored, but I want to fit the data with a distribution
in the interval [0,24] only. Also, please note that I have other
datasets having values larger than 1000.
Cheers,
Michele
On 14 Nov 2011, at 18:28, Bert Gunter wrote:
> A non-helpful reply on a "language" issue.
>
> "Truncated" data are quite different than "censored" data and
> require different methodologies to analyze. You -- and many others
> in their postings -- have appeared to confuse the two here.
>
> -- Bert
>
>
>
> On Mon, Nov 14, 2011 at 8:20 AM, Michele Mazzucco <michelemazzucco at gmail.com
> > wrote:
> David,
>
> here is the smallest dataset
>
> # Bid Price Survival Censored
> 0.03 0.029 1 1
> 0.03 0.029 11 1
> 0.03 0.029 10 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 1 1
> 0.03 0.029 1 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 16 1
> 0.03 0.029 15 1
> 0.03 0.029 14 1
> 0.03 0.029 13 1
> 0.03 0.029 12 1
> 0.03 0.029 11 1
> 0.03 0.029 10 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 30 1
> 0.03 0.029 29 1
> 0.03 0.029 28 1
> 0.03 0.029 27 1
> 0.03 0.029 26 1
> 0.03 0.029 25 1
> 0.03 0.029 24 1
> 0.03 0.029 23 1
> 0.03 0.029 22 1
> 0.03 0.029 21 1
> 0.03 0.029 20 1
> 0.03 0.029 19 1
> 0.03 0.029 18 1
> 0.03 0.029 17 1
> 0.03 0.029 16 1
> 0.03 0.029 15 1
> 0.03 0.029 14 1
> 0.03 0.029 13 1
> 0.03 0.029 12 1
> 0.03 0.029 11 1
> 0.03 0.029 10 1
> 0.03 0.029 9 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 8 1
> 0.03 0.029 7 1
> 0.03 0.029 6 1
> 0.03 0.029 5 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.029 4 1
> 0.03 0.029 3 1
> 0.03 0.029 2 1
> 0.03 0.029 1 1
> 0.03 0.027 71 0
> 0.03 0.027 70 0
> 0.03 0.027 69 0
> 0.03 0.027 68 0
> 0.03 0.027 67 0
> 0.03 0.027 66 0
> 0.03 0.027 65 0
> 0.03 0.027 64 0
> 0.03 0.027 63 0
> 0.03 0.027 62 0
> 0.03 0.027 61 0
> 0.03 0.027 60 0
> 0.03 0.027 59 0
> 0.03 0.027 58 0
> 0.03 0.027 57 0
> 0.03 0.027 56 0
> 0.03 0.027 55 0
> 0.03 0.027 54 0
> 0.03 0.027 53 0
> 0.03 0.027 52 0
> 0.03 0.027 51 0
> 0.03 0.027 50 0
> 0.03 0.027 49 0
> 0.03 0.027 48 0
> 0.03 0.027 47 0
> 0.03 0.027 46 0
> 0.03 0.027 45 0
> 0.03 0.027 44 0
> 0.03 0.027 43 0
> 0.03 0.027 42 0
> 0.03 0.027 41 0
> 0.03 0.027 40 0
> 0.03 0.027 39 0
> 0.03 0.027 38 0
> 0.03 0.027 37 0
> 0.03 0.027 36 0
> 0.03 0.027 35 0
> 0.03 0.027 34 0
> 0.03 0.027 33 0
> 0.03 0.027 32 0
> 0.03 0.027 31 0
> 0.03 0.027 30 0
> 0.03 0.027 29 0
> 0.03 0.027 28 0
> 0.03 0.027 27 0
> 0.03 0.027 26 0
> 0.03 0.027 25 0
> 0.03 0.027 24 0
> 0.03 0.027 23 0
> 0.03 0.027 22 0
> 0.03 0.027 21 0
> 0.03 0.027 20 0
> 0.03 0.027 19 0
> 0.03 0.027 18 0
> 0.03 0.027 17 0
> 0.03 0.027 16 0
> 0.03 0.027 15 0
> 0.03 0.027 14 0
> 0.03 0.027 13 0
> 0.03 0.027 12 0
> 0.03 0.027 11 0
> 0.03 0.027 10 0
> 0.03 0.027 9 0
> 0.03 0.027 8 0
> 0.03 0.027 7 0
> 0.03 0.027 6 0
> 0.03 0.027 5 0
> 0.03 0.027 4 0
> 0.03 0.027 3 0
> 0.03 0.027 2 0
> 0.03 0.027 1 0
>
>
> This is column # 3
> 1 11 10 9 8 7 6 5 4 3 2 1 1 1 9 8 7 6 5 4 3 2 1
> 16 15 14
> 13 12 11 10 9 8 7 6 5 4 3 2 1 6 5 4 3 2 1 7 6 5
> 4 3 2 1
> 5 4 3 2 1 4 3 2 1 6 5 4 3 2 1 30 29 28 27 26 25 24 23
> 22 21 20
> 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 8 7 6
> 5 4 3 2
> 1 4 3 2 1 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54
> 53 52 51
> 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28
> 27 26 25
> 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3
> 2 1
>
>
> This is the script I have used for fitting the dataset to a Weibull
> distribution
>
>
> my_plot <- function() {
> require(plotrix) # for plotCI
> require(MASS) # for fitdistr
> require(survival) # for survival analysis
> require(fitdistrplus) # for fitdist
>
>
> lifetimes=read.table("lifetime.txt", header=F,
> comment.char="#")
> s = Surv(lifetimes$V3, lifetimes$V4)
> mfit = survfit(s~V1, data=lifetimes, conf.type="plain")
>
> cdf=ecdf(lifetimes$V3)
> empirical = numeric(24)
> T = seq(1,24)
> for (i in T) empirical[i] = cdf(i)
>
> fit = fitdist(lifetimes$V3, "weibull")
> plot(T, 1- pweibull(T, fit$estimate[1], fit$estimate[2]),
> type="l", col="red", lwd=2, lty=1, xlim=c(1,24), xlab="Time
> [hours]", ylab="S(t)")
> lines(T, 1 - empirical, col="blue", lwd=2, lty=2)
> plotCI(T, mfit$surv[T], ui=mfit$upper[T], li=mfit$lower[T],
> col="green", lwd=1, lty=3, add=T)
> legend("bottomleft", c("Weibull fit", "1 - Empirical CDF",
> "Empirical S(t)"), col=c("red", "blue", "green"), horiz=F, lty=1:3,
> lwd=2)
>
> }
>
>
> my_plot()
>
>
>
>
> About your questions
> 1 - What do you mean?, I guess I don't know the answer -- that's why
> I have posted my question on this mailing list.
> 2 - As I wrote in my first email, I am interested only in the first
> portion of the survival times (24 hours). Hence, I'd prefer to have
> a "good" fit over the interval of interest rather than a
> "reasonable" fit over the whole data.
> 3 - I went though those tutorials, but couldn't find an answer.
> That's probably due to the fact that those tutorials are "general",
> as you pointed out, while my question is rather specific.
>
> Cheers,
> Michele
>
>
>
>
> On Nov 14, 2011, at 5:59 PM, David Winsemius wrote:
>
> >
> > On Nov 14, 2011, at 6:11 AM, Michele Mazzucco wrote:
> >
> >> Hello David,
> >>
> >> thanks for your answer.
> >> I have done as you told me, however the fit is very poor, much
> worse than that obtained from using the whole dataset (without upper
> bound).
> >> Any idea?
> >
> > Counter questions in the absence of data:
> >
> > ??? Is there a reason from domain specific knowledge and theory to
> expect that the procedure you are attempting should be correct?
> >
> > ??? Why would you want to exclude data?
> >
> > ??? Why are you not looking for more general tutorials (such as
> the one by Ricci) rather than asking what is as yet an open-ended
> question on fitting methods that surely does not admit an answer
> easily provided on a mailing list?
> >
> > --
> > David.
> >
> >
> >>
> >> Thanks,
> >> Michele
> >>
> >> On Nov 4, 2011, at 8:56 PM, David Winsemius wrote:
> >>
> >>>
> >>> On Nov 3, 2011, at 7:54 AM, Michele Mazzucco wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I am trying to fit a distribution to some data about survival
> times.
> >>>> I am interested only in a specific interval, e.g., while the
> data lies in the interval (0,...., 600), I want the best for the
> interval (0,..., 24).
> >>>>
> >>>> I have tried both fitdistr (MASS package) and fitdist (from the
> fitdistrplus package), but I could not get them working, e.g.
> >>>>
> >>>> fitdistr(left, "weibull", upper=24)
> >>>> Error in optim(x = c(529L, 528L, 527L, 526L, 525L, 524L, 523L,
> 522L, 521L, :
> >>>> L-BFGS-B needs finite values of 'fn'
> >>>> In addition: Warning message:
> >>>> In dweibull(x, shape, scale, log) : NaNs produced
> >>>>
> >>>> Am I doing something wrong?
> >>>
> >>> You didn't supply data to test, but shouldn't you supply a
> lower bound if you want to fit "weibull"? It is, after all, bounded
> at 0.
> >>>
> >>>> left <- c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L,
> 50*runif(100))
> >>>> fitdistr(left, "weibull", upper=24)
> >>> Error in optim(x = c(529, 528, 527, 526, 525, 524, 523, 522,
> 521, 18.3964251773432, :
> >>> L-BFGS-B needs finite values of 'fn'
> >>> In addition: Warning message:
> >>> In dweibull(x, shape, scale, log) : NaNs produced
> >>>
> >>>> fitdistr(left, "weibull", upper=24, lower=0.5)
> >>> shape scale
> >>> 0.58195013 24.00000000
> >>> ( 0.04046087) ( 3.38621367)
> >>>
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Michele
> >>>>
> >>>>
> >>>> p.s. I have seen similar posts, e.g., http://tolstoy.newcastle.edu.au/R/help/05/02/11558.html
> , but I am not sure whether I can apply the same approach here.
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible
> code.
> >>>
> >>> David Winsemius, MD
> >>> Heritage Laboratories
> >>> West Hartford, CT
> >>>
> >>
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
>
More information about the R-help
mailing list