[R] Fit continuous distribution to truncated empirical values

Mon Nov 14 18:17:49 CET 2011

Bert,

I think there is a misunderstanding here.
Some data is censored, but I want to fit the data with a distribution  
in the interval [0,24] only. Also, please note that I have other  
datasets having values larger than 1000.

Cheers,
Michele

On 14 Nov 2011, at 18:28, Bert Gunter wrote:

> A non-helpful reply on a "language" issue.
>
> "Truncated" data are quite different than "censored" data and  
> require different methodologies to analyze. You -- and many others  
> in their postings -- have appeared to confuse the two here.
>
> -- Bert
>
>
>
> On Mon, Nov 14, 2011 at 8:20 AM, Michele Mazzucco <michelemazzucco at gmail.com 
> > wrote:
> David,
>
> here is the smallest dataset
>
> # Bid   Price   Survival        Censored
> 0.03    0.029   1       1
> 0.03    0.029   11      1
> 0.03    0.029   10      1
> 0.03    0.029   9       1
> 0.03    0.029   8       1
> 0.03    0.029   7       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   1       1
> 0.03    0.029   1       1
> 0.03    0.029   9       1
> 0.03    0.029   8       1
> 0.03    0.029   7       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   16      1
> 0.03    0.029   15      1
> 0.03    0.029   14      1
> 0.03    0.029   13      1
> 0.03    0.029   12      1
> 0.03    0.029   11      1
> 0.03    0.029   10      1
> 0.03    0.029   9       1
> 0.03    0.029   8       1
> 0.03    0.029   7       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   7       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   30      1
> 0.03    0.029   29      1
> 0.03    0.029   28      1
> 0.03    0.029   27      1
> 0.03    0.029   26      1
> 0.03    0.029   25      1
> 0.03    0.029   24      1
> 0.03    0.029   23      1
> 0.03    0.029   22      1
> 0.03    0.029   21      1
> 0.03    0.029   20      1
> 0.03    0.029   19      1
> 0.03    0.029   18      1
> 0.03    0.029   17      1
> 0.03    0.029   16      1
> 0.03    0.029   15      1
> 0.03    0.029   14      1
> 0.03    0.029   13      1
> 0.03    0.029   12      1
> 0.03    0.029   11      1
> 0.03    0.029   10      1
> 0.03    0.029   9       1
> 0.03    0.029   8       1
> 0.03    0.029   7       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   8       1
> 0.03    0.029   7       1
> 0.03    0.029   6       1
> 0.03    0.029   5       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.029   4       1
> 0.03    0.029   3       1
> 0.03    0.029   2       1
> 0.03    0.029   1       1
> 0.03    0.027   71      0
> 0.03    0.027   70      0
> 0.03    0.027   69      0
> 0.03    0.027   68      0
> 0.03    0.027   67      0
> 0.03    0.027   66      0
> 0.03    0.027   65      0
> 0.03    0.027   64      0
> 0.03    0.027   63      0
> 0.03    0.027   62      0
> 0.03    0.027   61      0
> 0.03    0.027   60      0
> 0.03    0.027   59      0
> 0.03    0.027   58      0
> 0.03    0.027   57      0
> 0.03    0.027   56      0
> 0.03    0.027   55      0
> 0.03    0.027   54      0
> 0.03    0.027   53      0
> 0.03    0.027   52      0
> 0.03    0.027   51      0
> 0.03    0.027   50      0
> 0.03    0.027   49      0
> 0.03    0.027   48      0
> 0.03    0.027   47      0
> 0.03    0.027   46      0
> 0.03    0.027   45      0
> 0.03    0.027   44      0
> 0.03    0.027   43      0
> 0.03    0.027   42      0
> 0.03    0.027   41      0
> 0.03    0.027   40      0
> 0.03    0.027   39      0
> 0.03    0.027   38      0
> 0.03    0.027   37      0
> 0.03    0.027   36      0
> 0.03    0.027   35      0
> 0.03    0.027   34      0
> 0.03    0.027   33      0
> 0.03    0.027   32      0
> 0.03    0.027   31      0
> 0.03    0.027   30      0
> 0.03    0.027   29      0
> 0.03    0.027   28      0
> 0.03    0.027   27      0
> 0.03    0.027   26      0
> 0.03    0.027   25      0
> 0.03    0.027   24      0
> 0.03    0.027   23      0
> 0.03    0.027   22      0
> 0.03    0.027   21      0
> 0.03    0.027   20      0
> 0.03    0.027   19      0
> 0.03    0.027   18      0
> 0.03    0.027   17      0
> 0.03    0.027   16      0
> 0.03    0.027   15      0
> 0.03    0.027   14      0
> 0.03    0.027   13      0
> 0.03    0.027   12      0
> 0.03    0.027   11      0
> 0.03    0.027   10      0
> 0.03    0.027   9       0
> 0.03    0.027   8       0
> 0.03    0.027   7       0
> 0.03    0.027   6       0
> 0.03    0.027   5       0
> 0.03    0.027   4       0
> 0.03    0.027   3       0
> 0.03    0.027   2       0
> 0.03    0.027   1       0
>
>
> This is column # 3
> 1 11 10  9  8  7  6  5  4  3  2  1  1  1  9  8  7  6  5  4  3  2  1  
> 16 15 14
> 13 12 11 10  9  8  7  6  5  4  3  2  1  6  5  4  3  2  1  7  6  5   
> 4  3  2  1
> 5  4  3  2  1  4  3  2  1  6  5  4  3  2  1 30 29 28 27 26 25 24 23  
> 22 21 20
> 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  8  7  6   
> 5  4  3  2
> 1  4  3  2  1 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54  
> 53 52 51
> 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28  
> 27 26 25
> 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3   
> 2  1
>
>
> This is the script I have used for fitting the dataset to a Weibull  
> distribution
>
>
> my_plot <- function() {
>        require(plotrix) # for plotCI
>        require(MASS) # for fitdistr
>        require(survival) # for survival analysis
>        require(fitdistrplus) # for fitdist
>
>
>        lifetimes=read.table("lifetime.txt", header=F,  
> comment.char="#")
>        s = Surv(lifetimes$V3, lifetimes$V4)
>        mfit = survfit(s~V1, data=lifetimes, conf.type="plain")
>
>        cdf=ecdf(lifetimes$V3)
>        empirical = numeric(24)
>        T = seq(1,24)
>        for (i in T) empirical[i] = cdf(i)
>
>        fit = fitdist(lifetimes$V3, "weibull")
>        plot(T, 1- pweibull(T, fit$estimate[1], fit$estimate[2]),  
> type="l", col="red", lwd=2, lty=1, xlim=c(1,24), xlab="Time  
> [hours]", ylab="S(t)")
>        lines(T, 1 - empirical, col="blue", lwd=2, lty=2)
>        plotCI(T, mfit$surv[T], ui=mfit$upper[T], li=mfit$lower[T],  
> col="green", lwd=1, lty=3, add=T)
>        legend("bottomleft", c("Weibull fit", "1 - Empirical CDF",  
> "Empirical S(t)"), col=c("red", "blue", "green"), horiz=F, lty=1:3,  
> lwd=2)
>
> }
>
>
> my_plot()
>
>
>
>
> About your questions
> 1 - What do you mean?, I guess I don't know the answer -- that's why  
> I have posted my question on this mailing list.
> 2 - As I wrote in my first email, I am interested only in the first  
> portion of the survival times (24 hours). Hence, I'd prefer to have  
> a "good" fit over the interval of interest rather than a  
> "reasonable" fit over the whole data.
> 3 - I went though those tutorials, but couldn't find an answer.  
> That's probably due to the fact that those tutorials are "general",  
> as you pointed out, while my question is rather specific.
>
> Cheers,
> Michele
>
>
>
>
> On Nov 14, 2011, at 5:59 PM, David Winsemius wrote:
>
> >
> > On Nov 14, 2011, at 6:11 AM, Michele Mazzucco wrote:
> >
> >> Hello David,
> >>
> >> thanks for your answer.
> >> I have done as you told me, however the fit is very poor, much  
> worse than that obtained from using the whole dataset (without upper  
> bound).
> >> Any idea?
> >
> > Counter questions in the absence of data:
> >
> > ??? Is there a reason from domain specific knowledge and theory to  
> expect that the procedure you are attempting should be correct?
> >
> > ??? Why would you want to exclude data?
> >
> > ??? Why are you not looking for more general tutorials (such as  
> the one by Ricci) rather than asking what is as yet an open-ended  
> question on fitting methods that surely does not admit an answer  
> easily provided on a mailing list?
> >
> > --
> > David.
> >
> >
> >>
> >> Thanks,
> >> Michele
> >>
> >> On Nov 4, 2011, at 8:56 PM, David Winsemius wrote:
> >>
> >>>
> >>> On Nov 3, 2011, at 7:54 AM, Michele Mazzucco wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I am trying to fit a distribution to some data about survival  
> times.
> >>>> I am interested only in a specific interval, e.g., while the  
> data lies in the interval (0,...., 600), I want the best for the  
> interval (0,..., 24).
> >>>>
> >>>> I have tried both fitdistr (MASS package) and fitdist (from the  
> fitdistrplus package), but I could not get them working, e.g.
> >>>>
> >>>> fitdistr(left, "weibull", upper=24)
> >>>> Error in optim(x = c(529L, 528L, 527L, 526L, 525L, 524L, 523L,  
> 522L, 521L,  :
> >>>> L-BFGS-B needs finite values of 'fn'
> >>>> In addition: Warning message:
> >>>> In dweibull(x, shape, scale, log) : NaNs produced
> >>>>
> >>>> Am I doing something wrong?
> >>>
> >>> You didn't supply data to test,  but shouldn't you supply a  
> lower bound if you want to fit "weibull"? It is, after all, bounded  
> at 0.
> >>>
> >>>> left <- c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L,  
> 50*runif(100))
> >>>> fitdistr(left, "weibull", upper=24)
> >>> Error in optim(x = c(529, 528, 527, 526, 525, 524, 523, 522,  
> 521, 18.3964251773432,  :
> >>> L-BFGS-B needs finite values of 'fn'
> >>> In addition: Warning message:
> >>> In dweibull(x, shape, scale, log) : NaNs produced
> >>>
> >>>> fitdistr(left, "weibull", upper=24, lower=0.5)
> >>>    shape         scale
> >>> 0.58195013   24.00000000
> >>> ( 0.04046087) ( 3.38621367)
> >>>
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Michele
> >>>>
> >>>>
> >>>> p.s. I have seen similar posts, e.g., http://tolstoy.newcastle.edu.au/R/help/05/02/11558.html 
> , but I am not sure whether I can apply the same approach here.
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible  
> code.
> >>>
> >>> David Winsemius, MD
> >>> Heritage Laboratories
> >>> West Hartford, CT
> >>>
> >>
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> -- 
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
>