[R] ecdf() to nls() - how to transform data?

Sun Jul 17 23:48:01 CEST 2011

On Jul 17, 2011, at 11:07 AM, David Winsemius wrote:

>
> On Jul 17, 2011, at 4:12 AM, Jochen1980 wrote:
>
>> Thanks David and Peter!
>>
>> I tried to improve my R-script to get closer to my goal.
>> I guess I have to use nls(), because later I want to work with
>> Levenberg-Marquardt-Algorithm and when I got it right, LM-Algorithm  
>> uses
>> least squares as well, fitdistr() instead uses Maximum Likelihoods.  
>> Anyway,
>> I am wondering, how to create the appropriate data.frame for nls()?  
>> In my
>> opinion all data must be there:
>> I got the absolute numbers in my dataset from the internet.
>> I got the histogram/density function right, as you can see in the  
>> graph
>> number 3 after running the following example.
>> I got the cumulative density right as well - so how to find x and y  
>> for the
>> data-frame in line 116?
>
> The trick I used (which has been criticised by my statistical  
> betters) was to recognize that the y values for the ecdf function  
> were simply (1:100)/100
>
> df <- dataframe(y=(1:100)/100, x=knots(ecdf)  # worked for me
> res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ),  
> start=list(mue=2, beta=3), data=df)

The expression should  have been:

y ~ exp( - exp( - ( x - mue ) / beta ))

Using:

  gdta <- rgamma(100, 2)
  ecdfgr <- ecdf(gdta)
dfecdf <- data.frame(knots=knots(ecdfgr), Fn =(1:100)/100)

 > res <- nls( Fn ~ exp( - exp( - ( knots - mue ) / beta )),  
start=list(mue=2, beta=2), data=dfecdf)
 > res
Nonlinear regression model
   model:  Fn ~ exp(-exp(-(knots - mue)/beta))
    data:  dfecdf
   mue  beta
1.395 1.068
  residual sum-of-squares: 0.04979

>
> The concern statistically was that the nls function is minimizing  
> error around y which has no error.  Whether one can invert the  
> function and thereby avoid the concern about minimizing error around  
> a regular series ... I don't know.
>
> You could certainly try both, since the inverted function is so easy  
> to construct:
>
> res <- nls( x ~ beta(- log(- log(y))) - mue, start=list(mue=2,  
> beta=3), data=df)

I think I got the sign of  mue  wrong and I get an Inf error with the  
100th value for the dataframe but leaving it off gave me a comparable  
answer:

 > res <- nls( knots ~ - beta1*(- log(- log(Fn))) - mue,  
start=list(mue=2, beta1=2), data=dfecdf[1:99, ])

 > res <- nls( knots ~  beta1*(- log(- log(Fn))) + mue,  
start=list(mue=2, beta1=2), data=dfecdf[1:99, ])
 > res
Nonlinear regression model
   model:  knots ~ beta1 * (-log(-log(Fn))) + mue
    data:  dfecdf[1:99, ]
   mue beta1
1.455 1.004
  residual sum-of-squares: 3.041

Number of iterations to convergence: 1
Achieved convergence tolerance: 3.295e-09

>
>
> (None of this code was tested, although the first part is very  
> similar if not identical to code I tested yesterday. I was concerned  
> this was a homework problem and was less specific than I might have  
> been. I now see that you have at least made efforts, although I am  
> in a hurry to get to a sailing and have not gone through it.)
> -- 
> David.
>
>
>> Is it necessary/easier to get this example worked with absolute  
>> numbers?
>> The two easy tutorials for nls() worked out as well, so that can't  
>> be that
>> hard I guess, but I am sitting here for two days and got no idea  
>> what to do
>> now!? Thanks in advance for a little more input guys.
>>
>>
>> #################################################
>> ### R-Skript Example Fit                     ###
>> #################################################
>>
>> cat( "  
>> ************************************************************* \n" )
>> cat( "  Script curvefit - from histogram to function \n" )
>> cat( " \n" )
>>
>> # Preparation
>> setwd( "/home/joba/jobascripts/rproject/" )
>> rm( list=ls( all=TRUE ) )
>>
>> # Organise X11, x Rows, y Graphs
>> par( mfrow=c( 2, 2 ) )
>>
>> ############### tutorial 1 ############################
>>
>> # Build data.frame()
>> x <- seq( 0.0, 0.30, length=10 )  					# seq( from, to, by )
>> y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20,  
>> 0.26 )	# vals
>> of target function
>> df <- data.frame( x=x, y=y )
>>
>> #print( df )
>> plot( df, main="Tutorial 1" )
>>
>> # Formula: y = x^a , look for a
>> res <- nls( y ~ I(x^exponent), data = df, start = list( exponent =  
>> 1 ),
>> trace = T )
>> print( round( summary(res)$coefficients[1], 3 ))
>> lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )
>>
>> # Formula: y = x^a + b, look for a and b
>> res2 <- nls( y ~ I( x^exponent + gamma ), data = df, start = list(
>> exponent=1, gamma=0 ), trace = TRUE )
>> print( round( summary(res2)$coefficients[1], 3 ))
>> lines( x, predict( res2, list( x = x ) ), lty = 1, col = "red" )
>>
>>
>> ############### tutorial 2 ############################
>>
>> len <- 24
>> x <- runif( len ) 		 # 24 random values between 0 and 1
>> y <- x^3 + rnorm( len, 0, 0.06 ) # compute y-values and add some  
>> noise
>> ds <- data.frame( x = x, y = y ) # build data.frame
>> #str( ds )
>>
>> # Compute prediction
>> s <- seq( 0, 1, length = 100 ) # compute real-data, to get y-values
>> m <- nls( y ~ I(x^power), data = ds, start = list( power = 1 ),  
>> trace = T )
>> #print( class(m) )
>> print( summary(m) )
>>
>> # plot fitted curve and known curve
>> power <- round( summary(m)$coefficients[1], 3 ) 	# get power from  
>> summary
>> power.se <- round( summary(m)$coefficients[2], 3 ) 	# get failure  
>> from
>> summary
>> plot( y ~ x, main = "Tutorial 2", sub="Blue: fit; green: known" )
>> lines( s, s^3, lty = 2, col = "green" ) 		# known real data
>> lines( s, predict( m, list( x = s )), lty = 1, col = "blue" ) #  
>> predicted
>> vals
>> text( 0, 0.5, paste( "y =x^ (", power, " +/- ", power.se, ")",  
>> sep=""),
>> pos=4 )
>>
>> ############### real data #############################
>> # --- Create Points ---
>>
>> # Real data and density
>> hgd <- read.csv(
>> file="http://www.jochen-bauer.net/downloads/curve-fitting-tutorial/data01.txt 
>> ",
>> sep=",", head=FALSE)
>> #print( hgd$V1 )
>> print( summary( hgd$V1 ) )
>> denspoints <- density( hgd$V1 )
>>
>> ##### define variables for later use
>> msa <- "MSA-Name"
>> c1 <- "col1"
>> c2 <- "col2"
>> histtitle <- paste( msa, "Spalten:", c1, ",", c2 )
>>
>> #################################################
>> # --- Plot ---
>>
>> # Organise X11, x Rows, y Graphs
>> #par( mfrow=c( 1, 2 ) )
>>
>> # Histogram
>> histo <- hist( hgd$V1, freq=FALSE, main=histtitle, xlab="U-Value",
>> ylab="Density")
>>
>> # Density-Line
>> lines( denspoints, col="GREEN", lwd=1 )
>>
>> # Plot cumulatives
>> plot( ecdf( hgd$V1 ), col="GREEN", lwd=1, main="", xlab="U-Value",
>> ylab="Probability" ) # real data
>>
>> #################################################
>> # --- Fit ---
>>
>> # Matching Gumbel distribution as tip from expert
>>
>> # Gumbel-Dist-Function, cumulative
>> #               ( - ( x - mue ) / beta )
>> #          ( -e )^
>> # F(x) = e^
>> # => exp( - exp( - ( x - mue ) / beta ) )  # Thanks to Peter  
>> Dalgaard :-)
>>
>> # Starting guesses: mue = medianOfData, beta = sdevOfData
>> mueStart <- median( hgd$V1 )
>> betaStart <- sd( hgd$V1 )
>> print( paste( "mueStart: ", mueStart ))
>> print( paste( "betaStart: ", betaStart ))
>>
>> # Build data.frame()
>> # x = min of uvals to max of uvals
>> # y = probabilities, which add to 1, normalized histogramms could  
>> help ?
>>
>> ### XXXXXXXXXXX ######### XXXXXXXXXXXXX #############################
>> x <- seq( 0.0, 0.30, length=10 )  					# seq( from, to, by )
>> y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20,  
>> 0.26 )	# vals
>> of target function
>> df <- data.frame( x=x, y=y )
>> print( df )
>>
>> # Formula: y = x^a , look for a
>> res <- nls( y ~ exp( - exp( - ( x - mue ) / beta )), data = df,  
>> start =
>> list( mue = mueStart, beta = betaStart ), trace = T )
>> print( round( summary(res)$coefficients[1], 3 ))
>> lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )
>>
>> cat( "\n Skript curvefit - Ende\n" )
>> cat( " --- \n" )
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-tp3671754p3673055.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT