[R] ecdf() to nls() - how to transform data?

Sun Jul 17 17:07:39 CEST 2011

On Jul 17, 2011, at 4:12 AM, Jochen1980 wrote:

> Thanks David and Peter!
>
> I tried to improve my R-script to get closer to my goal.
> I guess I have to use nls(), because later I want to work with
> Levenberg-Marquardt-Algorithm and when I got it right, LM-Algorithm  
> uses
> least squares as well, fitdistr() instead uses Maximum Likelihoods.  
> Anyway,
> I am wondering, how to create the appropriate data.frame for nls()?  
> In my
> opinion all data must be there:
> I got the absolute numbers in my dataset from the internet.
> I got the histogram/density function right, as you can see in the  
> graph
> number 3 after running the following example.
> I got the cumulative density right as well - so how to find x and y  
> for the
> data-frame in line 116?

The trick I used (which has been criticised by my statistical betters)  
was to recognize that the y values for the ecdf function were simply  
(1:100)/100

df <- dataframe(y=(1:100)/100, x=knots(ecdf)  # worked for me
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta ), start=list(mue=2,  
beta=3), data=df)

The concern statistically was that the nls function is minimizing  
error around y which has no error.  Whether one can invert the  
function and thereby avoid the concern about minimizing error around a  
regular series ... I don't know.

You could certainly try both, since the inverted function is so easy  
to construct:

res <- nls( x ~ beta(- log(- log(y))) - mue, start=list(mue=2,  
beta=3), data=df)

(None of this code was tested, although the first part is very similar  
if not identical to code I tested yesterday. I was concerned this was  
a homework problem and was less specific than I might have been. I now  
see that you have at least made efforts, although I am in a hurry to  
get to a sailing and have not gone through it.)
-- 
David.

> Is it necessary/easier to get this example worked with absolute  
> numbers?
> The two easy tutorials for nls() worked out as well, so that can't  
> be that
> hard I guess, but I am sitting here for two days and got no idea  
> what to do
> now!? Thanks in advance for a little more input guys.
>
>
> #################################################
> ### R-Skript Example Fit                     ###
> #################################################
>
> cat( " *************************************************************  
> \n" )
> cat( "  Script curvefit - from histogram to function \n" )
> cat( " \n" )
>
> # Preparation
> setwd( "/home/joba/jobascripts/rproject/" )
> rm( list=ls( all=TRUE ) )
>
> # Organise X11, x Rows, y Graphs
> par( mfrow=c( 2, 2 ) )
>
> ############### tutorial 1 ############################
>
> # Build data.frame()
> x <- seq( 0.0, 0.30, length=10 )  					# seq( from, to, by )
> y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 )	 
> # vals
> of target function
> df <- data.frame( x=x, y=y )
>
> #print( df )
> plot( df, main="Tutorial 1" )
>
> # Formula: y = x^a , look for a
> res <- nls( y ~ I(x^exponent), data = df, start = list( exponent =  
> 1 ),
> trace = T )
> print( round( summary(res)$coefficients[1], 3 ))
> lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )
>
> # Formula: y = x^a + b, look for a and b
> res2 <- nls( y ~ I( x^exponent + gamma ), data = df, start = list(
> exponent=1, gamma=0 ), trace = TRUE )
> print( round( summary(res2)$coefficients[1], 3 ))
> lines( x, predict( res2, list( x = x ) ), lty = 1, col = "red" )
>
>
> ############### tutorial 2 ############################
>
> len <- 24
> x <- runif( len ) 		 # 24 random values between 0 and 1
> y <- x^3 + rnorm( len, 0, 0.06 ) # compute y-values and add some noise
> ds <- data.frame( x = x, y = y ) # build data.frame
> #str( ds )
>
> # Compute prediction
> s <- seq( 0, 1, length = 100 ) # compute real-data, to get y-values
> m <- nls( y ~ I(x^power), data = ds, start = list( power = 1 ),  
> trace = T )
> #print( class(m) )
> print( summary(m) )
>
> # plot fitted curve and known curve
> power <- round( summary(m)$coefficients[1], 3 ) 	# get power from  
> summary
> power.se <- round( summary(m)$coefficients[2], 3 ) 	# get failure from
> summary
> plot( y ~ x, main = "Tutorial 2", sub="Blue: fit; green: known" )
> lines( s, s^3, lty = 2, col = "green" ) 		# known real data
> lines( s, predict( m, list( x = s )), lty = 1, col = "blue" ) #  
> predicted
> vals
> text( 0, 0.5, paste( "y =x^ (", power, " +/- ", power.se, ")",  
> sep=""),
> pos=4 )
>
> ############### real data #############################
> # --- Create Points ---
>
> # Real data and density
> hgd <- read.csv(
> file="http://www.jochen-bauer.net/downloads/curve-fitting-tutorial/data01.txt 
> ",
> sep=",", head=FALSE)
> #print( hgd$V1 )
> print( summary( hgd$V1 ) )
> denspoints <- density( hgd$V1 )
>
> ##### define variables for later use
> msa <- "MSA-Name"
> c1 <- "col1"
> c2 <- "col2"
> histtitle <- paste( msa, "Spalten:", c1, ",", c2 )
>
> #################################################
> # --- Plot ---
>
> # Organise X11, x Rows, y Graphs
> #par( mfrow=c( 1, 2 ) )
>
> # Histogram
> histo <- hist( hgd$V1, freq=FALSE, main=histtitle, xlab="U-Value",
> ylab="Density")
>
> # Density-Line
> lines( denspoints, col="GREEN", lwd=1 )
>
> # Plot cumulatives
> plot( ecdf( hgd$V1 ), col="GREEN", lwd=1, main="", xlab="U-Value",
> ylab="Probability" ) # real data
>
> #################################################
> # --- Fit ---
>
> # Matching Gumbel distribution as tip from expert
>
> # Gumbel-Dist-Function, cumulative
> #               ( - ( x - mue ) / beta )
> #          ( -e )^
> # F(x) = e^
> # => exp( - exp( - ( x - mue ) / beta ) )  # Thanks to Peter  
> Dalgaard :-)
>
> # Starting guesses: mue = medianOfData, beta = sdevOfData
> mueStart <- median( hgd$V1 )
> betaStart <- sd( hgd$V1 )
> print( paste( "mueStart: ", mueStart ))
> print( paste( "betaStart: ", betaStart ))
>
> # Build data.frame()
> # x = min of uvals to max of uvals
> # y = probabilities, which add to 1, normalized histogramms could  
> help ?
>
> ### XXXXXXXXXXX ######### XXXXXXXXXXXXX #############################
> x <- seq( 0.0, 0.30, length=10 )  					# seq( from, to, by )
> y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 )	 
> # vals
> of target function
> df <- data.frame( x=x, y=y )
> print( df )
>
> # Formula: y = x^a , look for a
> res <- nls( y ~ exp( - exp( - ( x - mue ) / beta )), data = df,  
> start =
> list( mue = mueStart, beta = betaStart ), trace = T )
> print( round( summary(res)$coefficients[1], 3 ))
> lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )
>
> cat( "\n Skript curvefit - Ende\n" )
> cat( " --- \n" )
>
> --
> View this message in context: http://r.789695.n4.nabble.com/ecdf-to-nls-how-to-transform-data-tp3671754p3673055.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT