[R] ecdf() to nls() - how to transform data?

Jochen1980 info at jochen-bauer.net
Sun Jul 17 10:12:12 CEST 2011


Thanks David and Peter!

I have tried to improve my R script to get closer to my goal. I guess I have
to use nls(), because later I want to work with the Levenberg-Marquardt
algorithm, and if I understand it correctly, the LM algorithm is also based
on least squares, whereas fitdistr() uses maximum likelihood instead. Anyway,
I am wondering how to create the appropriate data.frame for nls(). In my
opinion all the data is already there:
I got the absolute numbers in my dataset from the internet.
The histogram/density plot looks right, as you can see in graph number 3
after running the example below.
The cumulative distribution looks right as well - so how do I find x and y
for the data.frame in line 116?
Is it necessary, or easier, to make this example work with absolute numbers?
The two simple nls() tutorials worked out as well, so it can't be that hard,
I guess, but I have been sitting here for two days and have no idea what to
do next. Thanks in advance for a little more input, guys.
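
For what it is worth, the following is roughly the construction I am
imagining for that data.frame, but I am not sure it is the right input for
nls(). Fn would simply be the ecdf() step function evaluated at the sorted
u-values (hgd is the data read in further down in the script); this is only
an untested sketch:

  Fn <- ecdf( hgd$V1 )                # empirical CDF as a step function
  x  <- sort( unique( hgd$V1 ) )      # observed u-values
  y  <- Fn( x )                       # cumulative probabilities in [0,1]
  df <- data.frame( x = x, y = y )    # candidate input for nls()

Is something along these lines what you had in mind?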


#################################################
### R Script Example Fit                     ###
#################################################

cat( " ************************************************************* \n" )
cat( "  Script curvefit - from histogram to function \n" )
cat( " \n" )

# Preparation
setwd( "/home/joba/jobascripts/rproject/" )
rm( list=ls( all=TRUE ) )

# Organise X11, x Rows, y Graphs
par( mfrow=c( 2, 2 ) )

############### tutorial 1 ############################

# Build data.frame() 
x <- seq( 0.0, 0.30, length=10 )                                      # 10 equally spaced x-values
y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 ) # values of the target function
df <- data.frame( x=x, y=y )

#print( df )
plot( df, main="Tutorial 1" )

# Formula: y = x^a , look for a
res <- nls( y ~ I(x^exponent), data = df, start = list( exponent = 1 ), trace = TRUE )
print( round( summary(res)$coefficients[1], 3 ))
lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )

# Formula: y = x^a + b, look for a and b
res2 <- nls( y ~ I( x^exponent + gamma ), data = df, start = list( exponent = 1, gamma = 0 ), trace = TRUE )
print( round( summary(res2)$coefficients[1], 3 ))
lines( x, predict( res2, list( x = x ) ), lty = 1, col = "red" )
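
# Side note (just a convenience I noticed): coef() returns the fitted
# parameters directly, so indexing into summary() is not strictly necessary:
#print( round( coef( res2 ), 3 ) )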


############### tutorial 2 ############################

len <- 24
x <- runif( len ) 		 # 24 random values between 0 and 1
y <- x^3 + rnorm( len, 0, 0.06 ) # compute y-values and add some noise
ds <- data.frame( x = x, y = y ) # build data.frame
#str( ds ) 

# Compute prediction
s <- seq( 0, 1, length = 100 ) # fine grid for plotting the known and fitted curves
m <- nls( y ~ I(x^power), data = ds, start = list( power = 1 ), trace = TRUE )
#print( class(m) )
print( summary(m) )

# plot fitted curve and known curve
power <- round( summary(m)$coefficients[1], 3 )    # estimated power from summary
power.se <- round( summary(m)$coefficients[2], 3 ) # standard error of the estimate
plot( y ~ x, main = "Tutorial 2", sub="Blue: fit; green: known" )
lines( s, s^3, lty = 2, col = "green" ) 		# known real data
lines( s, predict( m, list( x = s )), lty = 1, col = "blue" ) # predicted values
text( 0, 0.5, paste( "y = x^(", power, " +/- ", power.se, ")", sep="" ), pos=4 )

############### real data #############################
# --- Create Points ---

# Real data and density
hgd <- read.csv( file="http://www.jochen-bauer.net/downloads/curve-fitting-tutorial/data01.txt",
                 sep=",", header=FALSE )
#print( hgd$V1 )
print( summary( hgd$V1 ) )
denspoints <- density( hgd$V1 )

##### define variables for later use
msa <- "MSA-Name"
c1 <- "col1"
c2 <- "col2"
histtitle <- paste( msa, "Columns:", c1, ",", c2 )

#################################################
# --- Plot ---

# Organise X11, x Rows, y Graphs
#par( mfrow=c( 1, 2 ) )

# Histogram
histo <- hist( hgd$V1, freq=FALSE, main=histtitle, xlab="U-Value", ylab="Density" )

# Density-Line
lines( denspoints, col="GREEN", lwd=1 )

# Plot cumulatives
plot( ecdf( hgd$V1 ), col="GREEN", lwd=1, main="", xlab="U-Value", ylab="Probability" ) # real data

#################################################
# --- Fit ---

# Matching Gumbel distribution as tip from expert

# Gumbel distribution function, cumulative:
# F(x) = exp( - exp( - ( x - mue ) / beta ) )   # Thanks to Peter Dalgaard :-)
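
# Helper (sketch, not used below): the same Gumbel CDF written as an R
# function, which might be handy for checking start values or for plotting.
pGumbelCdf <- function( x, mue, beta ) exp( - exp( - ( x - mue ) / beta ) )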

# Starting guesses: mue = medianOfData, beta = sdevOfData
mueStart <- median( hgd$V1 )
betaStart <- sd( hgd$V1 )
print( paste( "mueStart: ", mueStart ))
print( paste( "betaStart: ", betaStart ))

# Build data.frame()
# x = min of the u-values to max of the u-values
# y = probabilities, which add up to 1 - normalized histograms could help?

### XXXXXXXXXXX ######### XXXXXXXXXXXXX #############################
x <- seq( 0.0, 0.30, length=10 )                                      # placeholder x-values (same as tutorial 1)
y <- c( 0.04, 0.06, 0.12, 0.14, 0.15, 0.16, 0.18, 0.19, 0.20, 0.26 ) # placeholder values of the target function
df <- data.frame( x=x, y=y )
print( df )

# Formula: y = exp( - exp( - ( x - mue ) / beta ) ), look for mue and beta
res <- nls( y ~ exp( - exp( - ( x - mue ) / beta )), data = df,
            start = list( mue = mueStart, beta = betaStart ), trace = TRUE )
print( round( summary(res)$coefficients[1], 3 ))
lines( x, predict( res, list( x = x ) ), lty = 1, col = "blue" )

cat( "\n Skript curvefit - Ende\n" )
cat( " --- \n" )
