[R] Long-tail model in R ... anyone?

Dirk Eddelbuettel edd at debian.org
Wed Jul 4 21:15:49 CEST 2007


I think you simply had your nls() syntax wrong.  Works here:


## first a neat trick to read the data from embedded text
> fmdata <- read.csv(textConnection("
+ rank,cum_value
10,     17396510
32,     31194809
96,     53447300
420,    100379331
1187,   152238166
24234,  432238757
91242,  581332371
294180, 650880870
1242185,665227287"))
> 


## then compute cumulative share
> fmdata[,"cumshare"] <- fmdata[,"cum_value"] / fmdata[nrow(fmdata),"cum_value"]
> 


## then check the data, just in case
> summary(fmdata)
      rank           cum_value            cumshare      
 Min.   :     10   Min.   : 17396510   Min.   :0.02615  
 1st Qu.:     96   1st Qu.: 53447300   1st Qu.:0.08034  
 Median :   1187   Median :152238166   Median :0.22885  
 Mean   : 183732   Mean   :298259489   Mean   :0.44836  
 3rd Qu.:  91242   3rd Qu.:581332371   3rd Qu.:0.87389  
 Max.   :1242185   Max.   :665227287   Max.   :1.00000  
> 

## finally estimate the model, using only the first seven rows of data
## using the parametric form from the paper and some wild guesses as
## starting values:
> fit <- nls(cumshare ~ Beta / ((N50 / rank)^Alpha + 1), data=fmdata[1:7,], start=list(Alpha=1, Beta=1, N50=1e4))
> summary(fit)

Formula: cumshare ~ Beta/((N50/rank)^Alpha + 1)

Parameters:
       Estimate Std. Error t value Pr(>|t|)    
Alpha 4.829e-01  5.374e-03   89.86 9.20e-08 ***
Beta  1.429e+00  2.745e-02   52.07 8.14e-07 ***
N50   3.560e+04  3.045e+03   11.69 0.000306 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.002193 on 4 degrees of freedom

Number of iterations to convergence: 8 
Achieved convergence tolerance: 1.297e-06 

> 

which is reasonably close to the quoted 
	N50 = 30714, α = 0.49, and β = 1.38.

You can probably play a little with the nls options to see what effect this
has. 

That said, seven observations for three parameters in non-linear model may be
a little hazardous.  One indication is that the estimated parameters values
are not too stable once you add the eights and nineth row of data.

Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison



More information about the R-help mailing list