[R] repeating an analysis

Wed Oct 13 02:32:23 CEST 2010

I think you want something like this:

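library(rpart)

optimal.nSplit = rep(NA, 50) # This will hold the result
for (run in 1:50)
{
  fit1 = rpart(...) # your rpart call goes here
  cpTable = fit1$cptable # one row per candidate tree size
  bestRow = which.min(cpTable[, "xerror"]) # row with the lowest cross-validated error
  optimal.nSplit[run] = cpTable[bestRow, "nsplit"]
}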
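
For a version you can paste and run as-is, here is a minimal sketch of the same loop. It assumes the built-in mtcars data (with mpg as the response) as a stand-in for your chabun data, and the set.seed() call and the name n.runs are only there for illustration; the control settings are copied from your call. It also tabulates the selected nsplit values and takes their median:

```r
library(rpart)

set.seed(1)  # assumption: fixed seed only so the illustration is repeatable
n.runs = 50  # hypothetical name for the number of repeats
optimal.nSplit = rep(NA, n.runs)  # this will hold the result

for (run in seq_len(n.runs))
{
  # mtcars stands in for chabun; the cross-validation folds are drawn
  # at random, so xerror (and the selected row) can change between runs
  fit1 = rpart(mpg ~ ., data = mtcars, method = "anova",
               control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
  cpTable = fit1$cptable
  bestRow = which.min(cpTable[, "xerror"])  # lowest cross-validated error
  optimal.nSplit[run] = cpTable[bestRow, "nsplit"]
}

table(optimal.nSplit)   # frequency distribution of the selected tree sizes
median(optimal.nSplit)  # median tree size over the runs
```

If you want the values in a file, write.csv(data.frame(run = seq_len(n.runs), nsplit = optimal.nSplit), "nsplit.csv") should do it (the file name is just an example).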

In any case, look at
?rpart
?printcp
?rpart.object

Peter

On Tue, Oct 12, 2010 at 4:50 PM, Andrew Halford
<andrew.halford at gmail.com> wrote:
> Hi All,
>
> I have to say upfront that I am a complete neophyte when it comes to
> programming. Nevertheless I enjoy the challenge of using R because of its
> incredible statistical resources.
>
> My problem is this: I am running a regression tree analysis using
> "rpart" and I need to run the calculation repeatedly (say n=50 times) to
> obtain a distribution of results from which I will pick the median one to
> represent the most parsimonious tree size. Unfortunately rpart does not
> have this ability built in, so it will have to be coded.
>
> Could anyone help me with this? I have provided the code (and relevant
> output) for the analysis I am running. I need to run it n=50 times and from
> each output pick the appropriate tree size and post it to a datafile where I
> can then look at the frequency distribution of tree sizes.
>
> Here is the code and output from a single run
>
>> fit1 <- rpart(CHAB~.,data=chabun, method="anova",
> control=rpart.control(minsplit=10, cp=0.01, xval=10))
>> printcp(fit1)
>
> Regression tree:
> rpart(formula = CHAB ~ ., data = chabun, method = "anova", control =
> rpart.control(minsplit = 10,
>    cp = 0.01, xval = 10))
> Variables actually used in tree construction:
> [1] EXP LAT POC RUG
> Root node error: 35904/33 = 1088
> n= 33
>         CP nsplit rel error xerror    xstd
> 1 0.539806      0   1.00000 1.0337 0.41238
> 2 0.050516      1   0.46019 1.2149 0.38787
> 3 0.016788      2   0.40968 1.2719 0.41280
> 4 0.010221      3   0.39289 1.1852 0.38300
> 5 0.010000      4   0.38267 1.1740 0.38333
>
> Each time I re-run the model I will get a slightly different output. I want
> to extract the nsplit number corresponding to the lowest xerror for each run
> of the model (in this case it is for nsplit = 0) over 50 runs and then look
> at the distribution of nsplits after 50 runs.
>
> Any help appreciated.
>
>
> Andy
>
>
> --
> Andrew Halford
> Associate Researcher
> Marine Laboratory
> University of Guam
> Ph: +1 671 734 2948