[R] repeating an analysis

Wed Oct 13 02:30:48 CEST 2010

Andrew -
    I think

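library(rpart)
answer = replicate(50, {
    fit1 <- rpart(CHAB ~ ., data = chabun, method = "anova",
                  control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
    x = printcp(fit1)    # prints the cp table and also returns it
    # nsplit of the row with the smallest cross-validated error
    x[which.min(x[, 'xerror']), 'nsplit']
})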

will put the numbers you want into answer, but there was no reproducible
example to test it on.  Unfortunately, I don't know of any way to
suppress the printing from printcp().
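One way to avoid the printing is to skip printcp() altogether: a fitted rpart object keeps the same cp table in its cptable component. Below is a minimal sketch of that variant. Since chabun was not posted, it uses R's built-in mtcars data with mpg as the response purely as a stand-in (an assumption, not the original analysis); best_nsplit() is a hypothetical helper name.

```r
# Sketch only: mtcars/mpg stand in for chabun/CHAB (assumption).
library(rpart)

# Fit one tree and return the nsplit with the smallest cross-validated error.
# fit$cptable is the matrix printcp() displays, so nothing is printed here.
best_nsplit <- function(data) {
    fit <- rpart(mpg ~ ., data = data, method = "anova",
                 control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
    cpt <- fit$cptable
    cpt[which.min(cpt[, "xerror"]), "nsplit"]
}

set.seed(2010)   # optional: makes the 50 random fold assignments repeatable
answer <- replicate(50, best_nsplit(mtcars))
print(table(answer))   # frequency distribution of the selected tree sizes
```

The results vary between runs because xval = 10 assigns observations to cross-validation folds at random; only the xerror and xstd columns change, not the tree sequence itself. If you would rather keep printcp(), replacing x = printcp(fit1) inside the replicate() block with capture.output(x <- printcp(fit1)) should also discard the printed text (untested against the original data).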

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

On Wed, 13 Oct 2010, Andrew Halford wrote:

> Hi All,
>
> I have to say upfront that I am a complete neophyte when it comes to
> programming. Nevertheless I enjoy the challenge of using R because of its
> incredible statistical resources.
>
> My problem is this: I am running a regression tree analysis using
> "rpart" and I need to run the calculation repeatedly (say n=50 times) to
> obtain a distribution of results, from which I will pick the median to
> represent the most parsimonious tree size. Unfortunately, rpart does not
> provide this, so it will have to be coded.
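Picking the median tree size and cutting the tree back to it can be sketched as below. Assumptions: mtcars/mpg stand in for chabun/CHAB, and quantile(type = 1) is used instead of median() so that the chosen value is always one of the observed sizes (median() of an even number of runs can land halfway between two sizes). Calling prune() with the CP value from the matching cp-table row is the usual rpart idiom for keeping the subtree of that size.

```r
library(rpart)

# Stand-in data (assumption): mtcars, with mpg playing the role of CHAB.
ctrl <- rpart.control(minsplit = 10, cp = 0.01, xval = 10)
fit  <- rpart(mpg ~ ., data = mtcars, method = "anova", control = ctrl)

# Refit 50 times; only the random cross-validation folds differ between runs.
answer <- replicate(50, {
    f <- rpart(mpg ~ ., data = mtcars, method = "anova", control = ctrl)
    f$cptable[which.min(f$cptable[, "xerror"]), "nsplit"]
})

# type = 1 returns an observed value, so 'chosen' is a real tree size.
chosen <- unname(quantile(answer, probs = 0.5, type = 1))

# Cut the full tree back to that size via the CP value of the matching row.
cp_at  <- fit$cptable[fit$cptable[, "nsplit"] == chosen, "CP"]
pruned <- prune(fit, cp = cp_at)
print(pruned)
```

Because every run fits the same data, all runs share one tree sequence, so the chosen size is always present in fit$cptable.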
>
> Could anyone help me with this? I have provided the code (and relevant
> output) for the analysis I am running. I need to run it n=50 times, pick
> the appropriate tree size from each output, and write it to a data file
> where I can then look at the frequency distribution of tree sizes.
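To post each run's tree size to a data file as it is produced, a plain for loop that appends one line per run is one option; results from finished runs are kept even if the loop is interrupted. This is a sketch: the file name nsplit_runs.txt is hypothetical (written to a temporary directory here), and mtcars/mpg again stand in for chabun/CHAB.

```r
library(rpart)

# Hypothetical output file; any writable path would do.
out_file <- file.path(tempdir(), "nsplit_runs.txt")
unlink(out_file)   # start from an empty file

for (run in 1:50) {
    # Stand-in data (assumption): mtcars with mpg as the response.
    fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
                 control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
    cpt  <- fit$cptable
    best <- cpt[which.min(cpt[, "xerror"]), "nsplit"]
    # Append one "run nsplit" line per run.
    cat(run, " ", best, "\n", sep = "", file = out_file, append = TRUE)
}

# Read the file back and look at the frequency distribution of tree sizes.
runs <- read.table(out_file, col.names = c("run", "nsplit"))
print(table(runs$nsplit))
```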
>
> Here is the code and output from a single run
>
>> fit1 <- rpart(CHAB~.,data=chabun, method="anova",
> control=rpart.control(minsplit=10, cp=0.01, xval=10))
>> printcp(fit1)
>
> Regression tree:
> rpart(formula = CHAB ~ ., data = chabun, method = "anova", control =
> rpart.control(minsplit = 10,
>    cp = 0.01, xval = 10))
> Variables actually used in tree construction:
> [1] EXP LAT POC RUG
> Root node error: 35904/33 = 1088
> n= 33
>        CP nsplit rel error xerror    xstd
> 1 0.539806      0   1.00000 1.0337 0.41238
> 2 0.050516      1   0.46019 1.2149 0.38787
> 3 0.016788      2   0.40968 1.2719 0.41280
> 4 0.010221      3   0.39289 1.1852 0.38300
> 5 0.010000      4   0.38267 1.1740 0.38333
>
> Each time I re-run the model I get a slightly different output. For each
> of 50 runs I want to extract the nsplit value corresponding to the lowest
> xerror (in this case nsplit = 0), and then look at the distribution of
> those nsplit values.
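The selection rule can be checked by hand against the single run shown above. The sketch below types that printed cp table into a matrix (column names as printcp() shows them) and applies the same which.min() lookup used in Phil's reply; the smallest xerror, 1.0337, is in the first row, so the answer is nsplit = 0.

```r
# The cp table from the posted single run, typed in by hand.
cpt <- matrix(
    c(0.539806, 0, 1.00000, 1.0337, 0.41238,
      0.050516, 1, 0.46019, 1.2149, 0.38787,
      0.016788, 2, 0.40968, 1.2719, 0.41280,
      0.010221, 3, 0.39289, 1.1852, 0.38300,
      0.010000, 4, 0.38267, 1.1740, 0.38333),
    ncol = 5, byrow = TRUE,
    dimnames = list(NULL, c("CP", "nsplit", "rel error", "xerror", "xstd")))

best_row <- which.min(cpt[, "xerror"])   # row with the smallest xerror
print(cpt[best_row, "nsplit"])           # tree size for this run
```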
>
> Any help appreciated.
>
>
> Andy
>
>
> -- 
> Andrew Halford
> Associate Researcher
> Marine Laboratory
> University of Guam
> Ph: +1 671 734 2948
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>