[R] repeating an analysis
Phil Spector
spector at stat.berkeley.edu
Wed Oct 13 02:30:48 CEST 2010
Andrew -
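I think
library(rpart)
answer = replicate(50, {
    # each call draws new random cross-validation folds
    fit1 <- rpart(CHAB ~ ., data = chabun, method = "anova",
                  control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
    x = printcp(fit1)
    # nsplit of the row with the smallest cross-validated error
    x[which.min(x[, 'xerror']), 'nsplit']
})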
will put the numbers you want into answer, but there was no reproducible
example to test it on. Unfortunately, I don't know of any way to
suppress the printing from printcp().
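Since there was no reproducible example, here is a minimal sketch of the same idea run on a simulated stand-in for chabun. The data frame below and the helper name best_nsplit are hypothetical, made up for illustration only. It reads the fit's cptable component, which is the matrix printcp() prints, so nothing is written to the console during the 50 replications.

```r
library(rpart)

# Hypothetical stand-in for the 'chabun' data frame, which was not posted:
# 33 rows, a numeric response CHAB and four numeric predictors named after
# the variables printcp() reported using. The values are simulated.
set.seed(1)
chabun <- data.frame(EXP = runif(33), LAT = runif(33),
                     POC = runif(33), RUG = runif(33))
chabun$CHAB <- 50 * chabun$EXP + 20 * chabun$RUG + rnorm(33, sd = 10)

# Hypothetical helper: fit one tree and return the nsplit value of the row
# with the smallest cross-validated error (xerror). fit$cptable is the
# matrix that printcp() prints, so reading it directly prints nothing.
best_nsplit <- function(data) {
  fit <- rpart(CHAB ~ ., data = data, method = "anova",
               control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
  cp <- fit$cptable
  cp[which.min(cp[, "xerror"]), "nsplit"]
}

# Each call draws new random cross-validation folds, so the result can
# differ from run to run; replicate() collects the 50 values in a vector.
answer <- replicate(50, best_nsplit(chabun))
table(answer)
median(answer)
```

If silencing the output is the only goal, x <- fit1$cptable can stand in for x = printcp(fit1) in the code above.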
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Wed, 13 Oct 2010, Andrew Halford wrote:
> Hi All,
>
> I have to say upfront that I am a complete neophyte when it comes to
> programming. Nevertheless I enjoy the challenge of using R because of its
> incredible statistical resources.
>
> My problem is this: I am running a regression tree analysis using
> "rpart" and I need to run the calculation repeatedly (say n = 50 times) to
> obtain a distribution of results, from which I will pick the median to
> represent the most parsimonious tree size. Unfortunately rpart does not
> do this on its own, so it will have to be coded.
>
> Could anyone help me with this? I have provided the code (and relevant
> output) for the analysis I am running. I need to run it n=50 times and from
> each output pick the appropriate tree size and post it to a datafile where I
> can then look at the frequency distribution of tree sizes.
>
> Here is the code and output from a single run
>
>> fit1 <- rpart(CHAB~.,data=chabun, method="anova",
>+ control=rpart.control(minsplit=10, cp=0.01, xval=10))
>> printcp(fit1)
>
> Regression tree:
> rpart(formula = CHAB ~ ., data = chabun, method = "anova", control = rpart.control(minsplit = 10,
>     cp = 0.01, xval = 10))
>
> Variables actually used in tree construction:
> [1] EXP LAT POC RUG
>
> Root node error: 35904/33 = 1088
>
> n= 33
>
>         CP nsplit rel error xerror    xstd
> 1 0.539806      0   1.00000 1.0337 0.41238
> 2 0.050516      1   0.46019 1.2149 0.38787
> 3 0.016788      2   0.40968 1.2719 0.41280
> 4 0.010221      3   0.39289 1.1852 0.38300
> 5 0.010000      4   0.38267 1.1740 0.38333
>
> Each time I re-run the model I get a slightly different output, because the
> 10-fold cross-validation assigns observations to folds at random. For each
> of 50 runs I want to extract the nsplit number corresponding to the lowest
> xerror (in the run above, the lowest xerror, 1.0337, is at nsplit = 0) and
> then look at the distribution of these nsplit values.
>
> Any help appreciated.
>
>
> Andy
>
>
> --
> Andrew Halford
> Associate Researcher
> Marine Laboratory
> University of Guam
> Ph: +1 671 734 2948
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
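To check the selection rule against the single run in the question, the sketch below copies the nsplit and xerror columns from that printcp() output into a small matrix and applies the same which.min() step; it picks nsplit = 0, matching the choice described in the question. The helper save_nsplit_distribution() and its file argument are hypothetical names for illustration: a sketch of writing the tallied tree sizes to a data file using base R's table() and write.csv().

```r
# nsplit and xerror values copied from the printcp() output in the question.
cp <- cbind(nsplit = 0:4,
            xerror = c(1.0337, 1.2149, 1.2719, 1.1852, 1.1740))
best <- cp[which.min(cp[, "xerror"]), "nsplit"]
print(best)  # prints [1] 0

# Hypothetical helper: tabulate a vector of best tree sizes (for example
# the 'answer' vector built with replicate()) and write the frequency
# distribution to a CSV file with columns 'nsplit' and 'count'.
save_nsplit_distribution <- function(nsplits, file) {
  freq <- as.data.frame(table(nsplit = nsplits), responseName = "count")
  write.csv(freq, file, row.names = FALSE)
  freq
}
```

With the 50 values in answer, a call such as save_nsplit_distribution(answer, "nsplit_dist.csv") would write the frequency table, and median(answer) gives the median tree size.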