[R] How to subset my data and at the same time keep the balance?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Tue Nov 20 06:24:36 CET 2012


No.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Brian Feeny <bfeeny at me.com> wrote:

>
>Just curious, once you have a model that works well, does it make sense
>to then tune it against 100% of the dataset (with known outcomes)
>so you can apply it to data you wish to predict for or is that a bad
>approach?
>
>I have done what is explained in this thread many times: taken a
>sample, learned against it, and then tested on the remainder.  But this
>is using data for which we know the predicted variable and can compare
>to validate.  So after you're done, should you re-tune with the entire
>training set?
>
>As for which method, I am using mostly SVM
>
>Brian
>
>On Nov 19, 2012, at 2:07 PM, Eddie Smith <eddieatr at gmail.com> wrote:
>
>> Thanks a lot! I got some ideas from all the replies and here is the
>> final one.
>> 
>> newdata
>> 
>> select <- sample(nrow(newdata), nrow(newdata) * .7)
>> data70 <- newdata[select,]  # training
>> write.csv(data70, "data70.csv", row.names=FALSE)
>> 
>> data30 <- newdata[-select,]  # testing
>> write.csv(data30, "data30.csv", row.names=FALSE)
>> 
>> Cheers
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
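
The 70/30 split quoted above draws rows uniformly at random, so the class proportions in data70 and data30 can drift from those in newdata. Below is a minimal sketch of a stratified variant in base R that samples 70% within each class. The example data and the factor column name `class` are assumptions for illustration; the thread does not show the actual columns of newdata.

```r
# Hypothetical example data; 'class' is an assumed outcome column.
set.seed(42)  # make the split reproducible
newdata <- data.frame(
  x     = rnorm(100),
  class = factor(rep(c("a", "b"), times = c(80, 20)))
)

# Group the row indices by class, then draw 70% of each group.
idx_by_class <- split(seq_len(nrow(newdata)), newdata$class)
select <- unlist(lapply(idx_by_class, function(idx) {
  # Index into idx rather than calling sample(idx, ...), which would
  # misbehave if a class had only a single row.
  idx[sample.int(length(idx), round(0.7 * length(idx)))]
}), use.names = FALSE)

data70 <- newdata[select, ]   # training
data30 <- newdata[-select, ]  # testing

# Compare class proportions in the two subsets.
print(prop.table(table(data70$class)))
print(prop.table(table(data30$class)))
```

Fixing the seed with set.seed() makes the same rows land in each file on every run, which helps when the CSVs are regenerated later.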
