[R] How to subset my data and at the same time keep the balance?
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Tue Nov 20 06:24:36 CET 2012
No.
---------------------------------------------------------------------------
Jeff Newmiller                        DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Brian Feeny <bfeeny at me.com> wrote:
>
>Just curious: once you have a model that works well, does it make sense
>to then tune it against 100% of the dataset (with known outcomes)
>so you can apply it to the data you wish to predict, or is that a bad
>approach?
>
>I have done what is explained in this thread many times: taken a
>sample, learned against it, and then tested on the remaining rows. But
>this uses data for which we know the predicted variable, so we can
>compare the predictions to validate the model.
>So after you're done, should you re-tune with the entire training set?
>
>As for the method, I am mostly using SVMs.
>
>Brian
>
>On Nov 19, 2012, at 2:07 PM, Eddie Smith <eddieatr at gmail.com> wrote:
>
>> Thanks a lot! I got some ideas from all the replies, and here is the
>> final one:
>>
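>> newdata  # the data frame to be split
>>
>> set.seed(123)  # optional: makes the random split reproducible
>> select <- sample(nrow(newdata), round(nrow(newdata) * 0.7))
>> data70 <- newdata[select, ]  # training set (70% of rows)
>> write.csv(data70, "data70.csv", row.names = FALSE)
>>
>> data30 <- newdata[-select, ]  # test set (remaining 30%)
>> write.csv(data30, "data30.csv", row.names = FALSE)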
>>
>> Cheers
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
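The split in this thread draws a simple random 70/30 sample, so it does not guarantee that each outcome class keeps its original share in both parts, which is what the subject line asks about. A minimal sketch of a stratified split follows. It assumes the outcome is stored in a column called `class` (a hypothetical name; substitute your own), uses made-up example data, and `stratified_split()` is an illustrative helper, not a function from base R or any package.

```r
# Hypothetical example data: 'class' is an assumed outcome column with unequal group sizes
set.seed(42)  # make the random split reproducible
newdata <- data.frame(
  x = rnorm(100),
  class = factor(rep(c("a", "b"), times = c(80, 20)))
)

# Stratified split: sample 'frac' of the row indices within each class,
# so training and test sets keep (roughly) the original class proportions
stratified_split <- function(df, strata, frac = 0.7) {
  idx_by_group <- split(seq_len(nrow(df)), df[[strata]])
  train_idx <- unlist(lapply(idx_by_group, function(idx) {
    # index into idx rather than calling sample(idx, ...): with a single
    # number, sample(n) would draw from 1:n instead of returning n
    idx[sample.int(length(idx), round(length(idx) * frac))]
  }), use.names = FALSE)
  list(train = df[train_idx, , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

parts <- stratified_split(newdata, "class", frac = 0.7)
table(parts$train$class)  # a: 56, b: 14
table(parts$test$class)   # a: 24, b: 6
```

Each class contributes `round(size * frac)` rows to the training set, so the 80/20 class ratio is preserved in both parts; the two data frames can then be written out with `write.csv()` as in the code above.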
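The hold-out workflow Brian describes (fit on a random sample, then check predictions on the remaining rows whose outcomes are known) can be sketched in base R as below. This is an illustration only: it swaps in logistic regression via `glm()` on the built-in `mtcars` data (predicting `am` from `wt`) instead of an SVM so that it runs without extra packages, and the 0.5 cutoff and 70/30 proportion are assumptions.

```r
# Stand-in example: logistic regression (glm) on the built-in mtcars data,
# used in place of an SVM so the sketch needs no extra packages
set.seed(1)  # make the random split reproducible
dat <- mtcars
train_idx <- sample(nrow(dat), round(nrow(dat) * 0.7))  # 70% of rows for training
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit on the training rows only
fit <- glm(am ~ wt, data = train, family = binomial)

# Predict on the held-out rows and compare with their known outcomes
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)  # assumed 0.5 classification cutoff
accuracy <- mean(pred == test$am)
accuracy
```

The accuracy on `test` estimates how the model behaves on rows it was not fitted to; with only 10 held-out rows in this toy example, that estimate is noisy.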