[R] sample train and test data using dplyr

Ulrik Stervbo ulrik.stervbo at gmail.com
Fri Dec 9 07:42:56 CET 2016


df <- data.frame(x = 1:12, y = rnorm(12))

If you use sample:

RowIndex <- sample(seq_len(nrow(df)), 5)
TrainSet <- df[RowIndex, ]
TestSet <- df[-RowIndex, ]

Or with dplyr:

library(dplyr)
TrainSet <- sample_n(df, 5)
# keep the rows of df that are not in TrainSet (x is unique here)
TestSet <- anti_join(df, TrainSet, by = "x")
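
For the iris data from the original question, a minimal sketch could look like the one below. Note that anti_join matches rows by value, and iris contains a duplicated row, so matching on all columns could drop rows from the test set that were never sampled. The sketch therefore adds an id column first. The column name row_id, the seed, and the 70/30 proportion are my own assumptions, not anything from the thread.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the split is reproducible

# Add an explicit row id (hypothetical column name) so that rows are
# matched by identity rather than by their values
iris_id <- iris
iris_id$row_id <- seq_len(nrow(iris_id))

# 70/30 split; the proportion is an assumption for illustration
TrainSet <- sample_n(iris_id, size = round(0.7 * nrow(iris_id)))
TestSet  <- anti_join(iris_id, TrainSet, by = "row_id")

# Sanity checks: the two sets are disjoint and together cover all rows
stopifnot(length(intersect(TrainSet$row_id, TestSet$row_id)) == 0)
stopifnot(nrow(TrainSet) + nrow(TestSet) == nrow(iris_id))
```

Drop row_id again (e.g. with select(TrainSet, -row_id)) before modelling if you do not want it treated as a predictor.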

HTH
Ulrik

On Fri, 9 Dec 2016, 06:56 Partha Sinha, <pnsinha68 at gmail.com> wrote:

> How do I get two non-overlapping sets of data?
> Regards
> Parth
>
> On 8 December 2016 at 23:23, Ulrik Stervbo <ulrik.stervbo at gmail.com>
> wrote:
>
> In addition to 'sample', and if you insist on dplyr, you can use
> 'sample_n'.
>
> Best,
> Ulrik
>
> On Thu, 8 Dec 2016 at 18:47 Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> Usually we expect posters to do their homework by reading necessary R
> documentation and relevant subject matter resources (e.g. on
> clustering) and making a serious attempt to solve the problem by
> offering their code to us as part of a reproducible example of
> how it failed. You have done none of these things, and so you may not
> receive a helpful reply -- or maybe some kind soul will offer one.
>
> I am not such a kind soul. However I will tell you that ?sample is
> probably relevant and that you should read and follow the posting
> guide at the foot of this email to post a coherent query, which, IMO,
> yours is not.
>
> Cheers,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Dec 8, 2016 at 8:57 AM, Partha Sinha <pnsinha68 at gmail.com> wrote:
> > I want to create two files, train and test, using dplyr (by random
> > sampling). How can I do this using, let's say, the iris data?
> > Regards
> > Parth
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
