[R] Splitting a data column randomly into 3 groups

AbouEl-Makarim Aboueissa abouelmakarim1962 using gmail.com
Fri Sep 3 16:27:45 CEST 2021


Hi Avi: good morning

Again, many thanks to all of you. I appreciate everything you are doing. You
are good. I did it in Minitab. It cost me a little more time, but that is
okay.

It was a little confusing for me to do it in R, because in Step 1 I have
to select a random sample of size n=204 (say) out of N=700 (say). Then in
Step 2 I have to allocate the 204 randomly selected observations into three
groups of equal size.
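
For readers following the thread, the two steps described above can be sketched in base R roughly as follows. This is only a minimal sketch, not the method anyone in the thread used: the vector `population`, the seed, and the group labels are made up for illustration, and N = 700 and n = 204 are the "say" values from the message.

```r
set.seed(2021)                      # arbitrary seed, only for reproducibility

N <- 700                            # population size (the "say" value above)
n <- 204                            # sample size; divisible by 3
population <- rnorm(N)              # hypothetical stand-in for the data column

# Step 1: simple random sample of n positions, without replacement
picked <- sample(N, n)

# Step 2: shuffle equal-sized group labels and attach them to the sampled values
labels <- sample(rep(c("A1", "A2", "A3"), each = n / 3))
groups <- split(population[picked], labels)

lengths(groups)                     # each group should hold n / 3 = 68 values
```

Because `picked` is already in random order, cutting it into three consecutive blocks of 68 would give an equivalent allocation; the shuffled labels just make the second step explicit.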

Again, thank you very much, and sorry if I bothered you.


with many thanks
abou
______________________


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Thu, Sep 2, 2021 at 10:42 PM Avi Gross via R-help <r-help using r-project.org>
wrote:

> Abou,
>
>
>
> I am not trying to be negative. Assuming you are a professor of
> Statistics, your request seems odd as what you are asking about is very
> routine in much of statistical work where you want to make a model or
> something using just part of your data and need to reserve some to check if
> you perhaps trained an algorithm too much for the original data used.
>
>
>
> A simple online search before asking questions here is appreciated. I did
> a quick search for something like “R split data into three parts” and see
> several applicable answers.
>
>
>
> There are people on this forum who actually get paid to do nontrivial
> tasks and do not mind help in spots but feel sort of used if expected to
> write a serious amount of code and perhaps then be asked to redo it with
> more bells and whistles added. A recent badly phrased request comes to mind
> where several of us provided an answer only to find out it was for a
> different scenario, …
>
>
>
> So let me continue with a serious answer. May we assume you KNOW how to
> read the data in to something like a data.frame? If so, and if you see no
> need or value in doing this the hard way, then your question could have
> been to ask if there is an R built-in function or perhaps a package
> already set up to solve it quickly. Again, a simple online search can do
> wonders. Here, for example, is a package called caret, and this page
> discusses splitting data multiple ways:
>
>
>
> https://topepo.github.io/caret/data-splitting.html
>
>
>
> There are other such pages suggesting how to do it using base R.
>
>
>
> Here is one that gives an example of how to make three unequal partitions:
>
>
>
> inds <- partition(iris$Sepal.Length, p = c(train = 0.6, valid = 0.2, test = 0.2))
>
>
>
>
>
> There is more to do below but in the above, you would use whatever names
> you want instead of train/valid/test and set all three to 0.33 and so on.
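
The `partition()` call quoted above is not base R; it comes from an add-on package that the message does not name. A rough base-R sketch of the same idea, with three equal shares, might look like the following. The helper name `random_partition` and the labels `g1`/`g2`/`g3` are hypothetical, and unlike some package implementations this version does no stratification on the values.

```r
# Hypothetical helper: split the indices 1..n into named groups whose sizes
# are approximately proportional to `p` (purely random, no stratification).
random_partition <- function(n, p) {
  p <- p / sum(p)                       # normalise the shares so they sum to 1
  cuts <- round(cumsum(p) * n)          # cumulative cut points, last one is n
  sizes <- diff(c(0, cuts))             # group sizes implied by the cut points
  labels <- rep(names(p), times = sizes)
  split(seq_len(n), sample(labels))     # shuffle the labels, then group indices
}

set.seed(1)                             # arbitrary seed
inds <- random_partition(nrow(iris), c(g1 = 1, g2 = 1, g3 = 1))
lengths(inds)                           # 50 row indices per group for iris

first_group <- iris$Sepal.Length[inds$g1]
```

Passing shares that get normalised inside the function avoids the small mismatch of writing 0.33 three times, which does not sum to 1.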
>
>
>
> I repeat, that what you want to do strikes some of us as a fairly routine
> thing to do and lots of people have written how they have done it and you
> can pick and choose, or redo it on your own. If what you have is a homework
> assignment, the appropriate thing is to have you learn to use some
> technique yourself and perhaps get minor help when it fails. But if you
> will be doing this regularly, use of some packages is highly valuable.
>
>
>
> Good Luck.
>
>
> From: AbouEl-Makarim Aboueissa <abouelmakarim1962 using gmail.com>
> Sent: Thursday, September 2, 2021 9:51 PM
> To: Avi Gross <avigross using verizon.net>
> Cc: R mailing list <r-help using r-project.org>
> Subject: Re: [R] Splitting a data column randomly into 3 groups
>
>
>
> Sorry, please forget about it. I believe I was very serious when I
> posted my question.
>
>
>
> with thanks
>
> abou
>
>
> ______________________
>
> AbouEl-Makarim Aboueissa, PhD
>
>
>
> Professor, Statistics and Data Science
>
> Graduate Coordinator
>
> Department of Mathematics and Statistics
>
> University of Southern Maine
>
>
>
>
>
>
>
> On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help <r-help using r-project.org>
> wrote:
>
> What is stopping you, Abou?
>
> Some of us here start wondering if we have better things to do than
> homework for others. Help is supposed to be after they try and encounter
> issues that we may help with.
>
> So think about your problem. You supplied data in a file that is NOT in
> CSV format but is in Tab separated format.
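
As a small aside on the tab-separated point: base R's `read.delim()` reads such files directly. The sketch below writes a tiny made-up file first so that it is self-contained; the file contents and the `ID` column name are invented, and only the "Data" column name comes from the original question.

```r
# Write a tiny tab-separated file so the example is self-contained;
# the values and the ID column are made up.
path <- tempfile(fileext = ".txt")
writeLines(c("ID\tData", "1\t10.5", "2\t12.1", "3\t9.8"), path)

# read.delim() defaults to sep = "\t" and header = TRUE
dat <- read.delim(path)
str(dat)
```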
>
> You need to get it in to your program and store it in something. It looks
> like you have 204 items so 1/3 of those would be exactly 68.
>
> So if your data is in an object like a vector or data.frame, you want to
> choose random indices between 1 and 204. How do you do that? You need 1/3 of
> the number of items in the object, in your case 68.
>
> Now extract the items with those indices into, say, A1. Extract all the
> rest into a temporary item.
>
> Make another 68 random indices, with no overlap, and copy those items into
> A2 and the ones that do not have those into A3 and you are sort of done,
> other than some cleanup or whatever.
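
The steps described above can be sketched literally in base R, under a few assumptions: the vector `x` is a made-up stand-in for the 204 items, the seed is arbitrary, and the names `idx1`, `idx2`, `idx3` and `rest` are chosen only for illustration. This is one of the "many ways" mentioned, not a recommended or definitive one.

```r
set.seed(42)                            # arbitrary seed
x <- rnorm(204)                         # hypothetical stand-in for the 204 items

k <- length(x) / 3                      # 68 items per group
idx1 <- sample(seq_along(x), k)         # first 68 random positions
rest <- setdiff(seq_along(x), idx1)     # the 136 positions not yet chosen

idx2 <- rest[sample(length(rest), k)]   # 68 more, drawn only from the remainder
idx3 <- setdiff(rest, idx2)             # whatever is left forms the third group

A1 <- x[idx1]
A2 <- x[idx2]
A3 <- x[idx3]
```

Indexing `rest` by `sample(length(rest), k)` rather than calling `sample(rest, k)` sidesteps the documented quirk that `sample()` treats a single number as `1:x`, which matters if the remainder ever shrinks to one element.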
>
> There are many ways to do the above and I am sure packages too.
>
> But since you have made no visible effort, I personally am not going to
> pick anything in particular.
>
> Had you shown some text and code along the lines of the above and just
> wanted to know how to copy just the ones that were not selected, we could
> easily ...
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of
> AbouEl-Makarim Aboueissa
> Sent: Thursday, September 2, 2021 9:30 PM
> To: R mailing list <r-help using r-project.org>
> Subject: [R] Splitting a data column randomly into 3 groups
>
> Dear All:
>
> How can I split a data column *randomly* into three groups? Please see the
> attached data. I need to split column #2, titled "Data".
>
> with many thanks
> abou
> ______________________
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science* *Graduate Coordinator*
>
> *Department of Mathematics and Statistics* *University of Southern Maine*
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>         [[alternative HTML version deleted]]
>
