# [R] Splitting a data column randomly into 3 groups

Fri Sep 3 04:42:18 CEST 2021

Abou,

I am not trying to be negative. Assuming you are a professor of Statistics, your request seems odd, as what you are asking about is very routine in much of statistical work. You often want to build a model using just part of your data and reserve the rest to check whether you trained the algorithm too closely to the original data.

A simple online search before asking questions here is appreciated. I did a quick search for something like “R split data into three parts” and see several applicable answers.

There are people on this forum who actually get paid to do nontrivial tasks and do not mind helping in spots but feel sort of used if expected to write a serious amount of code and perhaps then be asked to redo it with more bells and whistles added. A recent badly phrased request comes to mind where several of us provided an answer only to find out it was for a different scenario, …

So let me continue with a serious answer. May we assume you KNOW how to read the data into something like a data.frame? If so, and if you see no need or value in doing this the hard way, then your question could have been whether there is an R built-in function or perhaps a package already set up to solve it quickly. Again, a simple online search can do wonders. Here, for example, is a package called caret, and this page discusses splitting data multiple ways:

https://topepo.github.io/caret/data-splitting.html

There are other such pages suggesting how to do it using base R.
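
As an illustration of the base R route, here is a minimal sketch, not taken from any of those pages. It assumes the values sit in a vector (the built-in `iris$Sepal.Length` stands in for the real data) and uses only `rep()`, `sample()` and `split()`:

```r
# Hypothetical stand-in for the real data: any vector or data.frame column works.
set.seed(42)                       # only so the sketch is reproducible
x <- iris$Sepal.Length             # 150 values from a built-in data set

# Build balanced group labels (1, 2, 3, 1, 2, 3, ...) and shuffle them.
labels <- sample(rep(1:3, length.out = length(x)))

# split() returns a list with one vector per group label.
groups <- split(x, labels)
lengths(groups)                    # size of each group
```

Shuffling the labels rather than the data keeps the group sizes as equal as the length allows, even when it is not a multiple of 3.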

Here is one that gives an example on how to make three unequal partitions:

inds <- partition(iris$Sepal.Length, p = c(train = 0.6, valid = 0.2, test = 0.2))

There is more to do after that call, but in the line above you would use whatever names you want instead of train/valid/test and set all three proportions to 1/3.
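
A hedged sketch of what that might look like for three equal groups follows. The `partition()` call above appears to match the function of that name in the splitTools package; that attribution is an assumption, as are the group names A1/A2/A3. The code is skipped if the package is not installed.

```r
# Assumes the splitTools package is installed; the block is skipped otherwise.
if (requireNamespace("splitTools", quietly = TRUE)) {
  set.seed(1)                                  # only for reproducibility
  inds <- splitTools::partition(
    iris$Sepal.Length,
    p = c(A1 = 1/3, A2 = 1/3, A3 = 1/3)        # hypothetical group names
  )
  # inds is a list of row-index vectors, one per named group.
  parts <- lapply(inds, function(i) iris[i, ])
  sapply(parts, nrow)                          # rows in each group
}
```

The index list is then used to subset the data, which is the "more to do" part after the call.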

I repeat that what you want to do strikes some of us as a fairly routine thing to do. Lots of people have written up how they have done it, and you can pick and choose, or redo it on your own. If what you have is a homework assignment, the appropriate thing is to have you learn to use some technique yourself and perhaps get minor help when it fails. But if you will be doing this regularly, use of some packages is highly valuable.

Good Luck.

Sorry, please forget about it. I believe that I was very serious when I posted my question.

What is stopping you Abou?

Some of us here start wondering if we have better things to do than homework for others. Help is supposed to come after they try and encounter issues that we may help with.

So think about your problem. You supplied data in a file that is NOT in CSV format but is in Tab separated format.

You need to get it into your program and store it in something. It looks like you have 204 items so 1/3 of those would be exactly 68.
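
A minimal sketch of that first step, assuming a tab-separated file with a header row. The attached file is not available here, so a temporary file with made-up values stands in for it; the column name `ID` is hypothetical, and `Data` is the second column as described in the question.

```r
# Hypothetical stand-in for the attached file: two tab-separated columns,
# with the values of interest in a second column named "Data".
path <- tempfile(fileext = ".txt")
demo <- data.frame(ID = 1:204, Data = round(rnorm(204), 2))
write.table(demo, path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() is read.table() with defaults suited to tab-separated
# files that have a header row.
dat <- read.delim(path)

n <- nrow(dat)      # 204 in this sketch
n / 3               # 68, so three equal groups are possible
```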

So if your data is in an object like a vector or data.frame, you want to choose random numbers between 1 and 204, without repeats. How do you do that? You need 1/3 of the number of items in the object, in your case 68.

Now extract the items with those indices into, say, A1. Extract all the rest into a temporary item.

Make another 68 random indices from what is left, so there is no overlap, and copy those items into A2 and the remaining ones into A3, and you are sort of done, other than some cleanup or whatever.
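
One way the steps above could be sketched in base R is shown here. It is only one of the many possible ways, and the vector `x` is a made-up stand-in for the real column.

```r
set.seed(2021)                          # for a reproducible sketch only
x <- rnorm(204)                         # hypothetical stand-in for the "Data" column
n <- length(x)
k <- n / 3                              # 68

idx1 <- sample(n, k)                    # 68 distinct indices from 1..204
A1   <- x[idx1]

rest <- setdiff(seq_len(n), idx1)       # the 136 indices not yet used
idx2 <- rest[sample(length(rest), k)]   # 68 of those, so no overlap with idx1
A2   <- x[idx2]

idx3 <- setdiff(rest, idx2)             # whatever is left over
A3   <- x[idx3]
```

Drawing positions within `rest` rather than calling `sample(rest, k)` directly sidesteps the case where `sample()` treats a single number as the range 1 to that number.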

There are many ways to do the above and I am sure packages too.

But since you have made no visible effort, I personally am not going to pick anything in particular.

Had you shown some text and code along the lines of the above and just wanted to know how to copy just the ones that were not selected, we could easily ...

Dear All:

How do I split a data column *randomly* into three groups? Please see the attached data. I need to split column #2, titled "Data".

