[R] Extracting random rows from a dataset

David Winsemius dwinsemius at comcast.net
Sun Jan 18 20:37:19 CET 2009


 > # 'var' comes from the prior posting; the result is stored as df for use below
 > df <- read.table(textConnection(gsub("\\(|\\)", "", var)))
 > df
   V1 V2
1 p1 10
2 p1  3
3 p1  4
4 p2 20
5 p2 30
6 p2 40
7 p3  4
8 p3  1
9 p1  2

 > ridxs <- sample(1:nrow(df), floor(0.7*nrow(df)))  # the 70% sample row IDs

 > df[ridxs,]
   V1 V2
5 p2 30
6 p2 40
2 p1  3
7 p3  4
4 p2 20
8 p3  1
 >
 >
 > df[-ridxs,]
   V1 V2
1 p1 10
3 p1  4
9 p1  2
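
One caveat about the negative-indexing step above, shown as a small sketch on an invented three-row data frame (not the poster's data): if the sampled index vector ever comes out empty, for example when floor(0.7*nrow(df)) is 0, then df[-ridxs,] returns no rows at all instead of every row. A logical mask built with %in% avoids that; whether this edge case matters depends on the dataset.

```r
# Small invented example, only to show the empty-index edge case
df <- data.frame(V1 = c("p1", "p2", "p3"), V2 = c(10, 20, 30))
ridxs <- integer(0)                        # an empty sample of row numbers

dropped_neg <- df[-ridxs, ]                # -integer(0) is still an empty index
keep <- !(seq_len(nrow(df)) %in% ridxs)    # TRUE for every row that was not sampled
dropped_mask <- df[keep, ]

nrow(dropped_neg)                          # 0: all rows lost
nrow(dropped_mask)                         # 3: nothing sampled, so everything is kept
```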

The terms to pay particular attention to in the introductory material  
are row indexing, dataframe, and negative indexing of dataframes.
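
As a minimal sketch of how those three ideas combine at the size mentioned in the question, assuming a made-up data frame called ents with 928 rows (the names ents, id, x, set70 and set30 and all values are invented for illustration), and adding set.seed() so the split can be repeated:

```r
# Hypothetical stand-in for the 928-enterprise dataset; names and values are invented
set.seed(42)                           # fix the seed so the same split comes back each run
ents <- data.frame(id = seq_len(928), x = rnorm(928))

n     <- nrow(ents)
ridxs <- sample(seq_len(n), floor(0.7 * n))  # row numbers, drawn without replacement (the default)

set70 <- ents[ridxs, ]                 # row indexing: the sampled rows, all columns kept together
set30 <- ents[-ridxs, ]                # negative indexing: every row that was not sampled

nrow(set70)                            # 649, i.e. floor(0.7 * 928)
nrow(set30)                            # 279, the remaining rows
```

Leaving the column position empty after the comma is what keeps all the variables of each row together.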



On Jan 18, 2009, at 12:35 PM, S.Putoto wrote:

>
> Hello dear R Users,
>
> I am working on a dataset of 928 Enterprises, of which are observed 12
> different characters. I need to randomly sample, without repetition, 70% of
> the enterprises, to create a testing set, and let the other 30% of the
> enterprises be a validating set (holdout validation, I think that is). How
> do I do that? Of course all the characters of each row must remain together.
> Also, I am not very familiar with the R-Base language (it is the first time
> I use it) so if You could also explain to me what every function and
> argument means, it would be great help to then reiterate the procedure.

Really! Don't you think that is a bit much? There are many tutorials
available online. The terms to pay particular attention to in the
introductory material are indexing, dataframe, and negative indexing
of dataframes.

--
David Winsemius

>
>
> Thank You very much,
>
> Sebastiano



