[R] How to randomly extract a number of rows in a data frame

Marc Schwartz marc_schwartz at me.com
Fri Aug 1 21:08:30 CEST 2014


On Aug 1, 2014, at 1:58 PM, Stephen HK Wong <honkit at stanford.edu> wrote:

> Dear ALL,
> 
> I have a data frame that contains 4 columns and several tens of millions of rows, like the sample below. I want to extract, say, 1 million rows at random. Can you tell me how to do that in R using base packages? Many thanks!
> 
> Col_1	Col_2	Col_3	Col_4
> chr1	3000215	3000250	-
> chr1	3000909	3000944	+
> chr1	3001025	3001060	+
> chr1	3001547	3001582	+
> chr1	3002254	3002289	+
> chr1	3002324	3002359	-
> chr1	3002833	3002868	-
> chr1	3004565	3004600	-
> chr1	3004945	3004980	+
> chr1	3004974	3005009	-
> chr1	3005115	3005150	+
> chr1	3005124	3005159	+
> chr1	3005240	3005275	-
> chr1	3005558	3005593	-
> chr1	3005890	3005925	+
> chr1	3005929	3005964	+
> chr1	3005913	3005948	-
> chr1	3005913	3005948	-
> 
> Stephen HK Wong


If your data frame is called 'DF':

  DF.Rand <- DF[sample(nrow(DF), 1000000), ]
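To try the one-liner without the original file, here is a minimal sketch that builds a small stand-in data frame with the same four column names. The values are made up for illustration and are not Stephen's data, and it draws 5 rows instead of 1000000. The set.seed() call is an addition so the draw is reproducible; the seed value is arbitrary.

```r
# Hypothetical stand-in for the poster's data: 4 columns, 20 rows
set.seed(42)                                   # arbitrary seed, only for reproducibility
n <- 20L
start <- 3000000L + sort(sample.int(10000L, n))
DF <- data.frame(
  Col_1 = "chr1",                              # recycled to all n rows
  Col_2 = start,
  Col_3 = start + 35L,
  Col_4 = sample(c("+", "-"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Same idiom as above: draw k distinct row indices, then subset the rows
k <- 5L
DF.Rand <- DF[sample(nrow(DF), k), ]
print(DF.Rand)
```

The row names of DF.Rand are the original row numbers, so it is easy to see which rows were picked.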

See ?sample, which by default draws a random sample without replacement, with each element equally likely to be selected.

In the above, nrow(DF) returns the number of rows in DF and defines the sample space of 1:nrow(DF), from which 1000000 random integer values will be selected and used as indices to return the rows.
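The index step can also be inspected on its own. In this small sketch, n = 10 and k = 4 are stand-ins for nrow(DF) and 1000000. Because a single positive number passed to sample() is treated as 1:n, and sampling is without replacement by default, the indices are distinct and all fall in 1:n. sample.int() is the lower-level base function that samples from 1:n directly; it is shown only as an equivalent alternative, not as what the reply above uses.

```r
set.seed(1)          # arbitrary seed for reproducibility
n <- 10L             # stand-in for nrow(DF)
k <- 4L              # stand-in for 1000000

# A single number n is treated by sample() as the vector 1:n
idx <- sample(n, k)
print(idx)

# sample.int() samples from 1:n directly
idx2 <- sample.int(n, k)
print(idx2)

# Asking for more rows than exist is an error unless replace = TRUE
too_many <- tryCatch(sample(n, n + 1L), error = function(e) conditionMessage(e))
print(too_many)
```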

Using the built-in 'iris' dataset, select 20 random rows from the 150 total:

> iris[sample(nrow(iris), 20), ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
122          5.6         2.8          4.9         2.0  virginica
79           6.0         2.9          4.5         1.5 versicolor
109          6.7         2.5          5.8         1.8  virginica
106          7.6         3.0          6.6         2.1  virginica
49           5.3         3.7          1.5         0.2     setosa
125          6.7         3.3          5.7         2.1  virginica
1            5.1         3.5          1.4         0.2     setosa
68           5.8         2.7          4.1         1.0 versicolor
84           6.0         2.7          5.1         1.6 versicolor
110          7.2         3.6          6.1         2.5  virginica
113          6.8         3.0          5.5         2.1  virginica
64           6.1         2.9          4.7         1.4 versicolor
102          5.8         2.7          5.1         1.9  virginica
71           5.9         3.2          4.8         1.8 versicolor
69           6.2         2.2          4.5         1.5 versicolor
65           5.6         2.9          3.6         1.3 versicolor
74           6.1         2.8          4.7         1.2 versicolor
99           5.1         2.5          3.0         1.1 versicolor
135          6.1         2.6          5.6         1.4  virginica
41           5.0         3.5          1.3         0.3     setosa
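If the sample needs to be reproducible, or the selected rows should stay in their original order (for example, genomic coordinates already sorted by position), two small base R additions can help: set.seed() before sampling and sort() on the indices. A sketch on iris follows; the seed value 123 is arbitrary, and the rows it selects will differ from those printed above.

```r
set.seed(123)                          # arbitrary seed; makes the draw repeatable
idx <- sort(sample(nrow(iris), 20))    # sort() keeps rows in their original order
iris.Rand <- iris[idx, ]
head(iris.Rand)

# Re-running with the same seed reproduces the same indices
set.seed(123)
idx.again <- sort(sample(nrow(iris), 20))
identical(idx, idx.again)
```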



Regards,

Marc Schwartz