[R] Split data frame into 250-row chunks

Wed Jun 10 21:23:57 CEST 2015

> On Jun 10, 2015, at 2:21 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
> 
> 
>> On Jun 10, 2015, at 7:39 AM, Liz Hare <doggene at earthlink.net> wrote:
>> 
>> Hi R-Experts,
>> 
>> I have a data.frame like this:
>> 
>>> head(map)
>>   chr snp   poscm   posbp    dist
>> 1   1  M1 2.99043 3249189      NA
>> 2   1  M2 3.06457 3273096 0.07414
>> 3   1  M3 3.17018 3307151 0.10561
>> 4   1  M4 3.20892 3319643 0.03874
>> 5   1  M5 3.28120 3342947 0.07228
>> 6   1  M6 3.29624 3347798 0.01504
>> 
>> I need to split this into chunks of 250 rows (there will usually be a last chunk with < 250 rows).
>> 
>> If I only had to extract one 250-line chunk, it would be easy:
>> 
>> map1 <- map[1:250, ]
>> 
>> or using subset().
>> 
>> I tried to make it a loop iterating through num and using beg and nd for starting and ending indices, but I couldn’t figure out how to reference all the variables I needed in this:
>> 
>>> chunks
>>     beg   nd let num
>> 1     1  250   a   1
>> 2   251  500   b   2
>> 3   501  750   c   3
>> 4   751 1000   d   4
>> 5  1001 1250   e   5
>> 6  1251 1500   f   6
>> 7  1501 1750   g   7
>> 8  1751 2000   h   8
>> 9  2001 2250   i   9
>> 10 2251 2500   j  10
>> …
>> 
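The beg/nd table above can also drive the subsetting directly, without an explicit for loop. A minimal sketch, assuming a stand-in data frame: the 620-row 'map' below is made up for illustration, as is the name 'map.chunks'. The last end index is capped at nrow(map) so the final, shorter chunk does not run past the data.

```r
# Hypothetical stand-in for 'map' (620 rows); not the poster's data
map <- data.frame(snp = paste0("M", 1:620), posbp = seq_len(620) * 1000)

chunk_size <- 250

# Start and end row positions for each chunk, like the beg/nd columns above
beg <- seq(1, nrow(map), by = chunk_size)
nd  <- pmin(beg + chunk_size - 1, nrow(map))   # cap the last chunk at nrow(map)
chunks <- data.frame(beg = beg, nd = nd, num = seq_along(beg))

# One data frame per row of 'chunks'
map.chunks <- lapply(seq_len(nrow(chunks)), function(i) {
  map[chunks$beg[i]:chunks$nd[i], ]
})

sapply(map.chunks, nrow)
# [1] 250 250 120
```

Each element of map.chunks is an ordinary data frame, so map.chunks[[1]] plays the role of map1 from the single-chunk example.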
>> Remembering that loops are not always the best answer in R, I looked at other options like split. I followed this example, but was not able to adapt it from a vector to a data.frame:
>> http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r
>> (Yes, I’ve reviewed the language documentation.) I checked out ddply and data.table, but couldn’t find a way to use them with index positions instead of column values.
>> 
>> Thanks,
>> Liz
> 
> 
> Hi,
> 
>  map.split <- split(x, (as.numeric(rownames(map)) - 1) %/% 250)

Shoot, typo in the above, it should be ‘map’, not ‘x’:

   map.split <- split(map, (as.numeric(rownames(map)) - 1) %/% 250)

Marc
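
One caveat with the row-name approach: it assumes the default row names 1, 2, ..., n. If the data frame has character row names, or has been subsetted or reordered, as.numeric(rownames(...)) no longer gives row positions. A sketch of a variant that uses positions instead, shown on the built-in 'mtcars' dataset (32 rows, car names as row names). The chunk size of 10 and the name 'mtcars.split' are arbitrary choices for illustration.

```r
chunk_size <- 10

# Row positions 1..n, independent of whatever the row names are
grp <- (seq_len(nrow(mtcars)) - 1) %/% chunk_size

mtcars.split <- split(mtcars, grp)

sapply(mtcars.split, nrow)
#  0  1  2  3
# 10 10 10  2
```

The same grouping expression should work for 'map' with chunk_size set to 250.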

> 
> That will create a list of data frames, each a subset of ‘map’ with 250 rows, except for the last one, which may have fewer.
> 
> Essentially, you are creating a grouping variable from the numeric row names, integer-divided (%/%) by the chunk length that you want. For example, using the built-in ‘iris’ dataset, which has 150 rows:
> 
>> (as.numeric(rownames(iris)) - 1) %/% 50
>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [34] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> [67] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> [100] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> [133] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> 
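The grouping vector comes from integer division (%/%), which gives the chunk number, rather than from the remainder operator (%%), which would cycle through positions within a chunk. A small comparison on six zero-based positions with a chunk length of 3 (the name 'idx' is only for illustration):

```r
idx <- 0:5   # zero-based row positions

idx %/% 3    # chunk number:          0 0 0 1 1 1
idx %% 3     # position within chunk: 0 1 2 0 1 2
```

Subtracting 1 from the one-based row numbers first is what makes row 50 land in group 0 rather than group 1.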
> iris.split <- split(iris, (as.numeric(rownames(iris)) - 1) %/% 50)
> 
>> length(iris.split)
> [1] 3
> 
>> lapply(iris.split, nrow)
> $`0`
> [1] 50
> 
> $`1`
> [1] 50
> 
> $`2`
> [1] 50
> 
> 
>> lapply(iris.split, head)
> $`0`
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2  setosa
> 2          4.9         3.0          1.4         0.2  setosa
> 3          4.7         3.2          1.3         0.2  setosa
> 4          4.6         3.1          1.5         0.2  setosa
> 5          5.0         3.6          1.4         0.2  setosa
> 6          5.4         3.9          1.7         0.4  setosa
> 
> $`1`
>    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
> 51          7.0         3.2          4.7         1.4 versicolor
> 52          6.4         3.2          4.5         1.5 versicolor
> 53          6.9         3.1          4.9         1.5 versicolor
> 54          5.5         2.3          4.0         1.3 versicolor
> 55          6.5         2.8          4.6         1.5 versicolor
> 56          5.7         2.8          4.5         1.3 versicolor
> 
> $`2`
>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> 101          6.3         3.3          6.0         2.5 virginica
> 102          5.8         2.7          5.1         1.9 virginica
> 103          7.1         3.0          5.9         2.1 virginica
> 104          6.3         2.9          5.6         1.8 virginica
> 105          6.5         3.0          5.8         2.2 virginica
> 106          7.6         3.0          6.6         2.1 virginica
> 
> 
> 
> Regards,
> 
> Marc Schwartz
>