[R] Split data frame into 250-row chunks
Marc Schwartz
marc_schwartz at me.com
Wed Jun 10 21:23:57 CEST 2015
> On Jun 10, 2015, at 2:21 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>
>
>> On Jun 10, 2015, at 7:39 AM, Liz Hare <doggene at earthlink.net> wrote:
>>
>> Hi R-Experts,
>>
>> I have a data.frame like this:
>>
>>> head(map)
>> chr snp poscm posbp dist
>> 1 1 M1 2.99043 3249189 NA
>> 2 1 M2 3.06457 3273096 0.07414
>> 3 1 M3 3.17018 3307151 0.10561
>> 4 1 M4 3.20892 3319643 0.03874
>> 5 1 M5 3.28120 3342947 0.07228
>> 6 1 M6 3.29624 3347798 0.01504
>>
>> I need to split this into chunks of 250 rows (there will usually be a last chunk with < 250 rows).
>>
>> If I only had to extract one 250-line chunk, it would be easy:
>>
>> map1 <- map[1:250, ]
>>
>> or using subset().
>>
>> I tried to make it a loop iterating through num and using beg and nd for starting and ending indices, but I couldn’t figure out how to reference all the variables I needed in this:
>>
>>> chunks
>> beg nd let num
>> 1 1 250 a 1
>> 2 251 500 b 2
>> 3 501 750 c 3
>> 4 751 1000 d 4
>> 5 1001 1250 e 5
>> 6 1251 1500 f 6
>> 7 1501 1750 g 7
>> 8 1751 2000 h 8
>> 9 2001 2250 i 9
>> 10 2251 2500 j 10
>> …
>>
>> Remembering that loops are not always the best answer in R, I looked at other options like split(), following the example below, but I was not able to adapt it from a vector to a data.frame:
>> http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r
>> (Yes, I’ve reviewed the language documentation.) I checked out ddply and data.table, but couldn’t find a way to use them with index positions instead of column values.
>>
>> Thanks,
>> Liz
>
>
> Hi,
>
> map.split <- split(x, (as.numeric(rownames(map)) - 1) %/% 250)
Shoot, typo in the above, it should be ‘map’, not ‘x’:
map.split <- split(map, (as.numeric(rownames(map)) - 1) %/% 250)
Marc
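
One caveat on the corrected line: as.numeric(rownames(map)) only gives the row positions when the row names are still the default 1..n. After subsetting or reordering they may not be, and non-numeric row names would become NA. A minimal sketch of a position-based variant follows, assuming a made-up stand-in for 'map' (the columns and the 620-row size are hypothetical, chosen only so the last chunk is short):

```r
# Hypothetical stand-in for 'map'; the real data has more columns.
map <- data.frame(snp = paste0("M", 1:620), poscm = cumsum(runif(620)))

# Drop the first 20 rows so the row names now run 21..620, not 1..600.
map <- map[-(1:20), ]

chunk_size <- 250

# Group by row position rather than by row name:
# positions 1..250 -> 0, 251..500 -> 1, 501..600 -> 2
grp <- (seq_len(nrow(map)) - 1) %/% chunk_size
map.split <- split(map, grp)

sapply(map.split, nrow)
#   0   1   2
# 250 250 100
```

With the row-name version, the same data would be cut at row names 250 and 500 instead, giving chunks of 230, 250 and 120 rows.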
>
> That will create a list of data frames, each a subset of ‘map’ with 250 rows, except, of course, for the last one, which may have fewer.
>
> Essentially, you are creating a grouping variable from the integer quotient (%/%) of the zero-based numeric row names and the length of the chunks that you want. For example, using the built-in ‘iris’ dataset, which has 150 rows:
>
>> (as.numeric(rownames(iris)) - 1) %/% 50
> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [34] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> [67] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> [100] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> [133] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>
> iris.split <- split(iris, (as.numeric(rownames(iris)) - 1) %/% 50)
>
>> length(iris.split)
> [1] 3
>
>> lapply(iris.split, nrow)
> $`0`
> [1] 50
>
> $`1`
> [1] 50
>
> $`2`
> [1] 50
>
>
>> lapply(iris.split, head)
> $`0`
> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1 5.1 3.5 1.4 0.2 setosa
> 2 4.9 3.0 1.4 0.2 setosa
> 3 4.7 3.2 1.3 0.2 setosa
> 4 4.6 3.1 1.5 0.2 setosa
> 5 5.0 3.6 1.4 0.2 setosa
> 6 5.4 3.9 1.7 0.4 setosa
>
> $`1`
> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 51 7.0 3.2 4.7 1.4 versicolor
> 52 6.4 3.2 4.5 1.5 versicolor
> 53 6.9 3.1 4.9 1.5 versicolor
> 54 5.5 2.3 4.0 1.3 versicolor
> 55 6.5 2.8 4.6 1.5 versicolor
> 56 5.7 2.8 4.5 1.3 versicolor
>
> $`2`
> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 101 6.3 3.3 6.0 2.5 virginica
> 102 5.8 2.7 5.1 1.9 virginica
> 103 7.1 3.0 5.9 2.1 virginica
> 104 6.3 2.9 5.6 1.8 virginica
> 105 6.5 3.0 5.8 2.2 virginica
> 106 7.6 3.0 6.6 2.1 virginica
>
>
>
> Regards,
>
> Marc Schwartz
>
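
For the loop-style approach described in the original question (start and end indices plus a letter label per chunk), a rough sketch using Map() might look like the following. The 'map' stand-in here is hypothetical, and 'beg' and 'nd' simply mirror the column names in the 'chunks' table from the question; this is one possible way to do it, not the method given above.

```r
# Hypothetical stand-in for 'map': 620 rows, so the last chunk is short.
map <- data.frame(snp = paste0("M", 1:620))

chunk_size <- 250

# Start index of each chunk, and the matching end index,
# capped at the last row so the final chunk can be shorter.
beg <- seq(1, nrow(map), by = chunk_size)
nd  <- pmin(beg + chunk_size - 1, nrow(map))

# Extract rows beg[i]..nd[i] for each chunk; drop = FALSE keeps a
# data frame even when there is only one column.
chunks <- Map(function(b, e) map[b:e, , drop = FALSE], beg, nd)

# Label the chunks a, b, c, ... (only works for up to 26 chunks).
names(chunks) <- letters[seq_along(chunks)]

sapply(chunks, nrow)
#   a   b   c
# 250 250 120
```

Individual chunks can then be pulled out by label, for example chunks$a or chunks[["c"]].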