[R] Split data frame into 250-row chunks

David Winsemius dwinsemius at comcast.net
Wed Jun 10 21:33:22 CEST 2015


On Jun 10, 2015, at 12:18 PM, David Winsemius wrote:

> 
> On Jun 10, 2015, at 5:39 AM, Liz Hare wrote:
> 
>> Hi R-Experts,
>> 
>> I have a data.frame like this:
>> 
>>> head(map)
>> chr snp   poscm   posbp    dist
>> 1   1  M1 2.99043 3249189      NA
>> 2   1  M2 3.06457 3273096 0.07414
>> 3   1  M3 3.17018 3307151 0.10561
>> 4   1  M4 3.20892 3319643 0.03874
>> 5   1  M5 3.28120 3342947 0.07228
>> 6   1  M6 3.29624 3347798 0.01504
>> 
>> I need to split this into chunks of 250 rows (there will usually be a last chunk with < 250 rows).
> 
> split( map, trunc( 0:(nrow(map)-1 )/nrow(map) ) )
> 
> Untested. Designed to return a list with indices starting at "0".

Looking at Marc Schwartz's answer (a smarter man than I), I see that dividing by nrow(map) puts every row in group 0. The divisor has to be the chunk size, so this should have been:

split( map, trunc( 0:(nrow(map)-1 )/250) )
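
As a quick check of the corrected call, here is a minimal sketch on made-up data. The data frame `toy` and the name `chunk_size` are illustrative assumptions, not part of the original post; a chunk size of 4 stands in for 250 so the result is easy to inspect.

```r
# Toy stand-in for `map` (assumed data): 10 rows, so chunks of 4
# leave a short last chunk, as in the original question.
toy <- data.frame(snp = paste0("M", 1:10), posbp = seq(100, 1000, by = 100))
chunk_size <- 4  # would be 250 for the real data

# Row i (counting from 0) goes to group trunc(i / chunk_size).
grp <- trunc(0:(nrow(toy) - 1) / chunk_size)
chunks <- split(toy, grp)

names(chunks)          # list names are "0", "1", "2"
sapply(chunks, nrow)   # chunk sizes 4, 4, 2
```

One caveat with this sketch: 0:(nrow(toy) - 1) misbehaves for a zero-row data frame, so (seq_len(nrow(toy)) - 1) %/% chunk_size may be a safer way to build the same grouping vector.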

-- 
David.

> 
>> trunc( 0:19/5)
> [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
> 
> 
> 
>> 
>> If I only had to extract one 250-line chunk, it would be easy:
>> 
>> map1 <- map[1:250, ]
>> 
>> or using subset().
>> 
>> I tried to make it a loop iterating through num and using beg and nd for starting and ending indices, but I couldn’t figure out how to reference all the variables I needed in this:
>> 
>>> chunks
>>   beg   nd let num
>> 1     1  250   a   1
>> 2   251  500   b   2
>> 3   501  750   c   3
>> 4   751 1000   d   4
>> 5  1001 1250   e   5
>> 6  1251 1500   f   6
>> 7  1501 1750   g   7
>> 8  1751 2000   h   8
>> 9  2001 2250   i   9
>> 10 2251 2500   j  10
>> 
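
For the index-table approach described above, one possible sketch is to walk the start and end columns in parallel with Map() rather than writing an explicit loop. The names `toy`, `idx` and `pieces` are hypothetical, and the small table only mimics the layout of `chunks` (columns beg and nd); this is an illustration under those assumptions, not the method from the thread.

```r
# Toy stand-in for `map` (assumed data) and a small index table
# laid out like `chunks`, with start (beg) and end (nd) row numbers.
toy <- data.frame(snp = paste0("M", 1:10), posbp = seq(100, 1000, by = 100))
idx <- data.frame(beg = c(1, 5, 9), nd = c(4, 8, 10))

# Map() pairs up beg[i] and nd[i] and returns one data frame per row of idx.
pieces <- Map(function(b, e) toy[b:e, ], idx$beg, idx$nd)

sapply(pieces, nrow)   # chunk sizes 4, 4, 2
```

If the last end index can run past the end of the data, capping it first with pmin(idx$nd, nrow(toy)) is a reasonable precaution, since indexing rows beyond the end of a data frame returns rows of NA.
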
>> Remembering that loops are not always the best answer in R, I looked at other options like split, following this example, but I was not able to adapt it from a vector to a data.frame:
>> http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r (Yes, I’ve reviewed the language documentation.) I checked out ddply and data.table, but couldn’t find a way to use them with index positions instead of column values.
>> 
>> Thanks,
>> Liz
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA


