[R] Splitting a DF into rows according to a column

Johannes Graumann johannes_graumann at web.de
Mon Oct 4 16:57:56 CEST 2010


Hi,

I'm spinning my wheels on this and keep coming back to the same wrong
solution. Please have a look and give me a hand ...

The premise is: a DF like so

> loremIpsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Quisque leo ipsum, ultricies scelerisque volutpat non, volutpat et nulla. 
Curabitur consequat ullamcorper tellus id imperdiet. Duis semper malesuada 
nulla, blandit lobortis diam fringilla at. Vestibulum nec tellus orci, eu 
sollicitudin quam. Phasellus sit amet enim diam. Phasellus mattis hendrerit 
varius. Curabitur ut tristique enim. Lorem ipsum dolor sit amet, consectetur 
adipiscing elit. Sed convallis, tortor id vehicula facilisis, nunc justo 
facilisis tellus, sed eleifend nisi lacus id purus. Maecenas tempus 
sollicitudin libero, molestie laoreet metus dapibus eu. Mauris justo ante, 
mattis et pulvinar a, varius pretium eros. Curabitur fringilla dui ac dui 
rutrum pretium. Donec sed magna adipiscing nisi accumsan congue sed ac est. 
Vivamus lorem urna, tristique quis accumsan quis, ullamcorper aliquet 
velit."
> tmpDF <- data.frame(Column1=rep(unlist(strsplit(loremIpsum," ")),length.out=510),
+                     Column2=runif(510,min=0,max=1e8))

is to be split into DFs of 50 rows each, ordered by Column2 (the first DF
is to contain the rows with the 50 largest values, ...).
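For reference, here is a minimal sketch of the stated goal. It assumes a plain sort-then-cut approach, which is not the method tried below: sort the rows by Column2 once, then assign consecutive blocks of binSize rows. The stand-in data (letters instead of the Lorem ipsum words, a fixed seed) and the names sortedDF, nBins and blockId are hypothetical and serve only to keep the example self-contained.

```r
# Hypothetical stand-in data, same shape as tmpDF in the post
set.seed(42)
binSize <- 50
tmpDF <- data.frame(Column1 = rep(letters, length.out = 510),
                    Column2 = runif(510, min = 0, max = 1e8))

# Sort once, largest Column2 first
sortedDF <- tmpDF[order(tmpDF$Column2, decreasing = TRUE), ]

# Position in the sorted table -> block number; leftover rows
# (fewer than binSize) are folded into the last block
nBins   <- floor(nrow(sortedDF) / binSize)
blockId <- pmin(ceiling(seq_len(nrow(sortedDF)) / binSize), nBins)

splitList <- split(sortedDF, blockId)

sapply(splitList, nrow)                         # 9 blocks of 50, last block 60
sapply(splitList, function(x) max(x$Column2))   # decreasing from block 1 on
```

Folding the 10 leftover rows into the last block mirrors the pmin(..., floor(nrow(tmpDF)/binSize)) cap in the attempt below. Dropping the pmin() would instead give an eleventh block of 10 rows.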

Here is what I have been doing:

> binSize <- 50
> splitMembership <- pmin(ceiling(order(tmpDF[["Column2"]],decreasing=TRUE)/binSize),
+                         floor(nrow(tmpDF)/binSize))
> splitList <- split(tmpDF,splitMembership)

Distribution seems to work ...
> sapply(splitList,nrow)

But this is NOT what I wanted ...
> sapply(splitList,function(x){max(x[["Column2"]])})
This was supposed to give me bins that are sorted by Column2, i.e. bin 1
should have a higher max than bin 2, which should have a higher max than
bin 3, ...

Can anyone point out where my three reimplementations so far fail?
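A likely culprit is the use of order() where rank() is needed. The following is a hedged sketch on a toy vector, not a tested diagnosis of the full example. order() returns the permutation that sorts the data, i.e. which element comes first, second, and so on. Bin membership, however, must be indexed by element, and that is what rank() returns. The toy vector x and the bin size of 2 are hypothetical.

```r
# Toy data: 40 is the largest value, then 30, 20, 10
x <- c(30, 10, 40, 20)

order(x, decreasing = TRUE)   # 3 1 4 2: which element is 1st, 2nd, ...
rank(-x)                      # 2 4 1 3: where each element lands in the sort

# Membership must be given per element, so it needs the rank:
binSize <- 2
ceiling(order(x, decreasing = TRUE) / binSize)  # 2 1 2 1 (30 lands in bin 2)
ceiling(rank(-x) / binSize)                     # 1 2 1 2 (40 and 30 in bin 1)
```

Both order() and rank() return a permutation of 1:n, so every bin still receives the same number of rows. That would explain why sapply(splitList,nrow) looks right while the maxima do not. Under this reading, replacing order(tmpDF[["Column2"]],decreasing=TRUE) with rank(-tmpDF[["Column2"]],ties.method="first") in the splitMembership line should give bins whose maxima decrease from bin 1 onward.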

Thanks, Stupid Joh


