[R] Splitting a DF into rows according to a column
Johannes Graumann
johannes_graumann at web.de
Mon Oct 4 16:57:56 CEST 2010
Hi,
I'm spinning my wheels on this and keep coming back to the same wrong
solution - please have a look and lend a hand ...
The premise is: a DF like so
> loremIpsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Quisque leo ipsum, ultricies scelerisque volutpat non, volutpat et nulla.
Curabitur consequat ullamcorper tellus id imperdiet. Duis semper malesuada
nulla, blandit lobortis diam fringilla at. Vestibulum nec tellus orci, eu
sollicitudin quam. Phasellus sit amet enim diam. Phasellus mattis hendrerit
varius. Curabitur ut tristique enim. Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Sed convallis, tortor id vehicula facilisis, nunc justo
facilisis tellus, sed eleifend nisi lacus id purus. Maecenas tempus
sollicitudin libero, molestie laoreet metus dapibus eu. Mauris justo ante,
mattis et pulvinar a, varius pretium eros. Curabitur fringilla dui ac dui
rutrum pretium. Donec sed magna adipiscing nisi accumsan congue sed ac est.
Vivamus lorem urna, tristique quis accumsan quis, ullamcorper aliquet
velit."
> tmpDF <- data.frame(Column1=rep(unlist(strsplit(loremIpsum," ")),length.out=510),Column2=runif(510,min=0,max=1e8))
is to be split into DFs with 50 entries each, ordered by Column2 (the first
DF is to contain the rows with the 50 largest numbers, ...).
Here is what I have been doing:
> binSize <- 50
> splitMembership <-
pmin(ceiling(order(tmpDF[["Column2"]],decreasing=TRUE)/binSize),floor(nrow(tmpDF)/binSize))
> splitList <- split(tmpDF,splitMembership)
Distribution seems to work ...
> sapply(splitList,nrow)
But this is NOT what I wanted ...
> sapply(splitList,function(x){max(x[["Column2"]])})
This was supposed to give me bins that are sorted by Column2: bin 1 should
have a higher max than bin 2, bin 2 a higher max than bin 3, and so on.
Can anyone point out where my (by now three) reimplementations fail?
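For reference, a minimal sketch of one way to get the binning described above. It rests on the fact that order() returns the row indices that would sort the vector, whereas rank() returns each row's own position in the sorted sequence, and it is that position which decides the bin. The fixed seed, the use of letters for Column1, and the names rowRank and binMax are assumptions made for this illustration only.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Stand-in for the data frame above (letters instead of the lorem ipsum words)
tmpDF <- data.frame(Column1 = rep(letters, length.out = 510),
                    Column2 = runif(510, min = 0, max = 1e8))
binSize <- 50

# rank(-x) gives every row its position in decreasing order (1 = largest)
rowRank <- rank(-tmpDF[["Column2"]], ties.method = "first")

# Same capping as before: the leftover rows are folded into the last bin
splitMembership <- pmin(ceiling(rowRank / binSize),
                        floor(nrow(tmpDF) / binSize))
splitList <- split(tmpDF, splitMembership)

# Per-bin maxima, which should now decrease from bin 1 onwards
binMax <- sapply(splitList, function(x) max(x[["Column2"]]))
```

Rows inside each bin keep their original order; sorting tmpDF by Column2 before splitting would be needed if the bins themselves should be sorted as well.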
Thanks, Stupid Joh