[R] Binning the data based on a value

R. Michael Weylandt michael.weylandt at gmail.com
Mon Dec 5 22:09:10 CET 2011


Add the drop = TRUE argument to split:

?split
split(a, cut(a$spending, breaks = (0:5)*100), drop = TRUE)
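
A minimal sketch of the difference, using the data frame from the quoted
message below (the names f, bins_all and bins_nonempty are only
illustrative, not part of the original code):

```r
# Data frame from the quoted message
a <- data.frame(patient = 1:7,
                charges = c(100, 500, 200, 90, 400, 500, 600),
                age = c(0, 3, 5, 7, 10, 16, 19),
                spending = c(10, 60, 110, 200, 400, 450, 500))

# Bin spending into (0,100], (100,200], ..., (400,500]
f <- cut(a$spending, breaks = (0:5) * 100)

bins_all      <- split(a, f)               # keeps the empty (200,300] bin
bins_nonempty <- split(a, f, drop = TRUE)  # drops factor levels with no rows

names(bins_all)
names(bins_nonempty)
```

Note that cut() builds right-closed intervals by default, so 200 lands in
(100,200]. If bins of the form [0,100), [100,200), ... are wanted, cut()
accepts right = FALSE; with these breaks a value of exactly 500 would then
fall outside the last interval and become NA unless include.lowest = TRUE
is also given.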

Michael

On Mon, Dec 5, 2011 at 4:06 PM, Diviya Smith <diviya.smith at gmail.com> wrote:
> Thank you very much, Michael. This is very helpful. However, is there any
> way to exclude zero-length bins? Let's imagine that the data frame was as
> follows:
>
> a <- data.frame(patient=1:7, charges=c(100,500,200,90,400,500,600),
>  age=c(0,3,5,7,10,16,19),  spending=c(10, 60, 110, 200, 400, 450, 500))
>
> bins <- split(a, cut(a$spending, breaks = (0:5)*100))
>
> then bins[3] =
> $`(200,300]`
> [1] patient  charges  age      spending
> <0 rows> (or 0-length row.names)
>
> Is there a way to exclude this?
>
> Priya
>
> On Mon, Dec 5, 2011 at 3:43 PM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>>
>> Just a clarification: I can't get round() to work as I first expected, so
>> if you want to bin by 100s you'd probably want:
>>
>> split(a, cut(a$spending, breaks = (0:5)*100))
>>
>> Michael
>>
>> On Mon, Dec 5, 2011 at 3:41 PM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>> > I'd do something like
>> >
>> > split(a, a$spending)
>> >
>> > and you can include a round(a$spending, -2) or something similar if
>> > you want to group by the 100's.
>> >
>> > Michael
>> >
>> > On Mon, Dec 5, 2011 at 3:37 PM, Diviya Smith <diviya.smith at gmail.com>
>> > wrote:
>> >> Hello there,
>> >>
>> >> I have a matrix with some data and I want to split this matrix based on
>> >> the values in one column. Is there a quick way of doing this? I have
>> >> looked at cut but I am not sure exactly how to use it.
>> >> For example:
>> >>
>> >> I would like to split the matrix "a" based on the spending such that
>> >> the data is binned into groups [0..99], [100..199], ... and so on.
>> >>
>> >> a <- data.frame(patient=1:7, charges=c(100,500,200,90,400,500,600),
>> >>  age=c(0,3,5,7,10,16,19),  spending=c(10, 60, 110, 200, 250, 400, 450))
>> >>
>> >> Expected  output -
>> >> bin[1] <- c(10, 60)
>> >> bin[2] <- c(110, 200, 250)
>> >> bin[3] <- c(400, 450)
>> >>
>> >> NOTE that the number of data points in each bin is not the same and the
>> >> empty bins are removed (since there are no points between [199..299],
>> >> bin[3] starts at 400).
>> >>
>> >> Any help would be most appreciated. Thank you in advance.
>> >>
>> >> Diviya