[R] Trying to understand cut
David Winsemius
dwinsemius at comcast.net
Sun Apr 17 18:39:45 CEST 2016
> On Apr 16, 2016, at 9:12 PM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>
> Jeff,
> Perhaps I was sloppy with my notation:
> I want groups
> >= 0 and < 10
> >= 10 and < 20
> >= 20 and < 30
> ......
> >= 90 and < 100
>
> In any event, my question remains: why did the four different versions of cut give me the same results? I hope someone can explain to me the function of
> include.lowest and right in the call to cut. As demonstrated in my example below, these parameters do not seem to alter the results of cut.
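A minimal sketch of one way to get exactly those groups with `cut`, assuming the data stay within 0 to 100: give explicit break points instead of a number of intervals, and set `right = FALSE` so each interval is closed on the left and open on the right. The object name `groups` is only illustrative.

```r
# John's test data: the integers 0..99 plus 0.9, 1.9, ..., 99.9
values <- c(0:99, 0.9:99.9)

# Explicit breaks 0, 10, ..., 100; right = FALSE gives [0,10), [10,20), ..., [90,100)
groups <- cut(values, breaks = seq(0, 100, by = 10), right = FALSE)

levels(groups)
table(groups)   # 20 values in each of the 10 groups
```

If a value could equal 100 itself, `right = FALSE` would leave it out (NA); adding `include.lowest = TRUE` closes the top interval to [90,100], as in the c4 output further down.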
The pitfalls of using `cut` have pushed me toward a preference for `findInterval`:
nums<-1:100
table( findInterval( nums , seq(0, 100, by=10) ) )
It does mean that I often need to construct names for my groups, but at least I know that I will get left-closed intervals, [a,b), by default, and since `rightmost.closed` defaults to FALSE, even the last interval is open on the right. I often flank my cutting sequence with -Inf on the left and Inf on the right to make sure I see any outliers:
table( findInterval( nums , c(-Inf, seq(10, 90, by=10), Inf) ) ) # 100 now falls in the top group instead of a group of its own
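To make the boundary behaviour concrete, here is a small sketch using a few hand-picked values (chosen only for illustration), followed by one way of building the group names mentioned above. `labs`, `grp` and `flanked` are hypothetical names, not part of any package.

```r
# Breaks 0, 10, ..., 100: findInterval returns i when breaks[i] <= x < breaks[i + 1]
breaks <- seq(0, 100, by = 10)
x <- c(0, 9.9, 10, 99.9, 100)

idx_default <- findInterval(x, breaks)                           # 1 1 2 10 11
idx_closed  <- findInterval(x, breaks, rightmost.closed = TRUE)  # 1 1 2 10 10

# Readable names for the -Inf/Inf-flanked call above
nums <- 1:100
flanked <- c(-Inf, seq(10, 90, by = 10), Inf)
idx <- findInterval(nums, flanked)                # integer indices 1..10

# "[lower,upper)" labels built from consecutive break points
labs <- paste0("[", head(flanked, -1), ",", tail(flanked, -1), ")")

# Keep every label as a level, in order, even if a group is empty
grp <- factor(labs[idx], levels = labs)
table(grp)   # 9 values in [-Inf,10), 10 in each middle group, 11 in [90,Inf)
```

The first pair of results is why the first table above has an eleventh group: 100 equals the last break, so with the default `rightmost.closed = FALSE` it gets an index of its own.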
--
David.
> Thank you,
> John
>
>
> P.S. How do I find FAQ 7.31?
On my machine, pulling down the Help menu and choosing "R Help" brings up a page with a link to the R FAQ, which displays the full FAQ (7.31 is the entry "Why doesn't R think these numbers are equal?"). I thought the behavior in the Windows R GUI might be similar, but I lost the ability to use my virtually hosted version of R in my last OS upgrade, so I cannot check.
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>>> Jeff Newmiller <jdnewmil at dcn.davis.ca.us> 04/16/16 11:07 PM >>>
> Have you read FAQ 7.31 recently, John? Your whole premise is flawed. You should be thinking of ranges [0,10), [10,20), and so on because numbers ending in 0.9 are never going to be exact.
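A small illustration of that point, as a sketch (the value 9.95 is made up): decimal fractions are not stored exactly, so exact comparisons can fail, and closed ranges that end in .9 leave gaps that half-open ranges do not.

```r
# Decimal fractions such as 0.1, 0.2 and 0.9 have no exact binary representation
0.1 + 0.2 == 0.3                       # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))      # TRUE: equal within a tolerance
sprintf("%.20f", 0.9)                  # "0.90000000000000002220"

# Closed ranges like 0-9.9 and 10-19.9 leave a gap between 9.9 and 10
x <- 9.95
in_closed <- (x >= 0 & x <= 9.9) | (x >= 10 & x <= 19.9)   # FALSE: in neither

# Half-open ranges [0,10) and [10,20) cover every value exactly once
in_half_open <- (x >= 0 & x < 10) | (x >= 10 & x < 20)     # TRUE
```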
> --
> Sent from my phone. Please excuse my brevity.
>
>
> On April 16, 2016 7:38:50 PM PDT, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
> I am trying to understand cut so I can divide a list of numbers into 10 groups:
> 0-9.9
> 10-19.9
> 20-29.9
> 30-39.9
> 40-49.9
> 50-59.9
> 60-69.9
> 70-79.9
> 80-89.9
> 90-99.9
>
> As I try to do this, I have been playing with the cut function. Surprisingly, the following four applications of cut give me exactly the same groups, even though I have varied the parameters include.lowest and right. Can someone help me understand what include.lowest and right do? I have looked at the help page, but I don't seem to understand what I am being told!
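The two arguments only make a difference for values that sit exactly on a break point: `right` decides which side of each interval is closed, and `include.lowest` closes the lowest interval when `right = TRUE` (the highest one when `right = FALSE`). A minimal made-up example in which every value is a break:

```r
x <- c(0, 10, 20)   # every value coincides with a break
b <- c(0, 10, 20)   # two intervals

# right = TRUE (default): (0,10], (10,20]; 0 is outside both
r1 <- cut(x, b)
# include.lowest = TRUE with right = TRUE: the first interval becomes [0,10]
r2 <- cut(x, b, include.lowest = TRUE)
# right = FALSE: [0,10), [10,20); now 20 is outside both
r3 <- cut(x, b, right = FALSE)
# include.lowest = TRUE with right = FALSE: the last interval becomes [10,20]
r4 <- cut(x, b, include.lowest = TRUE, right = FALSE)

as.integer(r1)  # NA  1  2
as.integer(r2)  #  1  1  2
as.integer(r3)  #  1  2 NA
as.integer(r4)  #  1  2  2
```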
> Thank you,
> John
>
> values <- c((0:99),c(0.9:99.9))
> sort(values)
> c1<-cut(values,10,include.lowest=FALSE,right=TRUE)
> c2<-cut(values,10,include.lowest=FALSE,right=FALSE)
> c3<-cut(values,10,include.lowest=TRUE,right=TRUE)
> c4<-cut(values,10,include.lowest=TRUE,right=FALSE)
> cbind(min=aggregate(values,list(c1),min),max=aggregate(values,list(c1),max))
> cbind(min=aggregate(values,list(c2),min),max=aggregate(values,list(c2),max))
> cbind(min=aggregate(values,list(c3),min),max=aggregate(values,list(c3),max))
> cbind(min=aggregate(values,list(c4),min),max=aggregate(values,list(c4),max))
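For comparison, a sketch that reruns the four calls with explicit, round break points (`seq(0, 100, by = 10)`, an assumed choice matching the groups John describes). Now some values fall exactly on breaks, and the four settings no longer agree:

```r
values <- c(0:99, 0.9:99.9)
b <- seq(0, 100, by = 10)

c1 <- cut(values, b, include.lowest = FALSE, right = TRUE)
c2 <- cut(values, b, include.lowest = FALSE, right = FALSE)
c3 <- cut(values, b, include.lowest = TRUE,  right = TRUE)
c4 <- cut(values, b, include.lowest = TRUE,  right = FALSE)
groups <- list(c1 = c1, c2 = c2, c3 = c3, c4 = c4)

# Group that the value 10 lands in: "(0,10]" "[10,20)" "[0,10]" "[10,20)"
ten_group <- sapply(groups, function(f) as.character(f[values == 10]))
# Values dropped (NA): only c1 loses the 0
n_dropped <- sapply(groups, function(f) sum(is.na(f)))

ten_group
n_dropped
```

c2 and c4 still agree here, because the only value c4 would rescue is one equal to 100, and the data stop at 99.9.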
>
> You can run the code above, or inspect my results, which are reproduced below:
>
> cbind(min=aggregate(values,list(c1),min),max=aggregate(values,list(c1),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  (-0.0999,9.91]     0 (-0.0999,9.91]   9.9
> 2     (9.91,19.9]    10    (9.91,19.9]  19.9
> 3     (19.9,29.9]    20    (19.9,29.9]  29.9
> 4     (29.9,39.9]    30    (29.9,39.9]  39.9
> 5       (39.9,50]    40      (39.9,50]  49.9
> 6         (50,60]    50        (50,60]  59.9
> 7         (60,70]    60        (60,70]  69.9
> 8         (70,80]    70        (70,80]  79.9
> 9         (80,90]    80        (80,90]  89.9
> 10       (90,100]    90       (90,100]  99.9
> cbind(min=aggregate(values,list(c2),min),max=aggregate(values,list(c2),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91)     0 [-0.0999,9.91)   9.9
> 2     [9.91,19.9)    10    [9.91,19.9)  19.9
> 3     [19.9,29.9)    20    [19.9,29.9)  29.9
> 4     [29.9,39.9)    30    [29.9,39.9)  39.9
> 5       [39.9,50)    40      [39.9,50)  49.9
> 6         [50,60)    50        [50,60)  59.9
> 7         [60,70)    60        [60,70)  69.9
> 8         [70,80)    70        [70,80)  79.9
> 9         [80,90)    80        [80,90)  89.9
> 10       [90,100)    90       [90,100)  99.9
> cbind(min=aggregate(values,list(c3),min),max=aggregate(values,list(c3),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91]     0 [-0.0999,9.91]   9.9
> 2     (9.91,19.9]    10    (9.91,19.9]  19.9
> 3     (19.9,29.9]    20    (19.9,29.9]  29.9
> 4     (29.9,39.9]    30    (29.9,39.9]  39.9
> 5       (39.9,50]    40      (39.9,50]  49.9
> 6         (50,60]    50        (50,60]  59.9
> 7         (60,70]    60        (60,70]  69.9
> 8         (70,80]    70        (70,80]  79.9
> 9         (80,90]    80        (80,90]  89.9
> 10       (90,100]    90       (90,100]  99.9
> cbind(min=aggregate(values,list(c4),min),max=aggregate(values,list(c4),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91)     0 [-0.0999,9.91)   9.9
> 2     [9.91,19.9)    10    [9.91,19.9)  19.9
> 3     [19.9,29.9)    20    [19.9,29.9)  29.9
> 4     [29.9,39.9)    30    [29.9,39.9)  39.9
> 5       [39.9,50)    40      [39.9,50)  49.9
> 6         [50,60)    50        [50,60)  59.9
> 7         [60,70)    60        [60,70)  69.9
> 8         [70,80)    70        [70,80)  79.9
> 9         [80,90)    80        [80,90)  89.9
> 10       [90,100]    90       [90,100]  99.9
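A likely explanation of the identical results: when `breaks` is a single number, `cut` picks the break points itself and moves the outer limits out by 0.1% of the data range (hence labels such as -0.0999 above). In this example no value falls exactly on a break, so `right` and `include.lowest` can only change the labels, not the group membership. A quick check of that, as a sketch:

```r
values <- c(0:99, 0.9:99.9)
c1 <- cut(values, 10, include.lowest = FALSE, right = TRUE)
c2 <- cut(values, 10, include.lowest = FALSE, right = FALSE)
c3 <- cut(values, 10, include.lowest = TRUE,  right = TRUE)
c4 <- cut(values, 10, include.lowest = TRUE,  right = FALSE)

# Integer group codes: the same for all four settings
codes <- lapply(list(c1, c2, c3, c4), as.integer)
all(vapply(codes[-1], identical, logical(1), codes[[1]]))   # TRUE

# Only the bracket characters in the labels change
levels(c1)[1]    # starts with "(": left end open
levels(c3)[1]    # starts with "[": include.lowest closes it
levels(c4)[10]   # ends with "]": with right = FALSE, the top interval is closed
```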
More information about the R-help mailing list