[R] Calculate the median age interval

(Ted Harding) Ted.Harding at wlandres.net
Mon Jan 12 12:21:14 CET 2015


Sorry, a typo in my reply below. See at "#######".

On 12-Jan-2015 11:12:43 Ted Harding wrote:
> On 12-Jan-2015 10:32:41 Erik B Svensson wrote:
>> Hello
>> I've got a problem I don't know how to solve. I have got a dataset that
>> contains age intervals (age groups) of people and the number of persons in
>> each age group each year (y1994-y1996). The number of persons varies each
>> year. I only have access to the age intervals, not the age of each person,
>> which would make things easier.
>> 
>> I want to know the median age interval (not the median number) for each
>> year. Let's say that in y1994 the count 23 corresponds to the median age
>> interval "45-54"; then I want "45-54" as the result. How is that done?
>> 
>> This is the sample dataset:
>> 
>> agegrp <- c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64",
>>             "65-74","75-84","84-")
>> y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)
>> y1995 <- c(2,4,1,7,20,39,32,18,21,23,5)
>> y1996 <- c(1,3,1,4,22,37,41,24,24,26,8)
>> 
>> I look forward to your response
>> 
>> Best regards,
>> Erik Svensson
> 
> In principle, this is straightforward. But in practice you may
> need to be careful about how to deal with borderline cases -- and
> about what you mean by "median age interval".
> The underlying idea is based on:
> 
>  cumsum(y1994)/sum(y1994)
>  # [1] 0.00000000 0.02183406 0.05240175 0.09170306 0.20087336
>  # [6] 0.39301310 0.49344978 0.63318777 0.80786026 0.96506550 1.00000000
> 
> Thus age intervals 1-7 ("<1" - "45-54") contain less than 50%
> (0.49344978...), though "45-54" almost gets there. However,
> age groups 1-8 ("<1" - 55-64" contain more than 50%. Hence
> the median age is within "49-64".
####### Should be:
  age groups 1-8 ("<1" - "55-64") contain more than 50%. Hence
  the median age is within "55-64".
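
The corrected statement can be checked directly against the data. A minimal sketch, using the poster's agegrp and y1994 as given; the name cumfrac is only illustrative and is not part of the original reply:

```r
agegrp <- c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64",
            "65-74","75-84","84-")
y1994 <- c(0,5,7,9,25,44,23,32,40,36,8)

# Cumulative share of persons up to and including each age group
# (cumfrac is an illustrative name, not from the original reply)
cumfrac <- cumsum(y1994) / sum(y1994)
names(cumfrac) <- agegrp

cumfrac["45-54"] < 0.5   # TRUE: groups 1-7 fall short of half
cumfrac["55-64"] > 0.5   # TRUE: groups 1-8 pass half
```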

> Implementing the above as a procedure:
> 
>   agegrp[max(which(cumsum(y1994)/sum(y1994)<0.5)+1)]
>   # [1] "55-64"
> 
> Note that the "obvious solution":
> 
>   agegrp[max(which(cumsum(y1994)/sum(y1994) <= 0.5))]
>   # [1] "45-54"
> 
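> gives an incorrect answer, since with these data it returns a group
> whose maximum age is below the median. This is because "<=" selects
> the last group whose cumulative proportion has not yet passed 50%,
> i.e. the group just before the one containing the median.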
> 
> Hoping this helps!
> Ted.
> 
> -------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
> Date: 12-Jan-2015  Time: 11:12:39
> This message was sent by XFMail
> 

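For all three years at once, the same idea might be wrapped in a small function. This is only a sketch: median_group and counts are hypothetical names, and it takes "median age interval" to mean the first group at which the cumulative count reaches half the total, which is one possible reading. With the data quoted above it selects the same group as the "<0.5 ... +1" expression.

```r
agegrp <- c("<1","1-4","5-14","15-24","25-34","35-44","45-54","55-64",
            "65-74","75-84","84-")
counts <- list(
  y1994 = c(0,5,7,9,25,44,23,32,40,36,8),
  y1995 = c(2,4,1,7,20,39,32,18,21,23,5),
  y1996 = c(1,3,1,4,22,37,41,24,24,26,8)
)

# Hypothetical helper: label of the first group whose cumulative
# count reaches at least half of the total
median_group <- function(n, labels) {
  labels[which(cumsum(n) >= sum(n) / 2)[1]]
}

res <- sapply(counts, median_group, labels = agegrp)
res
```

In a borderline case where the cumulative count hits exactly half at a group boundary, e.g. counts c(1,1,1,1), the median lies between two groups and this sketch reports the lower one; whether that is what is wanted depends on what is meant by "median age interval".
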
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 12-Jan-2015  Time: 11:21:11
This message was sent by XFMail


