[R-sig-Geo] Calculating median age for a group of US census blocks?

Kevin Zembower kev|n @end|ng |rom zembower@org
Sat Sep 9 16:42:04 CEST 2023


Dr. Snow, thanks so much for your response to my question.

I think I'm going to stick with the lower- and upper-bounds method I
described, even though it gives a wider range for the median age than
other methods. I read the vignette for 'survival' as well as the
chapters on survival from MASS and another book I have, and couldn't
make heads or tails of it, much less how to apply it to this question.
In the unlikely event of someone asking me to explain or defend my
conclusions on median age for my neighborhood population, I would be
lost about survival statistics, but could manage, with numerous hand-
waves, to explain my method. I'm an old, retired guy who thinks
statistics are fun, not someone with any kind of professional training
or credentials.
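
For anyone following along, here is a minimal sketch of the lower- and
upper-bounds method. The bin limits and counts are made up for
illustration (only the total of 537 and the 9 people aged 85 and older
echo my figures), and median_bounds() is a hypothetical helper name, not
something from a package. Every person is assigned first the lower and
then the upper limit of their bin; the true median has to lie between
the two resulting medians. The open-ended top bin can keep an upper
limit of Inf as long as fewer than half the people fall in it.

```r
# Hypothetical bin limits and counts (NOT real census figures).
bins <- data.frame(
  lower = c(0, 10, 20, 30, 40, 50, 60, 70, 85),
  upper = c(9, 19, 29, 39, 49, 59, 69, 84, Inf),  # open-ended top bin
  count = c(50, 60, 70, 80, 90, 70, 60, 48, 9)
)

# Assign every person the lower (then upper) limit of their bin and take
# the median of each set; the true median lies between the two results.
median_bounds <- function(lower, upper, count) {
  c(lower = median(rep(lower, count)),
    upper = median(rep(upper, count)))
}

bounds <- median_bounds(bins$lower, bins$upper, bins$count)
print(bounds)
```

With these made-up counts the middle person (number 269 of 537) falls
in the 40 to 49 bin, so the bounds are 40 and 49.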

Thank you, again, for your thoughtful and thorough response. I
appreciate your help.

-Kevin

On Tue, 2023-09-05 at 11:31 -0600, Greg Snow wrote:
> Kevin,
> 
> Your idea of substituting the minimum and maximum values of the ranges
> will work for computing bounds on the median age, and for the median
> age you should not need to drop the 85+ group (unless more than 50% of
> people are in that group).  The mean is another issue.
> 
> Another approach that may give you a smaller interval and more
> statistically justified range would be to turn to survival analysis
> techniques and treat the values from the table as interval censored
> data.  If the data appears to come from a known distribution then you
> can use parametric survival techniques to fit the distribution (see
> the `survreg` function in the `survival` package).  Or, there are
> packages that fit non-parametric models to interval censored data
> (`Icens` and `interval` for example) that can then be used to estimate
> a confidence interval on the median age (and possibly the mean age,
> but with limitations).  For the 85+ group you can treat them as right
> censored, or interval censored from 85 to infinity, or interval
> censored from 85 to some value like 100 or 120 (there is a small
> chance that someone in the table could be over 100, but rare, I think
> the current oldest reported living person is in the hundred and teens,
> so 120 would be safe).
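
A rough sketch of the parametric route described above, under several
assumptions: the counts are made up, the Weibull distribution is chosen
only for illustration (nothing in this thread says ages follow it), and
each bin becomes one weighted row. Ages are in completed years, so a bin
such as "30 to 39" is treated as the interval from 30 to 40. The first
bin is coded as left censored at 10 and the "85 and older" bin as right
censored, using NA in the `interval2` form of `Surv`.

```r
library(survival)

# Hypothetical counts (NOT real census figures).
bins <- data.frame(
  lower = c(NA, 10, 20, 30, 40, 50, 60, 70, 85),  # NA: left censored at 10
  upper = c(10, 20, 30, 40, 50, 60, 70, 85, NA),  # NA: right censored at 85
  count = c(50, 60, 70, 80, 90, 70, 60, 48, 9)
)

# Intercept-only model; each bin is weighted by its number of people.
fit <- survreg(Surv(lower, upper, type = "interval2") ~ 1,
               data = bins, weights = count, dist = "weibull")

# Median of the fitted Weibull: exp(mu + sigma * log(log(2))).
fitted_median <- unname(exp(coef(fit)[1] + fit$scale * log(log(2))))
print(fitted_median)
```

This gives a point estimate only; whether it is trustworthy depends on
how well the assumed distribution matches the real age structure.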
> 
> On Thu, Aug 31, 2023 at 1:48 PM Kevin Zembower via R-sig-Geo
> <r-sig-geo using r-project.org> wrote:
> > 
> > Sorry to resurrect a long-dead thread, but I'm still struggling
> > with my
> > desire to assign a median age to the population in a group of US
> > census
> > blocks. I'm using the data from the US Census table P12, which bins
> > the
> > ages into ranges.
> > 
> > I'm convinced (thank you!) that I can't compute the exact median
> > age.
> > Can I compute the lower and upper bounds of the median age? Can I
> > assign all the people in a binned age range (say "20 to 29 years")
> > to
> > the lower limit of the range, then compute the median of those
> > ages,
> > and say that the true median age is between this lower limit and
> > the
> > upper one, computed similarly?
> > 
> > If this is valid, how do I deal with the "85 years and older" bin?
> > I
> > have 9 people 85 years and older, out of a total population of 537
> > people in my group of census blocks. For the lower bounds of the
> > median, I assign all 9 the age of 85. What can I do for the upper
> > bounds?
> > 
> > I've done this, and found that the true median age is between 40
> > and 44
> > years old, if I drop all the "85 years and older" population as NA.
> > The
> > true mean is between 39.96 and 43.46, similarly.
> > 
> > One thought: If there are 9 people in the "85 years and older"
> > group,
> > should I drop them and also drop the 9 youngest ages?
> > 
> > I look forward to reading your thoughts. Thank you for any advice
> > and
> > guidance.
> > 
> > -Kevin
> > 
> > On Tue, 2023-08-08 at 12:00 +0200, r-sig-geo-request using r-project.org
> > wrote:
> > > 
> > > Message: 2
> > > Date: Mon, 7 Aug 2023 18:33:41 +0000
> > > From: Kevin Zembower <kevin using zembower.org>
> > > To: "r-sig-geo using r-project.org" <r-sig-geo using r-project.org>
> > > Subject: [R-sig-Geo] Calculating median age for a group of US
> > > census
> > >         blocks?
> > > Message-ID:
> > >         <01000189d146bd0d-ecb41aac-0501-46f4-b313-a1faebeff2a9-
> > > 000000 using email.amazonses.com>
> > > 
> > > Content-Type: text/plain; charset="utf-8"
> > > 
> > > Hello, all,
> > > 
> > > I'd like to obtain the median age for a population in a specific
> > > group
> > > of US Decennial census blocks. Here's an example of the problem:
> > > 
> > > ## Example of calculating median age of population in census blocks.
> > > library(tidyverse)
> > > library(tidycensus)
> > > 
> > > counts <- get_decennial(
> > >      geography = "block",
> > >      state = "MD",
> > >      county = "Baltimore city",
> > >      table = "P1",
> > >      year = 2020,
> > >      sumfile = "dhc") %>%
> > >      mutate(NAME = NULL) %>%
> > >      filter(substr(GEOID, 6, 11) == "271101" &
> > >             substr(GEOID, 12, 15) %in% c("3000", "3001", "3002")
> > >             )
> > > 
> > > ages <- get_decennial(
> > >      geography = "block",
> > >      state = "MD",
> > >      county = "Baltimore city",
> > >      table = "P13",
> > >      year = 2020,
> > >      sumfile = "dhc") %>%
> > >      mutate(NAME = NULL) %>%
> > >      filter(substr(GEOID, 6, 11) == "271101" &
> > >             substr(GEOID, 12, 15) %in% c("3000", "3001", "3002")
> > >             )
> > > 
> > > I have two questions:
> > > 
> > > 1. Is it mathematically valid to multiply the population of a
> > > block
> > > by
> > > the median age of that block (in other words, assign the median
> > > age
> > > to
> > > each member of a block), then calculate the median of those
> > > numbers
> > > for
> > > a group of blocks?
> > > 
> > > 2. Is raw data on the ages of individuals available anywhere else
> > > in
> > > the
> > > census data? I can find tables such as P12, which breaks down the
> > > population by age ranges or bins, but can't find specific data of
> > > counts
> > > per age in years.
> > > 
> > > Thanks for your advice and help.
> > > 
> > > -Kevin
> > > 
> > > 
> > > 
> > > 
> > > ------------------------------
> > > 
> > > Message: 3
> > > Date: Mon, 7 Aug 2023 14:38:16 -0400
> > > From: Josiah Parry <josiah.parry using gmail.com>
> > > To: Kevin Zembower <kevin using zembower.org>
> > > Cc: "r-sig-geo using r-project.org" <r-sig-geo using r-project.org>
> > > Subject: Re: [R-sig-Geo]  Calculating median age for a group of
> > > US
> > >         census blocks?
> > > Message-ID:
> > >         <
> > > CAL3ufUJVvcZvdtYM2V0tmo9U-RMZ1zOGL8NZDhjK7V8GFc77HA using mail.gmail.com
> > > >
> > > Content-Type: text/plain; charset="utf-8"
> > > 
> > > Hey Kevin, I don't think you're going to be able to get
> > > individual
> > > level
> > > data from the US Census Bureau. The closest you may be able to
> > > get is
> > > the
> > > current population survey (CPS) which I believe is also available
> > > via
> > > tidycensus. Regarding your first question, I'm not sure I follow
> > > what
> > > your
> > > objective is with it. I would use a geography of census block
> > > groups
> > > as the
> > > measure of median for census block groups. Otherwise it is
> > > unclear
> > > how you
> > > are defining what a "group of blocks" is.
> > > 
> > > ------------------------------
> > > 
> > > Message: 4
> > > Date: Mon, 7 Aug 2023 19:00:38 +0000
> > > From: Kevin Zembower <kevin using zembower.org>
> > > To: Josiah Parry <josiah.parry using gmail.com>
> > > Cc: "r-sig-geo using r-project.org" <r-sig-geo using r-project.org>
> > > Subject: Re: [R-sig-Geo]  Calculating median age for a group of
> > > US
> > >         census blocks?
> > > Message-ID:
> > >         <01000189d15f6aa3-d32ffe39-a210-436f-9f8f-cc551370f034-
> > > 000000 using email.amazonses.com>
> > > 
> > > Content-Type: text/plain; charset="utf-8"
> > > 
> > > Josiah, thanks for your reply.
> > > 
> > > Regarding my objective, I'm trying to compile census statistics
> > > for
> > > the
> > > blocks that make up the neighborhood where I live. It consists of
> > > ten
> > > census blocks, of which I selected three for simplicity in my
> > > example.
> > > The census block-group which contains these ten blocks also
> > > contains
> > > some blocks which are outside of my neighborhood and shouldn't be
> > > counted or included.
> > > 
> > > Since I won't be able to calculate the median age from the age
> > > and
> > > count
> > > data, and since the individual data doesn't seem to be available,
> > > is
> > > it
> > > your thought that I can't produce a valid median age for a group
> > > of
> > > census blocks?
> > > 
> > > Thanks so much for your advice.
> > > 
> > > -Kevin
> > > 
> > > ------------------------------
> > > 
> > > Message: 5
> > > Date: Mon, 7 Aug 2023 18:45:48 +0000
> > > From: Sean Trende <strende using realclearpolitics.com>
> > > To: Josiah Parry <josiah.parry using gmail.com>, Kevin Zembower
> > >         <kevin using zembower.org>
> > > Cc: "r-sig-geo using r-project.org" <r-sig-geo using r-project.org>
> > > Subject: Re: [R-sig-Geo]  Calculating median age for a group of
> > > US
> > >         census blocks?
> > > Message-ID:
> > >         <
> > > BLAPR20MB39382F6CD501D6B1ED8F2C11BE0CA using BLAPR20MB3938.namprd20.prod.ou
> > > tlook.com>
> > > 
> > > Content-Type: text/plain; charset="utf-8"
> > > 
> > > This is correct on the second question, at least for more recent
> > > censuses.  On the first question, imagine a block where the ages
> > > of
> > > three individuals are 60, 50, and 40, and another one where the
> > > ages
> > > are 20, 20, and 20.  Using your approach you would have 50 * 3 =
> > > 150
> > > for the first block, and 20*3 = 60 for the second block.  The
> > > median
> > > of 60 and 150 is 105.  Even dividing that by three you get 35,
> > > which
> > > is not the correct median age (30).
> > > 
> > > ------------------------------
> > > 
> > > Message: 6
> > > Date: Mon, 7 Aug 2023 18:52:33 +0000
> > > From: Kevin Zembower <kevin using zembower.org>
> > > To: Sean Trende <strende using realclearpolitics.com>,  Josiah Parry
> > >         <josiah.parry using gmail.com>
> > > Cc: "r-sig-geo using r-project.org" <r-sig-geo using r-project.org>
> > > Subject: Re: [R-sig-Geo]  Calculating median age for a group of
> > > US
> > >         census blocks?
> > > Message-ID:
> > >         <01000189d1580211-8b8fa766-f820-4ae9-862b-e98e1a4881bf-
> > > 000000 using email.amazonses.com>
> > > 
> > > Content-Type: text/plain; charset="utf-8"
> > > 
> > > Yes, I see what you mean:
> > > 
> > >  > median(c(60, 50, 40, 20, 20, 20))
> > > [1] 30
> > >  > median(c(50, 50, 50, 20, 20, 20))
> > > [1] 35
> > >  >
> > > 
> > > Thanks so much for that clear example.
> > > 
> > > -Kevin
> > > 
> > > ------------------------------
> > > 
> > > Message: 7
> > > Date: Mon, 7 Aug 2023 18:53:05 +0000
> > > From: Jeff Boggs <jboggs using brocku.ca>
> > > To: "r-sig-geo using r-project.org" <r-sig-geo using r-project.org>, Kevin
> > >         Zembower <kevin using zembower.org>
> > > Subject: Re: [R-sig-Geo]  Calculating median age for a group of
> > > US
> > >         census blocks?
> > > Message-ID:
> > >         <
> > > YT3PR01MB91703A158414A8F28FB4052FC00CA using YT3PR01MB9170.CANPRD01.PROD.OU
> > > TLOOK.COM>
> > > 
> > > Content-Type: text/plain; charset="us-ascii"
> > > 
> > > Responses to your questions:
> > > Q1: No. It is not mathematically valid, sadly.
> > > 
> > > Q2: I do not know, but your intuition that this is a possible
> > > solution is correct.
> > > 
> > > I don't use US Census data anymore, but suspect that the data
> > > exists.
> > > Whether they are publicly-available is a different question. I
> > > suspect, though, that block level age-sex cohort in five-year
> > > intervals is available, given this is the usual ingredient for a
> > > population pyramid. That data could be used to calculate a less
> > > exact
> > > median, if you make some simplifying assumptions.
> > > 
> > > Best regards,
> > > Jeff
> > > 
> > > ------------------------------
> > > 
> > > Message: 8
> > > Date: Mon, 7 Aug 2023 15:43:50 -0400
> > > From: Dexter Locke <dexter.locke using gmail.com>
> > > To: Kevin Zembower <kevin using zembower.org>
> > > Cc: Josiah Parry <josiah.parry using gmail.com>, 
> > > "r-sig-geo using r-project.org"
> > >         <r-sig-geo using r-project.org>
> > > Subject: Re: [R-sig-Geo]  Calculating median age for a group of
> > > US
> > >         census blocks?
> > > Message-ID:
> > >         <
> > > CAA=SVwHn=92B-k1tBZm2ioEW79gJx_QX0VD-x2UUEQOBQ+TEvg using mail.gmail.com
> > > >
> > > Content-Type: text/plain; charset="utf-8"
> > > 
> > > Hi Kevin and all,
> > > 
> > > Given the binned data, you could count the number of people per
> > > age
> > > class
> > > for those 10 blocks. You can then express that in a number of
> > > different ways, like percent under 25 years old, or by
> > > calculating
> > > the
> > > dependency
> > > ratio
> > > <
> > > https://www.who.int/data/gho/indicator-metadata-registry/imr-details/1
> > > 119#:~:text=Definition%3A,a%20specific%20point%20in%20time.>
> > > .
> > > 
> > > I do think it is feasible to calculate an estimated mean from the
> > > counts
> > > within groups representing ranges. See, for example, here:
> > > https://stackoverflow.com/questions/18887382/how-to-calculate-the-median-on-grouped-dataset
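
As an illustration of estimating a median from grouped counts, here is
a sketch of the usual textbook interpolation formula,
L + ((n/2 - cf) / f) * w, where L is the lower limit of the bin holding
the median, cf the cumulative count below that bin, f the count in it,
and w its width. It assumes ages are spread evenly within the bin. The
counts are made up and grouped_median() is a hypothetical helper name.

```r
# Hypothetical bin limits, widths and counts (NOT real census figures).
lower <- c(0, 10, 20, 30, 40, 50, 60, 70, 85)
width <- c(10, 10, 10, 10, 10, 10, 10, 15, 15)  # top-bin width is a guess
count <- c(50, 60, 70, 80, 90, 70, 60, 48, 9)

grouped_median <- function(lower, width, count) {
  n <- sum(count)
  cum <- cumsum(count)
  i <- which(cum >= n / 2)[1]            # first bin whose cumulative count reaches n/2
  below <- if (i > 1) cum[i - 1] else 0  # people in all lower bins
  lower[i] + (n / 2 - below) / count[i] * width[i]
}

est <- grouped_median(lower, width, count)
print(est)
```

The guessed width of the open-ended top bin does not matter here unless
the median itself falls in that bin.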
> > > 
> > > Since you are working in Baltimore, you may consider looking at
> > > The
> > > Baltimore Neighborhood Indicators Alliance
> > > https://bniajfi.org/vital_signs/.
> > > They provide useful data on a range of issues (transportation,
> > > crime,
> > > education, environment etc.) including summaries from Census-
> > > derived
> > > demographics. What you are seeking may already exist. BNIA
> > > creates
> > > neighborhoods or "community statistical areas" (n=55) based on
> > > aggregates
> > > of Census data.
> > > 
> > > Although not pertaining to age, Baltimore City Planning has paid
> > > Census in
> > > the past to aggregate from individual-level Census data to the
> > > more
> > > colloquially-used definitions of Baltimore shown here (n = 273):
> > > https://data.baltimorecity.gov/datasets/neighborhood-1/explore?location=39.284832%2C-76.620516%2C12.91
> > > 
> > > Best, Dexter
> > > https://dexterlocke.com/
> > > 
> > > 
> > 
> > 
> > 
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo using r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> 
> 
> 




