[R] I don't understand the result of `svyboxplot` function from the Survey package

Thomas Lumley
Tue Aug 2 23:47:02 CEST 2022

It's a bug resulting from the new svyquantile() implementation.  It's fixed in the development version, which you can get from r-forge here: https://r-forge.r-project.org/R/?group_id=1788
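
A minimal sketch of installing that development build, assuming the standard R-Forge repository URL below and that R-Forge currently offers a build of survey for your version of R (if it does not, the package has to be built from source):

```r
# Assumption: R-Forge serves packages from this repository URL and has
# a build of 'survey' for the R version in use.
install.packages("survey", repos = "http://R-Forge.R-project.org")

# Restart R, then confirm which version is now installed.
packageVersion("survey")
```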


Thomas Lumley
Professor of Biostatistics

From: Anthony Damico <ajdamico using gmail.com>
Sent: Tuesday, August 2, 2022 11:49 PM
To: Iulia Dumitru
Cc: r-help using r-project.org; Thomas Lumley
Subject: Re: [R] I don't understand the result of `svyboxplot` function from the Survey package

hi, nice catch!  i'm ccing the author of the survey package because this might be an issue.  when i run the example from ?svyboxplot, i also see all three boxes in the exact same place.  it seems like the svyby() call inside of svyboxplot does something unexpected when svyquantile gets called with keep.var=FALSE and ci=FALSE



library(survey)
data(api)  # loads the apistrat example data

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)

# looks OK
svyby(~enroll, ~stype, dstrat, svyquantile, quantiles = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)

# returns each result three times in an unexpected configuration.  svyboxplot then grabs the repeated information from the first six columns
svyby(~enroll, ~stype, dstrat, svyquantile, ci = FALSE, keep.var = FALSE, quantiles = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)

On Tue, Aug 2, 2022 at 7:30 AM Iulia Dumitru <iuliadmtru using gmail.com> wrote:
After following the example given here: https://www.rdocumentation.org/packages/survey/versions/4.1-1/topics/svyhist
for `svyboxplot` I get the result in the attached image. This is a box plot of the `enroll` variable from the stratified dataset `apistrat`, grouped by `stype`: E (elementary school), M (middle school) and H (high school). If I use the `svyby` function to group the data by `stype` and find the mean for each group, I get this result:

> svyby(~enroll, ~stype, dstrat, svymean)
  stype  enroll       se
E     E  416.78 16.41740
H     H 1320.70 91.70781
M     M  832.48 54.52157

Clearly the means are very different from each other. Then why don’t the box plots show this? I don’t know how to interpret the plot. Could someone please offer some insight on this? Thank you!
