[RsR] Case where options in mad() seemingly have no effect
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Wed Jul 2 12:51:27 CEST 2025
>>>>> John Buggeln
>>>>> on Tue, 1 Jul 2025 11:39:34 -0400 writes:
> Hello!
> I have what seems to be a quick question. Below I have pasted the code for the mad() function, from “package::stats”.
> function (x, center = median(x), constant = 1.4826, na.rm = FALSE,
> low = FALSE, high = FALSE)
> {
> if (na.rm)
> x <- x[!is.na(x)]
> n <- length(x)
> constant * if ((low || high) && n%%2 == 0) {
> if (low && high)
> stop("'low' and 'high' cannot be both TRUE")
> n2 <- n%/%2 + as.integer(high)
> sort(abs(x - center), partial = n2)[n2]
> }
> else median(abs(x - center))
> }
> The documentation states that the options “low” and “high”
> update the way the median is being calculated.
Yes, but *not* the "inner" median(x) which is the default for `center`,
but rather the "outer" median( |x - center| ).
> However, in the code there is no update to “center" from the value of low or high. What am I missing here? There must be something fundamental I don’t understand about R implementing this function.
Yes. I think you just need to read the help file more carefully.
If you think the help file was not clear, can you propose an
improvement there ?
> I also have a small example where the options seemingly have to effect on the MAD calculation.
Yes, in that case there is no effect (of course, see below)
> ###################VERSION INFO#############################
> R 4.5.0 on MacOSX 14.5 (M2 Apple Silicon)
> Sys.getlocale() is "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8”
> Information on package ‘stats’
> Description:
..............
..............
... the R version 4.5.0 is all that counts; 'stats' is part of R !
> ###################EXAMPLE#############################
>> var <-scan(text="3.2 3.6 4.3 4.3 4.4 4.7 5.1 5.3 5.5 5.8 6.7 8.9")
> Read 12 items
>> mad(var)
> [1] 0.88956
>> mad(var, high=TRUE)
> [1] 0.88956
>> mad(var, low=TRUE)
> [1] 0.88956
Indeed, in this case, the outer lo-median, median, and hi-median
are identical (using 'x' instead of 'var' (which is after all
the var() function)) :
##--------------o<------------------o<------------------o<-----------
x <- c(3.2, 3.6, 4.3, 4.3, 4.4, 4.7, 5.1, 5.3, 5.5, 5.8, 6.7, 8.9)
mad(x) # 0.88956
mad(x, high=TRUE) # 0.88956
mad(x, low =TRUE) # 0.88956
## MM: the same, yes, because in this case,
sort(abs(x - median(x)))
## [1] 0.2 0.2 0.4 0.5 0.6 0.6 0.6 0.9 1.3 1.7 1.8 4.0
## 1 2 3 4 5 6 7 8 .. 12
## the median(), lo-median() and hi-median {w/o the constant factor}
## are all == 0.6
## MM: slightly more interesting (after all this is about robustness !
x <- round(10*x)
x[1:3] <- 1000 + 1:3
dput(x) # c(1001, 1002, 1003, 43, 44, 47, 51, 53, 55, 58, 67, 89)
## and now,
cbind(mads <- c(lomad = mad(x, low=TRUE), mad = mad(x), himad = mad(x, high = TRUE))) / 1.4826
## give
## lomad 10.5
## mad 11.5
## himad 12.5
##--------------o<------------------o<------------------o<-----------
With best regards,
Martin
> This is confusing because when calculated manually the MAD is different:
This is not "the MAD" that we have defined; rather you use
different *center*, i.e. lo_median, median, or hi_median for the __center__
>> # Manually
>> median(abs(var-median(var)))*1.4826
> [1] 0.88956
>> median(abs(var-4.7))*1.4826 # LOW for median
> [1] 1.03782
>> median(abs(var-5.1))*1.4826 # HIGH for median
> [1] 1.11195
> Best,
> John Buggeln, MS
> PhD Candidate
> University of Delaware
> _______________________________________________
> R-SIG-Robust using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-robust
More information about the R-SIG-Robust
mailing list