[RsR] Case where options in mad() seemingly have no effect

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Wed Jul 2 12:51:27 CEST 2025


>>>>> John Buggeln 
>>>>>     on Tue, 1 Jul 2025 11:39:34 -0400 writes:

    > Hello!
    > I have what seems to be a quick question. Below I have pasted the code for the mad() function, from “package::stats”. 

    > function (x, center = median(x), constant = 1.4826, na.rm = FALSE, 
    > 	        low = FALSE, high = FALSE) 
    > {
    >   if (na.rm) 
    >      x <- x[!is.na(x)]
    >   n <- length(x)
    >   constant * if ((low || high) && n%%2 == 0) {
    >      if (low && high) 
    >        stop("'low' and 'high' cannot be both TRUE")
    >      n2 <- n%/%2 + as.integer(high)
    >      sort(abs(x - center), partial = n2)[n2]
    >   }
    >   else median(abs(x - center))
    > }

    > The documentation states that the options “low” and “high”
    > update the way the median is being calculated.

Yes, but *not* the "inner" median(x) which is the default for `center`, 
but rather the "outer" median( |x - center| ).

> However, in the code there is no update to “center" from the value of low or high. What am I missing here? There must be something fundamental I don’t understand about R implementing this function. 

Yes.  I think you just need to read the help file more carefully.

If you think the help file was not clear, can you propose an
improvement there ?

    > I also have a small example where the options seemingly have to effect on the MAD calculation.

Yes, in that case there is no effect (of course, see below)

    > ###################VERSION INFO#############################

    > R 4.5.0 on MacOSX 14.5 (M2 Apple Silicon)
    > Sys.getlocale() is "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8”

    > Information on package ‘stats’

    > Description:

	..............
	..............


... the R version  4.5.0   is all that counts;  'stats' is part of R !

    > ###################EXAMPLE#############################

    >> var <-scan(text="3.2 3.6 4.3 4.3 4.4 4.7 5.1 5.3 5.5 5.8 6.7 8.9")
    > Read 12 items
    >> mad(var) 
    > [1] 0.88956
    >> mad(var, high=TRUE)
    > [1] 0.88956
    >> mad(var, low=TRUE)
    > [1] 0.88956

Indeed, in this case, the outer lo-median, median, and hi-median
are identical (using 'x' instead of 'var' (which is after all
the var() function)) :


##--------------o<------------------o<------------------o<-----------
x <- c(3.2, 3.6, 4.3, 4.3, 4.4, 4.7, 5.1, 5.3, 5.5, 5.8, 6.7, 8.9)
mad(x)            # 0.88956
mad(x, high=TRUE) # 0.88956
mad(x, low =TRUE) # 0.88956

## MM:  the same, yes, because in this case,
sort(abs(x - median(x)))
## [1] 0.2 0.2 0.4 0.5 0.6 0.6 0.6 0.9 1.3 1.7 1.8 4.0
##      1   2   3   4   5   6   7   8   ..          12
## the median(), lo-median() and hi-median  {w/o the constant factor}
## are all == 0.6

## MM: slightly more interesting (after all this is about robustness !
x <- round(10*x)
x[1:3] <- 1000 + 1:3
dput(x) # c(1001, 1002, 1003, 43, 44, 47, 51, 53, 55, 58, 67, 89)
## and now,
cbind(mads <- c(lomad = mad(x, low=TRUE), mad = mad(x), himad = mad(x, high = TRUE))) / 1.4826
## give

## lomad 10.5
## mad   11.5
## himad 12.5
##--------------o<------------------o<------------------o<-----------


With best regards,
Martin




    > This is confusing because when calculated manually the MAD is different:

This is not "the MAD" that we have defined; rather you use
different *center*, i.e. lo_median, median, or hi_median for the __center__ 

    >> # Manually
    >> median(abs(var-median(var)))*1.4826
    > [1] 0.88956
    >> median(abs(var-4.7))*1.4826 # LOW for median
    > [1] 1.03782
    >> median(abs(var-5.1))*1.4826 # HIGH for median
    > [1] 1.11195


    > Best,

    > John Buggeln, MS
    > PhD Candidate
    > University of Delaware
    > _______________________________________________
    > R-SIG-Robust using r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-sig-robust



More information about the R-SIG-Robust mailing list