[Rd] Suggestions for 'diff.default'

Suharto Anggono Suharto Anggono suharto_anggono at yahoo.com
Tue Jan 29 04:32:33 CET 2013



--- On Mon, 28/1/13, Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com> wrote:

> From: Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com>
> Subject: Suggestions for 'diff.default'
> To: R-devel at lists.R-project.org
> Date: Monday, 28 January, 2013, 5:31 PM
> I have suggestions for function
> 'diff.default' in R.
> 
> 
> Suggestion 1: If the input is a matrix, always return a matrix,
> even if the result is empty.
> 
> What happens in R 2.15.2:
> 
> > rbind(1:2)    # matrix
>      [,1] [,2]
> [1,]    1    2
> > diff(rbind(1:2))   # not matrix
> integer(0)
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: i386-w64-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> 
> The documentation for 'diff' says, "If 'x' is a matrix then
> the difference operations are carried out on each column
> separately."
> If the result is empty, I expect that the result still has
> as many columns as the input.
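The expectation above can be illustrated with a small sketch. It only uses base R indexing, and the variable names `x` and `e` are chosen here for illustration: a zero-row subset taken with `drop = FALSE` keeps the column dimension, whereas `x[0L]` discards it.

```r
x <- rbind(1:2)                 # 1 x 2 integer matrix

e <- x[0L, , drop = FALSE]      # zero rows, column count preserved
dim(e)                          # 0 2
dim(e - e)                      # 0 2; subtraction keeps the dimensions

dim(x[0L])                      # NULL; vector indexing drops 'dim'
```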
> 
> 
> Suggestion 2: Make 'diff.default' applicable more generally by
> (a) not performing 'unclass';
> (b) generalizing (changing)
> ismat <- is.matrix(x)
> to become
> ismat <- length(dim(x)) == 2L
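To see why the two tests in point (b) differ, here is a brief sketch; the data frame `df` is made up for illustration. `is.matrix` is FALSE for a data frame, while the `dim`-based test is TRUE for anything that reports two dimensions.

```r
df <- data.frame(a = c(1, 4, 9), b = c(2, 4, 8))   # illustrative data

is.matrix(df)                   # FALSE: a data frame is not a matrix
length(dim(df)) == 2L           # TRUE: it still has two dimensions

m <- as.matrix(df)
is.matrix(m)                    # TRUE
length(dim(m)) == 2L            # TRUE

length(dim(1:3)) == 2L          # FALSE: a plain vector has no 'dim'
```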
> 
> 
> If suggestion 1 is to be applied and 'unclass' is not wanted
> (that is, point (a) in suggestion 2 is also to be applied),
> 
>     if (lag * differences >= xlen)
>         return(x[0L])
> 
> can be changed to
> 
>     if (lag * differences >= xlen)
>         return(
>             if (ismat) x[0L, , drop = FALSE] - x[0L, , drop = FALSE]
>             else x[0L] - x[0L])
> 
> This handles classes for which the subtraction (minus) operation
> changes the class.
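A short sketch of a class for which subtraction changes the class, using "Date" as the example (the variable name `d` is illustrative): an empty subset keeps class "Date", but subtracting two empty subsets gives a "difftime", which is the class a non-empty difference would have.

```r
d <- as.Date(c("2012-01-01", "2012-02-01"))

class(d[0L])                    # "Date"
class(d[0L] - d[0L])            # "difftime", same class as d[2] - d[1]
class(d[2L] - d[1L])            # "difftime"

class(unclass(d)[0L])           # "numeric"; 'unclass' loses the class
```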
Sorry, I wasn't careful enough. To obtain the correct class for the result, differencing should be done as many times as specified by argument 'differences'.

I consider the case of
diff(as.POSIXct(c("2012-01-01", "2012-02-01"), tz="UTC"), d=2)
versus
diff(diff(as.POSIXct(c("2012-01-01", "2012-02-01"), tz="UTC")))
To be safe, maybe just compute as usual, even when it is known that the end result will be empty. It can be done like this.

    empty <- integer()
    if (ismat)
        for (i in seq_len(differences))
            r <- if (lag >= nrow(r))
                r[empty, , drop = FALSE] - r[empty, , drop = FALSE] else
                ...
    else
        for (i in seq_len(differences))
            r <- if (lag >= length(r))
                r[empty] - r[empty] else
                ...

If that way is used, 'xlen' is no longer needed.
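For readers who want to experiment, the fragment above can be fleshed out into a self-contained function. This is only a sketch under stated assumptions, not the actual 'diff.default' source: the name `diff_sketch` is hypothetical, the non-empty branches (elided as `...` above) are filled in with one plausible indexing choice, and the argument checks are simplified.

```r
## Hypothetical sketch; 'diff_sketch' is not part of base R.
diff_sketch <- function(x, lag = 1L, differences = 1L) {
    if (length(lag) != 1L || length(differences) != 1L ||
        lag < 1L || differences < 1L)
        stop("'lag' and 'differences' must be integers >= 1")
    ismat <- length(dim(x)) == 2L   # suggestion 2 (b)
    r <- x                          # suggestion 2 (a): no 'unclass'
    empty <- integer()
    for (i in seq_len(differences)) {
        n <- if (ismat) nrow(r) else length(r)
        r <- if (ismat) {
            if (lag >= n)
                r[empty, , drop = FALSE] - r[empty, , drop = FALSE]
            else                    # assumed form of the elided branch
                r[(lag + 1L):n, , drop = FALSE] -
                    r[seq_len(n - lag), , drop = FALSE]
        } else {
            if (lag >= n)
                r[empty] - r[empty]
            else                    # assumed form of the elided branch
                r[(lag + 1L):n] - r[seq_len(n - lag)]
        }
    }
    r
}

dim(diff_sketch(rbind(1:2)))        # 0 2: empty result is still a matrix
diff_sketch(c(1, 4, 9))             # 3 5
diff_sketch(as.Date(c("2012-01-01", "2012-02-01")), differences = 2L)
diff_sketch(data.frame(a = c(1, 4, 9), b = c(2, 4, 8)))
```

Because the subtraction is carried out once per requested difference, the class of an empty result comes from the same sequence of operations as a non-empty one.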
> 
> Otherwise, if 'unclass' is wanted, maybe the handling of
> empty result can be moved to be after 'unclass', to be
> consistent with non-empty result.
> 
> 
> If point (a) in suggestion 2 is applied, 'diff.default' can
> handle input of class "Date" and "POSIXt". If, in addition,
> point (b) in suggestion 2 is also applied, 'diff.default'
> can handle a data frame as input.
>


