R computing environment R Documentation

## Package sn: overview of the package structure and commands

### Description

The package provides facilities to build and manipulate probability distributions of the skew-normal and some related families, notably the skew-t family, and makes available related statistical methods for data fitting and model diagnostics, in the univariate and the multivariate case.

The package has two main sides: one provides facilities for the relevant probability distributions, the other makes available statistical methods dealing with these distributions.

The underlying formulation, the parameterizations of the distributions and the terminology are in agreement with the monograph of Azzalini and Capitanio (2014).

The present document refers to version 1.5-0 of the package (2017-02-09).

### Probability side

There are two layers of support for the probability distributions of interest. At the basic level, there exist functions which follow the classical R scheme for distributions. In addition, there exist facilities to build an object which encapsulates a probability distribution, and certain operations can then be performed on such an object. These two schemes are described next.

#### Classical R scheme

The following functions work similarly to `{d,p,q,r}norm` and other R functions for probability distributions:

• skew-normal (SN): functions `{d,p,q,r}sn` for the univariate case, functions `{d,p,r}msn` for the multivariate case, where in both cases the ‘Extended skew-normal’ (ESN) variant form is included;

• skew-t (ST): functions `{d,p,q,r}st` for the univariate case, functions `{d,p,r}mst` for the multivariate case;

• skew-Cauchy (SC): functions `{d,p,q,r}sc` for the univariate case, functions `{d,p,r}msc` for the multivariate case.

In addition to the usual specification of their parameters as a sequence of individual components, a parameter set can be specified as a single `dp` entity, namely a vector in the univariate case, a list in the multivariate case; `dp` stands for ‘Direct Parameters’ (DP).

Conversion from the `dp` parameter set to the corresponding Centred Parameters (CP) can be accomplished using the function `dp2cp`, while function `cp2dp` performs the inverse transformation.
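The classical scheme and the `dp` convention described above can be sketched as follows for the univariate SN case. This is a minimal illustration, assuming the package is installed; the parameter values are arbitrary and chosen only for the example.

```r
library(sn)

# Direct parameters (xi, omega, alpha) of a univariate SN distribution;
# the numerical values are arbitrary illustrations
dp <- c(0, 2, 3)

x <- seq(-2, 6, length.out = 5)
dens    <- dsn(x, xi = 0, omega = 2, alpha = 3)  # individual components
dens_dp <- dsn(x, dp = dp)                       # same parameters as one dp vector

p <- psn(1, dp = dp)    # distribution function evaluated at 1
q <- qsn(p, dp = dp)    # quantile function; should return approximately 1

set.seed(1)
y <- rsn(10, dp = dp)   # ten random draws

# Conversion between direct (DP) and centred (CP) parameters
cp      <- dp2cp(dp, family = "SN")
dp_back <- cp2dp(cp, family = "SN")
```

Supplying both `dp` and the individual components in the same call is not meaningful, so only one of the two forms is used in each call here.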

In addition, one can introduce a user-specified density function using `dSymmModulated` and `dmSymmModulated`, in the univariate and the multivariate case, respectively. These densities are of the ‘symmetry-modulated’ type, also called ‘skew-symmetric’, where one can specify the base density and the modulation factor with a high degree of flexibility. Random numbers can be sampled using the corresponding functions `rSymmModulated` and `rmSymmModulated`. In the bivariate case, a dedicated plotting function exists.

#### SEC distribution objects

Function `makeSECdistr` can be used to build a ‘SEC distribution’ object representing a member of a specified parametric family (among the types SN, ESN, ST, SC) with a given `dp` parameter set. This object can be used for various operations such as plotting or extraction of moments and other summary quantities. Another way of constructing a SEC distribution object is via `extractSECdistr` which extracts suitable components of an object produced by function `selm` to be described below.
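A minimal sketch of building SEC distribution objects, in the univariate and in the bivariate case. The `dp` values, object names and component names are arbitrary illustrations; the `mean`, `sd` and `vcov` methods are assumed to be available in the installed version of the package.

```r
library(sn)

# Univariate SN object: dp is a vector (xi, omega, alpha)
sn1 <- makeSECdistr(dp = c(3, 2, 5), family = "SN", name = "First-SN")
show(sn1)
summary(sn1)     # moments and other summary quantities
m_sn1 <- mean(sn1)
s_sn1 <- sd(sn1)

# Bivariate ST object: in the multivariate case dp is a list
dp2 <- list(xi = c(1, 2), Omega = matrix(c(2, 1, 1, 3), 2, 2),
            alpha = c(-3, 5), nu = 6)
st2 <- makeSECdistr(dp = dp2, family = "ST", name = "ST2",
                    compNames = c("X", "Y"))
summary(st2)
m_st2 <- mean(st2)   # mean vector
v_st2 <- vcov(st2)   # variance matrix
```

Calling `plot(sn1)` or `plot(st2)` on these objects produces the corresponding density plot.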

Additional operations on these objects are possible in the multivariate case, namely `marginalSECdistr` for marginalization and `affineTransSECdistr` for affine transformations. For the multivariate SN family only, `conditionalSECdistr` performs a conditioning on the values taken on by some components of the multivariate variable.
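The three multivariate operations can be sketched as follows. The parameter values, the matrix `A` and the object names are arbitrary illustrations, not recommended settings.

```r
library(sn)

# A three-dimensional ST distribution object
dp3 <- list(xi = 1:3, Omega = toeplitz(1 / (1:3)),
            alpha = c(3, -1, 2), nu = 5)
st3 <- makeSECdistr(dp3, family = "ST", name = "ST3",
                    compNames = c("U", "V", "W"))

# Marginalization: keep components 3 and 1, in that order
st_marg <- marginalSECdistr(st3, comp = c(3, 1), name = "2D marginal of ST3")

# Affine transformation a + t(A) %*% Y, mapping three components to two
A <- matrix(c(1, -1, 1, 3, 0, -2), 3, 2)
st_aff <- affineTransSECdistr(st3, a = c(-3, 0), A = A)

# Conditioning, available for the SN family only: fix component 2 at 1.5
Omega <- diag(3) + outer(1:3, 1:3)
sn3 <- makeSECdistr(dp = list(xi = rep(0, 3), Omega = Omega, alpha = 1:3),
                    family = "SN")
esn2 <- conditionalSECdistr(sn3, fixed.comp = 2, fixed.values = 1.5)
```

Conditioning an SN variable on some of its components yields a distribution of ESN type, which is why the last object is named `esn2`.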

### Statistics side

The main function for data fitting is `selm`, which allows one to specify a linear regression model for the location parameter, similarly to function `lm`, but assuming a skew-elliptical distribution of the error term; this explains the name `selm`, formed as ‘se’ + ‘lm’. Allowed types of distributions are SN (but not ESN), ST and SC. The fitted distribution is univariate or multivariate, depending on the nature of the response variable of the posited regression model. The model fitting method is either maximum likelihood or maximum penalized likelihood; the latter option effectively allows the introduction of a prior distribution on the slant parameter of the error distribution, hence leading to a ‘maximum a posteriori’ estimate.
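A minimal sketch of `selm` calls, using the `ais` dataset shipped with the package. The choice of variables is purely illustrative and carries no substantive claim about the data.

```r
library(sn)
data(ais, package = "sn")

# Univariate response, SN errors, maximum likelihood (the default method)
m1 <- selm(log(Fe) ~ BMI + LBM, family = "SN", data = ais)

# Same model fitted by maximum penalized likelihood
m1p <- selm(log(Fe) ~ BMI + LBM, family = "SN", data = ais, method = "MPLE")

# Multivariate response: a matrix on the left-hand side of the formula
m3 <- selm(cbind(BMI, LBM) ~ WCC + RCC, family = "SN", data = ais)
```

The formula interface mirrors that of `lm`; only the `family` argument, and optionally `method`, need to be added.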

Once the fitting process has been accomplished, an object of class either selm (for univariate response) or mselm (for multivariate response) is produced. A number of ‘methods’ are available for these objects: `show`, `plot`, `summary`, `coef`, `residuals`, `logLik` and others. For univariate selm-class objects, univariate and bivariate profile log-likelihood functions can be obtained; a `predict` method also exists. These methods are built following the S4 protocol; however, the user need not be concerned with the choice of the adopted protocol (unless so wished).
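Some of the methods listed above can be tried on a fitted object as in the following sketch, which again uses the `ais` data with an arbitrary choice of variables. The `param.type` argument of `coef` is used here to switch from the default centred parameters to direct parameters.

```r
library(sn)
data(ais, package = "sn")

m1 <- selm(log(Fe) ~ BMI + LBM, family = "SN", data = ais)

show(m1)
summary(m1)                            # summary in the default parameterization
cf_cp <- coef(m1)                      # centred parameters (CP)
cf_dp <- coef(m1, param.type = "DP")   # direct parameters (DP)
ll    <- logLik(m1)                    # maximized log-likelihood
res   <- residuals(m1)                 # one residual per observation
```

Calling `plot(m1)` produces diagnostic plots for the fitted model.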

The fitting process invoked via `selm` is actually performed by a set of lower-level procedures. These are accessible for direct call, if so wished, typically for improved efficiency, at the expense of a little additional programming effort. Similarly, functions to compute the Fisher information matrix are available, in the expected and the observed form (with some restrictions depending on the selected distribution).

The `extractSECdistr` function extracts the fitted SEC distribution from selm-class and mselm-class objects, hence providing a bridge with the probability side of the package.
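The bridge between the two sides of the package can be sketched as follows, with intercept-only models fitted to the `ais` data; the choice of variables is arbitrary.

```r
library(sn)
data(ais, package = "sn")

# Univariate fit, then extraction of the fitted distribution
m2 <- selm(log(Fe) ~ 1, family = "SN", data = ais)
f2 <- extractSECdistr(m2)
show(f2)

# Multivariate fit, then extraction of the fitted distribution
m4 <- selm(cbind(BMI, LBM) ~ 1, family = "SN", data = ais)
f4 <- extractSECdistr(m4)
summary(f4)
```

The extracted objects `f2` and `f4` can then be handled with the same tools as objects produced by `makeSECdistr`.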

### Author

Adelchi Azzalini. Please send comments, error reports et cetera to the author, whose web page is http://azzalini.stat.unipd.it/.

### References

Azzalini, A. with the collaboration of Capitanio, A. (2014). The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.