R: Factor Analysis

factanal {stats}

R Documentation

Factor Analysis

Description

Perform maximum-likelihood factor analysis on a covariance matrix or data matrix.

Usage

factanal(x, factors, data = NULL, covmat = NULL, n.obs = NA,
         subset, na.action, start = NULL,
         scores = c("none", "regression", "Bartlett"),
         rotation = "varimax", control = NULL, ...)

Arguments

x

formula or a numeric matrix or an object that can be coerced to a numeric matrix.

factors

integer number of factors to be fitted.

data

an optional data frame (or similar: see model.frame), used only if x is a formula. By default the variables are taken from environment(formula).

covmat

covariance matrix or a covariance list as returned by cov.wt. Of course, correlation matrices are covariance matrices.

n.obs

the number of observations, used if covmat is a covariance matrix.

subset

a specification of the cases to be used, if x is used as a matrix or formula.

na.action

the na.action to be used if x is used as a formula, in which case model.frame() is called.

start

NULL or a matrix of starting values, each column giving an initial set of uniquenesses.

scores

Type of scores to produce, if any. The default is "none"; "regression" gives Thomson's scores and "Bartlett" gives Bartlett's weighted least-squares scores. Partial matching allows these names to be abbreviated.

rotation

a function or character string. In the latter case, "none" or the name of a function to be used to rotate the factors: it will be called with first argument the loadings matrix, and should return a list with component loadings giving the rotated loadings, or just the rotated loadings.

control

a list of control values with components

nstart: The number of starting values to be tried if start = NULL. Default 1.
trace: logical. Output tracing information? Default FALSE.
lower: The lower bound for uniquenesses during optimization. Should be > 0. Default 0.005.
opt: a list of control values to be passed to optim's control argument.
rotate: a list of additional arguments for the rotation function.

...

components of control can also be supplied directly as named arguments to factanal.
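As a small illustration of the two ways of supplying control values, the sketch below fits the same model twice on the built-in ability.cov covariance list, once through control and once through a named argument. The value lower = 0.01 is chosen only for illustration.

```r
# Two ways to set the lower bound on the uniquenesses.
# ability.cov is a covariance list shipped with R (datasets package).
fit_list  <- factanal(factors = 2, covmat = ability.cov,
                      control = list(lower = 0.01))
fit_named <- factanal(factors = 2, covmat = ability.cov, lower = 0.01)

# Both calls are expected to give the same estimated uniquenesses.
all.equal(fit_list$uniquenesses, fit_named$uniquenesses)
```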

Details

The factor analysis model is

x = \Lambda f + e

for a p-element vector x, a p \times k matrix \Lambda of loadings, a k-element vector f of scores and a p-element vector e of errors. None of the components other than x is observed, but the major restriction is that the scores be uncorrelated and of unit variance, and that the errors be independent with variances \Psi, the uniquenesses. It is also common to scale the observed variables to unit variance, and this is done in this function.

Thus factor analysis is in essence a model for the correlation matrix of x,

\Sigma = \Lambda\Lambda^\prime + \Psi

There is still some indeterminacy in the model, for it is unchanged if \Lambda is replaced by \Lambda G for any k \times k orthogonal matrix G. Such matrices G are known as rotations (although the term is applied also to non-orthogonal invertible matrices).
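The indeterminacy can be checked numerically. The following sketch post-multiplies the unrotated p \times k loadings of a two-factor fit by a k \times k rotation matrix and compares the implied common part \Lambda\Lambda^\prime before and after. The angle theta and the use of ability.cov are arbitrary choices made for illustration.

```r
# Unrotated loadings for a two-factor fit (p = 6 variables, k = 2 factors).
fit    <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
Lambda <- unclass(loadings(fit))

# A 2 x 2 rotation matrix for an arbitrary angle; orthogonal by construction.
theta <- pi / 6
G <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)

# (Lambda G)(Lambda G)' equals Lambda Lambda', so the fitted
# correlation structure is the same for both sets of loadings.
all.equal(tcrossprod(Lambda %*% G), tcrossprod(Lambda))
```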

If covmat is supplied it is used. Otherwise x is used if it is a matrix, or a formula x is used with data to construct a model matrix, and that is used to construct a covariance matrix. (It makes no sense for the formula to have a response, and all the variables must be numeric.) Once a covariance matrix is found or calculated from x, it is converted to a correlation matrix for analysis. The correlation matrix is returned as component correlation of the result.
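A minimal sketch of the two input routes, using the small demonstration data from the Examples section: the same model is fitted from the data matrix and from its covariance matrix. The tolerance in the last comparison is an assumption that allows for small numerical differences between the two optimizations.

```r
# Demonstration data (same values as in the Examples section).
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

# Route 1: data matrix. Route 2: covariance matrix plus number of observations.
fit_data <- factanal(m1, factors = 3)
fit_cov  <- factanal(factors = 3, covmat = cov(m1), n.obs = nrow(m1))

# The analysis is done on the correlation matrix in both cases.
all.equal(fit_data$correlation, cor(m1), check.attributes = FALSE)
all.equal(fit_data$uniquenesses, fit_cov$uniquenesses, tolerance = 1e-4)
```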

The fit is done by optimizing the log likelihood assuming multivariate normality over the uniquenesses. (The maximizing loadings for given uniquenesses can be found analytically: Lawley and Maxwell (1971, page 27).) All the starting values supplied in start are tried in turn and the best fit obtained is used. If start = NULL then the first fit is started at the value suggested by Jöreskog (1963) and given by Lawley and Maxwell (1971, page 31), and then control$nstart - 1 other values are tried, randomly selected as equal values of the uniquenesses.
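A short sketch of trying several starting values, assuming the built-in ability.cov data and an arbitrary seed. Because the first start is the same in both calls, the multi-start criterion should be no larger than the single-start one.

```r
set.seed(1)  # the extra starting values are drawn at random

fit_one  <- factanal(factors = 2, covmat = ability.cov)              # nstart = 1
fit_many <- factanal(factors = 2, covmat = ability.cov, nstart = 5)  # 1 + 4 random starts

# Compare the optimized criterion (smaller is better).
c(single = unname(fit_one$criteria["objective"]),
  multi  = unname(fit_many$criteria["objective"]))
```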

The uniquenesses are technically constrained to lie in [0, 1], but near-zero values are problematical, and the optimization is done with a lower bound of control$lower, default 0.005 (Lawley and Maxwell 1971, page 32).
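To see the bound in action, the sketch below fits three factors to the demonstration data from the Examples section and lists any variables whose uniqueness ends up at the default bound of 0.005. The small slack of 1e-6 used in the comparison is an arbitrary choice.

```r
# Demonstration data (same values as in the Examples section).
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fit <- factanal(m1, factors = 3)
round(fit$uniquenesses, 3)

# Variables sitting (numerically) at the lower bound, if any.
names(which(fit$uniquenesses < 0.005 + 1e-6))
```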

Scores can only be produced if a data matrix is supplied and used. The first method is the regression method of Thomson (1951), the second the weighted least squares method of Bartlett (1937); Thomson (1938). Both are estimates of the unobserved scores f. Thomson's method regresses (in the population) the unknown f on x to yield

\hat f = \Lambda^\prime \Sigma^{-1} x

and then substitutes the sample estimates of the quantities on the right-hand side. Bartlett's method minimizes the sum of squares of standardized errors over the choice of f, given (the fitted) \Lambda.
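Both estimators can be reproduced by hand for an orthogonal rotation. The sketch below does so for the demonstration data, under the assumptions that x is standardized with scale() and that the sample correlation matrix is the estimate substituted for \Sigma. It is an illustration of the formulas, not a copy of the internal code.

```r
# Demonstration data (same values as in the Examples section).
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fit_reg  <- factanal(m1, factors = 3, scores = "regression")
fit_bart <- factanal(m1, factors = 3, scores = "Bartlett")

z      <- scale(m1)                    # standardized observations (rows)
Lambda <- unclass(loadings(fit_reg))   # fitted, varimax-rotated loadings
R      <- fit_reg$correlation          # sample correlation matrix

# Thomson: f = Lambda' Sigma^{-1} x, with Sigma estimated by R.
reg_manual <- z %*% solve(R, Lambda)

# Bartlett: f = (Lambda' Psi^{-1} Lambda)^{-1} Lambda' Psi^{-1} x.
A <- t(Lambda / fit_bart$uniquenesses)          # Lambda' Psi^{-1}, k x p
bart_manual <- t(solve(A %*% Lambda, A %*% t(z)))

all.equal(reg_manual,  fit_reg$scores,  check.attributes = FALSE)
all.equal(bart_manual, fit_bart$scores, check.attributes = FALSE)
```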

If x is a formula then the standard NA-handling is applied to the scores (if requested): see napredict.

The print method (documented under loadings) follows the factor analysis convention of drawing attention to the patterns of the results, so the default precision is three decimal places, and small loadings are suppressed.
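The printing defaults can be overridden in the call to print. A brief sketch, assuming the built-in ability.cov data; the particular values of digits and cutoff are illustrative only.

```r
fit <- factanal(factors = 2, covmat = ability.cov)

# Default display: three decimal places, small loadings left blank.
print(fit)

# Two decimal places, blank out loadings below 0.3 in absolute value,
# and sort the variables by the factor on which they load most.
print(fit, digits = 2, cutoff = 0.3, sort = TRUE)
```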

Value

An object of class "factanal" with components

loadings

A matrix of loadings, one column for each factor. The factors are ordered in decreasing order of sums of squares of loadings, and given the sign that will make the sum of the loadings positive. For correlated factors, the correlation matrix is stored as an attribute labeled "covariance". This is of class "loadings": see loadings for its print method.

uniquenesses

The uniquenesses computed.

correlation

The correlation matrix used.

criteria

The results of the optimization: the value of the criterion (a linear function of the negative log-likelihood) and information on the iterations used.

factors

The argument factors.

dof

The number of degrees of freedom of the factor analysis model.

method

The method: always "mle".

rotmat

The rotation matrix if relevant.

scores

If requested, a matrix of scores. napredict is applied to handle the treatment of values omitted by the na.action.

n.obs

The number of observations if available, or NA.

call

The matched call.

na.action

If relevant.

STATISTIC, PVAL

The significance-test statistic and P value, if it can be computed.
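A brief sketch of inspecting some of these components, assuming the built-in ability.cov data (six variables, with the number of observations stored in the covariance list). With two factors the model has positive degrees of freedom, so the test statistic can be computed.

```r
fit <- factanal(factors = 2, covmat = ability.cov)

fit$method       # estimation method
fit$dof          # degrees of freedom of the model
fit$n.obs        # number of observations taken from the covariance list
fit$STATISTIC    # significance-test statistic
fit$PVAL         # corresponding P value
fit$uniquenesses # one value per observed variable
```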

Note

There are so many variations on factor analysis that it is hard to compare output from different programs. Further, the optimization in maximum likelihood factor analysis is hard, and many other examples we compared had poorer fits than those produced by this function. In particular, solutions which are ‘Heywood cases’ (with one or more uniquenesses essentially zero) are much more common than most texts and some other programs would lead one to believe.

References

Bartlett M. S. (1937). “The Statistical Conception of Mental Factors.” British Journal of Psychology. General Section, 28(1), 97–104. doi:10.1111/j.2044-8295.1937.tb00863.x.

Jöreskog K. G. (1963). Statistical Estimation in Factor Analysis. Almqvist and Wicksell.

Lawley D. N., Maxwell A. E. (1971). Factor Analysis as a Statistical Method, 2nd edition. Butterworth & Co Publishers Ltd. ISBN 978-0408701525.

Thomson G. H. (1951). The Factorial Analysis of Human Ability. University of London Press, London.

Thomson G. H. (1938). “Methods of Estimating Mental Factors.” Nature, 141(3562), 246–246. doi:10.1038/141246a0.

Examples

# A little demonstration, v2 is just v1 with noise,
# and same for v4 vs. v3 and v6 vs. v5
# Last three cases are there to add noise
# and introduce a positive manifold (g factor)
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
cor(m1)
factanal(m1, factors = 3) # varimax is the default
factanal(m1, factors = 3, rotation = "promax")
# The following shows the g factor as PC1
prcomp(m1) # signs may depend on platform

## formula interface
factanal(~v1+v2+v3+v4+v5+v6, factors = 3,
         scores = "Bartlett")$scores

## a realistic example from Bartholomew (1987, pp. 61-65)
utils::example(ability.cov)

[Package stats version 4.6.0 Index]