Type: | Package |
Title: | Probabilistic Latent Variable Models for Metabolomic Data |
Version: | 1.3.1 |
Date: | 2010-05-12 |
Author: | Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan. |
Maintainer: | Claire Gormley <claire.gormley@ucd.ie> |
Description: | Fits probabilistic principal components analysis, probabilistic principal components and covariates analysis and mixtures of probabilistic principal components models to metabolomic spectral data. |
Depends: | mclust, mvtnorm, ellipse, gtools, gplots |
License: | GPL-2 |
LazyLoad: | yes |
Packaged: | 2019-08-31 10:22:13 UTC; hornik |
Repository: | CRAN |
Date/Publication: | 2019-08-31 10:24:07 UTC |
NeedsCompilation: | no |
Probabilistic latent variable models for metabolomic data.
Description
Fits probabilistic principal components analysis (PPCA), probabilistic principal components and covariates analysis (PPCCA) and mixtures of probabilistic principal component analysis (MPPCA) models to metabolomic spectral data. Estimates of the uncertainty associated with the model parameter estimates are provided.
Details
Package: | MetabolAnalyze |
Type: | Package |
Version: | 1.0 |
Date: | 2010-05-12 |
License: | GPL-2 |
LazyLoad: | yes |
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.
Maintainer: Claire Gormley <claire.gormley@ucd.ie>
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical Report. University College Dublin.
Assess convergence of an EM algorithm.
Description
This function assesses convergence of the EM algorithm using Aitken's acceleration method, when fitting a PPCA based model.
Usage
Aitken(ll, lla, v, q, epsilon)
Arguments
ll |
A vector of log likelihoods from the current and previous iterations. |
lla |
A vector containing the asymptotic estimates of the maximized log likelihoods from the current and previous iterations. |
v |
Iteration number. |
q |
The dimension of the latent principal subspace for the PPCA based model currently being fitted. |
epsilon |
The value on which convergence of the EM algorithm is based. |
Details
This function assesses convergence of the EM algorithm using Aitken's acceleration method in which an estimate of the maximized log likelihood at each iteration is evaluated. Convergence is achieved when the absolute difference between contiguous estimates, tol, is less than some user defined level, epsilon.
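The criterion described above can be sketched as follows. This is a minimal illustration using the standard form of Aitken's acceleration; the helper name aitken_sketch and its arguments are hypothetical, and the code is not necessarily what Aitken() does internally.

```r
# Hypothetical helper (not the package's Aitken() function).
# ll:      three consecutive log likelihoods, oldest first
# la_prev: asymptotic estimate obtained at the previous iteration
aitken_sketch <- function(ll, la_prev, epsilon = 0.1) {
  stopifnot(length(ll) == 3)
  # Aitken acceleration factor: ratio of successive log likelihood increments.
  # Undefined when ll[2] == ll[1]; a real implementation must guard against that.
  a <- (ll[3] - ll[2]) / (ll[2] - ll[1])
  # Asymptotic estimate of the maximized log likelihood
  la <- ll[2] + (ll[3] - ll[2]) / (1 - a)
  # Absolute difference between contiguous asymptotic estimates
  tol <- abs(la - la_prev)
  list(tol = tol, la = la, converged = tol < epsilon)
}

# Log likelihoods whose increments halve at each iteration
res <- aitken_sketch(c(-108, -104, -102), la_prev = -100.05)
res$la
res$converged
```

Here the increments shrink geometrically, so the asymptotic estimate extrapolates the sequence to its limit rather than waiting for the raw log likelihood to stop changing.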
Value
A list containing:
tol |
The absolute difference between contiguous estimates of the asymptotic maximized log likelihood. |
la |
The asymptotic estimate of the maximized log likelihood at the current iteration. |
Note
This is used internally in functions which fit PPCA based models via the EM algorithm within the package MetabolAnalyze.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
McLachlan, G.J. and Krishnan, T. (1997) The EM algorithm and Extensions. Wiley, New York.
See Also
ppca.metabol, ppcca.metabol, mppca.metabol
NMR spectral data from brain tissue samples.
Description
NMR spectral data from brain tissue samples of 33 rats, where each tissue sample originates in one of four known brain regions. Each spectrum has 164 spectral bins, measured in parts per million (ppm).
Usage
data(BrainSpectra)
Format
A list containing
a matrix with 33 rows and 164 columns
a vector indicating the brain region of origin of each sample where:
1 = Brain stem
2 = Cerebellum
3 = Hippocampus
4 = Pre-frontal cortex
Details
This is simulated data, based on parameter estimates from a mixture of PPCA models with 4 groups and 7 principal components fitted to a similar real data set.
Source
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
NMR metabolomic spectra from urine samples of 18 mice.
Description
NMR metabolomic spectra from urine samples of 18 mice, each belonging to one of two treatment groups. Each spectrum has 189 spectral bins, measured in parts per million (ppm).
Covariates associated with the mice were also recorded: the weight of each mouse is provided.
Usage
data(UrineSpectra)
Format
A list containing
a matrix with 18 rows and 189 columns
a data frame with 18 observations on 2 variables:
Treatment group membership of each animal.
Weight (in grammes) of each animal.
Details
This is simulated data, based on parameter estimates from a PPCA model with two principal components fitted to a similar real data set.
Source
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
First E step of the AECM algorithm when fitting a mixture of PPCA models.
Description
Internal function required for fitting a mixture of PPCA models.
Usage
estep1(Y, Tau, Pi, mu, W, Sig, g, p, reset)
Arguments
Y |
An N x p data matrix. |
Tau |
A N x G matrix of posterior group membership probabilities. |
Pi |
A G vector of mixing proportions. |
mu |
A p x G matrix containing the mean for each group. |
W |
A p x q x G array of loadings for each group. |
Sig |
A scalar; the error covariance. |
g |
The number of groups currently being fitted. |
p |
Number of spectral bins in the NMR spectra. |
reset |
Logical indicating computational instability. |
Details
First E step of the AECM algorithm when fitting a mixture of PPCA models. An internal function.
Value
A list containing
Tau |
The N x G matrix of posterior group membership probabilities. |
logTau |
An N x G matrix of the log of the numerator of the posterior group membership probabilities. |
reset |
Logical indicating computational instability. |
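For orientation, the quantities returned above can be sketched in base R. The sketch assumes the usual mixture of PPCA formulation, in which group g has covariance W_g W_g' + Sig I and the posterior membership probability is proportional to Pi_g times the corresponding Gaussian density. The function names are hypothetical and this is not the package's estep1 code.

```r
# Illustrative only: hypothetical helpers, not the package's estep1().

# Log multivariate normal density of each row of Y, via a Cholesky factor.
log_dmvnorm <- function(Y, mu, Sigma) {
  p <- ncol(Y)
  R <- chol(Sigma)                                  # Sigma = t(R) %*% R
  Z <- backsolve(R, t(Y) - mu, transpose = TRUE)    # solves t(R) z = y - mu
  -0.5 * (p * log(2 * pi) + 2 * sum(log(diag(R))) + colSums(Z^2))
}

posterior_membership <- function(Y, Pi, mu, W, Sig) {
  G <- length(Pi)
  p <- ncol(Y)
  # Log of the numerator: log(Pi_g) + log N(y_i | mu_g, W_g W_g' + Sig I)
  logTau <- sapply(seq_len(G), function(g) {
    Wg <- matrix(W[, , g], nrow = p)
    log(Pi[g]) + log_dmvnorm(Y, mu[, g], Wg %*% t(Wg) + Sig * diag(p))
  })
  logTau <- matrix(logTau, nrow = nrow(Y))
  # Subtract the row maximum before exponentiating to avoid underflow
  Tau <- exp(logTau - apply(logTau, 1, max))
  Tau <- Tau / rowSums(Tau)
  list(Tau = Tau, logTau = logTau)
}

# Two well separated simulated groups of five observations each
set.seed(1)
Y <- rbind(matrix(rnorm(20), 5, 4), matrix(rnorm(20, mean = 10), 5, 4))
fit <- posterior_membership(Y, Pi = c(0.5, 0.5),
                            mu = cbind(rep(0, 4), rep(10, 4)),
                            W = array(0.1, c(4, 1, 2)), Sig = 1)
round(fit$Tau, 3)
```

Working on the log scale and normalising afterwards is one common way to limit the kind of computational instability that the reset flag reports.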
Note
An internal function.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
Second E step of the AECM algorithm when fitting a mixture of PPCA models.
Description
Internal function required for fitting a mixture of PPCA models.
Usage
estep2(Y, Tau, Pi, mu, W, Sig, g, p, reset)
Arguments
Y |
An N x p data matrix. |
Tau |
An N x g matrix of posterior group membership probabilities. |
Pi |
A g vector of group probabilities. |
mu |
A p x g matrix containing the mean for each group. |
W |
A p x q x g array of loadings for each group. |
Sig |
A scalar; the error covariance. |
g |
The number of groups currently being fitted. |
p |
Number of spectral bins in the NMR spectra. |
reset |
Logical indicating computational instability. |
Details
Second E step of the AECM algorithm when fitting a mixture of PPCA models. An internal function.
Value
A list containing
Tau |
The N x G matrix of posterior group membership probabilities. |
logTau |
An N x G matrix of the log of the numerator of the posterior group membership probabilities. |
reset |
Logical indicating computational instability. |
Note
An internal function.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
Function to plot a heatmap of BIC values.
Description
Function to plot a heat map of BIC values where lighter colours indicate larger values and optimal models. A black cross indicates the optimal model.
The function is a modified version of heatmap.
Usage
ht(x, Rowv = NULL, Colv = if (symm) "Rowv" else NULL, distfun = dist,
hclustfun = hclust, reorderfun = function(d, w) reorder(d, w),
add.expr, symm = FALSE, revC = identical(Colv, "Rowv"),
scale = c("row", "column", "none"), na.rm = FALSE, margins = c(5, 5),
ColSideColors, RowSideColors, cexRow = 1, cexCol = 1, labRow = NULL,
labCol = NULL, main = NULL, xlab = NULL, ylab = NULL,
keep.dendro = FALSE, verbose = getOption("verbose"), q, g)
Arguments
See the help file for heatmap.
Details
This function is used internally in mppca.metabol.
Value
See the help file for heatmap.
Note
An internal function.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
Plot loadings and their associated confidence intervals.
Description
A function to plot the loadings and confidence intervals resulting from fitting a PPCA model or a PPCCA model to metabolomic data.
Usage
loadings.jack.plot(output)
Arguments
output |
An object resulting from fitting a PPCA model or a PPCCA model. |
Details
The function produces a plot of those loadings on the first principal component which are significantly different from zero, and higher than a user specified cutoff point. Error bars associated with the estimates, derived using the jackknife, are also plotted.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
ppca.metabol.jack, ppcca.metabol.jack
Plot loadings.
Description
A function to plot the loadings resulting from fitting a PPCA model or a PPCCA model to metabolomic data. A barplot or a scatterplot can be produced.
Usage
loadings.plot(output, barplot = FALSE, labelsize = 0.3)
Arguments
output |
An object resulting from fitting a PPCA model or a PPCCA model. |
barplot |
Logical indicating whether a barplot of the loadings is required rather than a scatter plot. By default a scatter plot is produced. |
labelsize |
Size of the text of the spectral bin labels on the resulting plot. |
Details
A function to plot the loadings resulting from fitting a PPCA model or a PPCCA model to metabolomic data. A barplot or a scatterplot can be produced. The size of the text of the spectral bin labels on the bar plot can also be adjusted if the number of bins plotted is large.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
Plot loadings resulting from fitting a MPPCA model.
Description
A function to plot the loadings resulting from fitting a MPPCA model to metabolomic data. A barplot or a scatterplot can be produced.
Usage
mppca.loadings.plot(output, Y, barplot = FALSE, labelsize = 0.3)
Arguments
output |
An object resulting from fitting a MPPCA model. |
Y |
The N x p matrix of observations to which the MPPCA model is fitted. |
barplot |
Logical indicating whether a barplot of the loadings is required rather than a scatter plot. By default a scatter plot is produced. |
labelsize |
Size of the text of the spectral bin labels on the resulting plot. |
Details
A function which produces a series of plots illustrating the loadings resulting from fitting a MPPCA model to metabolomic data. A barplot or a scatterplot can be produced. The size of the text of the spectral bin labels on the bar plot can also be adjusted if the number of bins plotted is large.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
Fit a mixture of probabilistic principal components analysis (MPPCA) model to a metabolomic data set via the EM algorithm to perform simultaneous dimension reduction and clustering.
Description
This function fits a mixture of probabilistic principal components analysis model to metabolomic spectral data via the EM algorithm.
Usage
mppca.metabol(Y, minq=1, maxq=2, ming, maxg, scale = "none",
epsilon = 0.1, plot.BIC = FALSE)
Arguments
Y |
An N x p data matrix where each row is a spectrum. |
minq |
The minimum number of principal components to be fit. By default minq is 1. |
maxq |
The maximum number of principal components to be fit. By default maxq is 2. |
ming |
The minimum number of groups to be fit. |
maxg |
The maximum number of groups to be fit. |
scale |
Type of scaling of the data which is required. The default is "none". Options include "pareto" and "unit" scaling. See scaling for further details. |
epsilon |
Value on which the convergence assessment criterion is based. Set by default to 0.1. |
plot.BIC |
Logical indicating whether or not a plot of the BIC values for the different models fitted should be provided. By default, the plot is not produced. |
Details
This function fits a mixture of probabilistic principal components analysis models to metabolomic spectral data via the EM algorithm. A range of models with different numbers of groups and different numbers of principal components can be fitted. The model simultaneously clusters observations into unknown groups and performs dimension reduction.
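Choosing among a grid of fitted models can be sketched as below. The BIC values are made up purely for illustration, the layout (rows indexing q, columns indexing g) is an assumption about how such a matrix might be arranged, and larger BIC is taken to be better, in line with the description of the BIC heat map.

```r
# Made-up BIC values for illustration only; not output from mppca.metabol().
# Assumed layout: one row per number of components q, one column per number of groups g.
bic <- matrix(c(-5200, -5100, -5150,
                -5050, -4980, -5010),
              nrow = 2, byrow = TRUE,
              dimnames = list(paste0("q=", 1:2), paste0("g=", 1:3)))

# Locate the largest BIC value, which is treated here as the optimal model
best <- which(bic == max(bic), arr.ind = TRUE)
best_q <- rownames(bic)[best[1, 1]]
best_g <- colnames(bic)[best[1, 2]]
c(best_q, best_g)
```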
Value
A list containing:
q |
The number of principal components in the optimal MPPCA model, selected by the BIC. |
g |
The number of groups in the optimal MPPCA model, selected by the BIC. |
sig |
The posterior mode estimate of the variance of the error terms. |
scores |
A list of length g, each entry of which is a n_g x q matrix of estimates of the latent locations of each observation in group g in the principal subspace. |
loadings |
An array of dimension p x q x g, each sheet of which contains the maximum likelihood estimate of the p x q loadings matrix for a group. |
Pi |
The vector indicating the probability of belonging to each group. |
mean |
A p x g matrix, each column of which contains a group mean. |
tau |
An N x g matrix, each row of which contains the posterior group membership probabilities for an observation. |
clustering |
A vector of length N indicating the group to which each observation belongs. |
BIC |
A matrix containing the BIC values for the fitted models. |
AIC |
A matrix containing the AIC values for the fitted models. |
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.
References
Nyamundanda G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.
See Also
mppca.scores.plot, mppca.loadings.plot
Examples
data(BrainSpectra)
## Not run:
mdlfit<-mppca.metabol(BrainSpectra[[1]], minq=7, maxq=7, ming=4, maxg=4,
plot.BIC = TRUE)
mppca.scores.plot(mdlfit)
mppca.loadings.plot(mdlfit, BrainSpectra[[1]])
## End(Not run)
Plot scores from a fitted MPPCA model
Description
A function to plot the scores resulting from fitting a MPPCA model to metabolomic data.
Usage
mppca.scores.plot(output, group = FALSE, gplegend = TRUE)
Arguments
output |
An object resulting from fitting a MPPCA model. |
group |
Should it be relevant, a vector indicating the known treatment group membership of each observation prior to clustering. |
gplegend |
Logical indicating whether a legend should be plotted. |
Details
This function produces a series of scatterplots for each group uncovered. For group g, each scatterplot illustrates the estimated score for each observation allocated to that group within the reduced q dimensional space. The uncertainty associated with each score estimate is also illustrated at the 95% level.
It is often the case that observations are known to belong to treatment groups, for example, and the MPPCA model is employed to uncover any underlying subgroups, possibly related to disease subtypes. The treatment group membership of each observation can be illustrated on the plots produced by utilizing the ‘group’ argument.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
First M-step of the AECM algorithm when fitting a mixture of PPCA models.
Description
Internal function required for fitting a mixture of PPCA models.
Usage
mstep1(Y, Tau, Pi, mu, g)
Arguments
Y |
An N x p data matrix. |
Tau |
An N x G matrix of posterior group membership probabilities. |
Pi |
A g vector of group probabilities. |
mu |
A p x g matrix containing the mean for each group. |
g |
The number of groups currently being fitted. |
Details
First M-step of the AECM algorithm when fitting a mixture of PPCA models. An internal function.
Value
A list containing
Pi |
A g vector of group probabilities |
Mu |
A p x g matrix each column of which contains a group mean. |
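The two updates returned above can be sketched as follows, assuming the standard mixture model updates in which each mixing proportion is the average posterior membership probability and each group mean is the Tau-weighted average of the spectra. The function name is hypothetical and this is not the package's mstep1 code.

```r
# Illustrative only: hypothetical helper, not the package's mstep1().
mstep1_sketch <- function(Y, Tau) {
  Ng <- colSums(Tau)                       # effective number of observations per group
  Pi <- Ng / nrow(Y)                       # mixing proportions
  Mu <- sweep(t(Y) %*% Tau, 2, Ng, "/")    # p x G matrix of weighted group means
  list(Pi = Pi, Mu = Mu)
}

# Four observations on two variables with hard group assignments
Y <- matrix(c(1, 3, 10, 12,
              2, 4, 20, 22), nrow = 4)
Tau <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
upd <- mstep1_sketch(Y, Tau)
upd$Pi
upd$Mu
```

With hard assignments, as in this toy input, the weighted means reduce to the ordinary group means.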
Note
An internal function.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
Second M-step of the AECM algorithm when fitting a mixture of PPCA models.
Description
Internal function required for fitting a mixture of PPCA models.
Usage
mstep2(Y, Tau, Pi, mu, W, Sig, g, p, q)
Arguments
Y |
An N x p data matrix. |
Tau |
An N x G matrix of posterior group membership probabilities. |
Pi |
A g vector of group probabilities. |
mu |
A p x g matrix containing the mean for each group. |
W |
A p x q x g array, each sheet of which contains a group specific loadings matrix. |
Sig |
The variance parameter. |
g |
The number of groups currently being fitted. |
p |
The number of spectral bins in the NMR spectrum. |
q |
The number of principal components in the model being fitted. |
Details
Second M-step of the AECM algorithm when fitting a mixture of PPCA models. An internal function.
Value
A list containing
W |
A p x q x g array, each sheet of which contains a group specific loadings matrix. |
Sig |
The variance parameter. |
Note
An internal function.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
Fit a probabilistic principal components analysis (PPCA) model to a metabolomic data set via the EM algorithm.
Description
This function fits a probabilistic principal components analysis model to metabolomic spectral data via the EM algorithm.
Usage
ppca.metabol(Y, minq=1, maxq=2, scale = "none", epsilon = 0.1,
plot.BIC = FALSE, printout=TRUE)
Arguments
Y |
An N x p data matrix where each row is a spectrum. |
minq |
The minimum number of principal components to be fit. By default minq is 1. |
maxq |
The maximum number of principal components to be fit. By default maxq is 2. |
scale |
Type of scaling of the data which is required. The default is "none". Options include "pareto" and "unit" scaling. See scaling for further details. |
epsilon |
Value on which the convergence assessment criterion is based. Set by default to 0.1. |
plot.BIC |
Logical indicating whether or not a plot of the BIC values for the different models fitted should be provided. By default, the plot is not produced. |
printout |
Logical indicating whether or not a statement is printed on screen detailing the progress of the algorithm. |
Details
This function fits a probabilistic principal components analysis model to metabolomic spectral data via the EM algorithm. A range of models with different numbers of principal components can be fitted.
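To make the model comparison concrete, the sketch below fits PPCA for several values of q and computes a BIC for each. It deliberately swaps in the closed-form maximum likelihood solution obtained from an eigendecomposition, not the EM algorithm used by ppca.metabol, so its estimates need not match the package's. The function name, the parameter count k and the BIC form 2 * loglik - k * log(N) (larger is better) are assumptions.

```r
# Illustrative only: closed-form ML PPCA, not the EM fit used by ppca.metabol().
# Requires q < p.
ppca_closed_form <- function(Y, q) {
  N <- nrow(Y)
  p <- ncol(Y)
  S <- cov(Y) * (N - 1) / N                     # ML estimate of the covariance
  e <- eigen(S, symmetric = TRUE)
  lam <- e$values
  sig <- mean(lam[(q + 1):p])                   # error variance: mean discarded eigenvalue
  W <- e$vectors[, 1:q, drop = FALSE] %*% diag(sqrt(lam[1:q] - sig), q)
  # Maximized log likelihood, using log|C| = sum(log(lam[1:q])) + (p - q) * log(sig)
  # and trace(C^{-1} S) = p at the ML solution, where C = W W' + sig * I
  ll <- -N / 2 * (p * log(2 * pi) + sum(log(lam[1:q])) + (p - q) * log(sig) + p)
  k <- p * q - q * (q - 1) / 2 + 1 + p          # assumed count: loadings, variance, mean
  list(W = W, sig = sig, loglik = ll, BIC = 2 * ll - k * log(N))
}

# Simulated data with two latent dimensions
set.seed(1)
N <- 100
p <- 6
Y <- matrix(rnorm(N * 2), N, 2) %*% matrix(rnorm(2 * p), 2, p) +
  matrix(rnorm(N * p, sd = 0.1), N, p)
fits <- lapply(1:3, function(q) ppca_closed_form(Y, q))
bic <- sapply(fits, function(f) f$BIC)
bic
```

The value of q with the largest entry of bic would be the one selected under these assumptions.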
Value
A list containing:
q |
The number of principal components in the optimal PPCA model, selected by the BIC. |
sig |
The posterior mode estimate of the variance of the error terms. |
scores |
An N x q matrix of estimates of the latent locations of each observation in the principal subspace. |
loadings |
The maximum likelihood estimate of the p x q loadings matrix. |
BIC |
A vector containing the BIC values for the fitted models. |
AIC |
A vector containing the AIC values for the fitted models. |
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.
References
Nyamundanda G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.
See Also
ppca.metabol.jack, loadings.plot, ppca.scores.plot
Examples
data(UrineSpectra)
## Not run:
mdlfit<-ppca.metabol(UrineSpectra[[1]], minq=2, maxq=2, scale="none")
loadings.plot(mdlfit)
ppca.scores.plot(mdlfit, group=UrineSpectra[[2]][,1])
## End(Not run)
Fit a probabilistic principal components analysis model to a metabolomic data set, and assess uncertainty via the jackknife.
Description
Fit a probabilistic principal components analysis (PPCA) model to a metabolomic data set via the EM algorithm, and assess uncertainty in the obtained loadings estimates via the jackknife.
Usage
ppca.metabol.jack(Y, minq=1, maxq=2, scale ="none",
epsilon = 0.1, conflevel = 0.95)
Arguments
Y |
An N x p data matrix where each row is a spectrum. |
minq |
The minimum number of principal components to be fit. By default minq is 1. |
maxq |
The maximum number of principal components to be fit. By default maxq is 2. |
scale |
Type of scaling of the data which is required. The default is "none". Options include "pareto" and "unit" scaling. See scaling for further details. |
epsilon |
Value on which the convergence assessment criterion is based. Set by default to 0.1. |
conflevel |
Level of confidence required for the loadings confidence intervals. By default 0.95, i.e. 95% confidence intervals. |
Details
A range of PPCA models is fitted and an optimal model (i.e. number of principal components, q) is selected. Confidence intervals for the loadings are then obtained via the jackknife, i.e. a model with q principal components is fitted to the dataset N times, with a different observation removed from the dataset each time.
On convergence of the algorithm, the number of loadings significantly different from zero is printed on screen. The user may then further examine the significant loadings when prompted by selecting a cutoff value from the table printed on screen. Bar plots detailing the resulting significantly high loadings are provided.
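The leave-one-out scheme described above can be sketched generically. Here stat_fun stands in for whatever returns the estimates of interest (in the package, the loadings of a refitted PPCA model); the function name, the normal-quantile intervals and the usual jackknife standard error formula are assumptions and may differ from the package's own construction.

```r
# Illustrative only: a generic jackknife, not the code inside ppca.metabol.jack().
jackknife_ci <- function(Y, stat_fun, conflevel = 0.95) {
  N <- nrow(Y)
  est <- stat_fun(Y)                                   # estimate from the full data
  # Re-estimate N times, removing one observation each time
  loo <- sapply(seq_len(N), function(i) stat_fun(Y[-i, , drop = FALSE]))
  loo <- matrix(loo, ncol = N)                         # one column per deleted observation
  # Jackknife standard error of each estimate
  se <- sqrt((N - 1) / N * rowSums((loo - rowMeans(loo))^2))
  z <- qnorm(1 - (1 - conflevel) / 2)
  lower <- est - z * se
  upper <- est + z * se
  # An estimate is flagged when its interval excludes zero
  data.frame(estimate = est, lower = lower, upper = upper,
             significant = lower > 0 | upper < 0)
}

# Toy check using column means as the statistic
set.seed(2)
Y <- cbind(rnorm(20), rnorm(20, mean = 5))
ci <- jackknife_ci(Y, colMeans)
ci
```

For the column mean the jackknife standard error coincides with the familiar sd / sqrt(N), which makes this toy case a convenient sanity check.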
Value
A list containing:
q |
The number of principal components in the optimal PPCA model, selected by the BIC. |
sig |
The posterior mode estimate of the variance of the error terms. |
scores |
An N x q matrix of estimates of the latent locations of each observation in the principal subspace. |
loadings |
The maximum likelihood estimate of the p x q loadings matrix. |
SignifW |
The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero. |
SignifHighW |
The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero and higher than a user selected cutoff point. |
Lower |
The lower limit of the confidence interval for those loadings significantly different from zero. |
Upper |
The upper limit of the confidence interval for those loadings significantly different from zero. |
Cutoffs |
A table detailing a range of cutoff points and the associated number of selected spectral bins. |
number |
The number of spectral bins selected by the user. |
cutoff |
The cutoff value selected by the user. |
BIC |
A vector containing the BIC values for the fitted models. |
AIC |
A vector containing the AIC values for the fitted models. |
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.
References
Nyamundanda G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.
See Also
ppca.metabol, loadings.jack.plot, ppca.scores.plot
Examples
data(UrineSpectra)
## Not run:
mdlfit<-ppca.metabol.jack(UrineSpectra[[1]], minq=2, maxq=2, scale="none")
loadings.jack.plot(mdlfit)
ppca.scores.plot(mdlfit, group=UrineSpectra[[2]][,1])
## End(Not run)
Plot scores from a fitted PPCA model
Description
A function to plot the scores resulting from fitting a PPCA model to metabolomic data.
Usage
ppca.scores.plot(output, group = FALSE)
Arguments
output |
An object resulting from fitting a PPCA model. |
group |
Should it be relevant, a vector indicating the known treatment group membership of each observation. |
Details
This function produces a series of scatterplots, each illustrating the estimated score for each observation within the reduced q dimensional space. The uncertainty associated with each score estimate is also illustrated at the 95% level.
It is often the case that observations are known to belong to treatment groups; the treatment group membership of each observation can be illustrated on the plots produced by utilizing the ‘group’ argument.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
ppca.metabol, ppca.metabol.jack
Fit a probabilistic principal components and covariates analysis (PPCCA) model to a metabolomic data set via the EM algorithm.
Description
This function fits a probabilistic principal components and covariates analysis model to metabolomic spectral data via the EM algorithm.
Usage
ppcca.metabol(Y, Covars, minq=1, maxq=2, scale = "none", epsilon = 0.1,
plot.BIC = FALSE, printout=TRUE)
Arguments
Y |
An N x p data matrix in which each row is a spectrum. |
Covars |
An N x L covariate data matrix in which each row is a set of covariates. |
minq |
The minimum number of principal components to be fit. |
maxq |
The maximum number of principal components to be fit. |
scale |
Type of scaling of the data which is required. The default is "none". Options include "pareto" and "unit" scaling. See scaling for further details. |
epsilon |
Value on which the convergence assessment criterion is based. Set by default to 0.1. |
plot.BIC |
Logical indicating whether or not a plot of the BIC values for the different models fitted should be provided. By default, the plot is not produced. |
printout |
Logical indicating whether or not a statement is printed on screen detailing the progress of the algorithm. |
Details
This function fits a probabilistic principal components and covariates analysis model to metabolomic spectral data via the EM algorithm. A range of models with different numbers of principal components can be fitted.
Care should be taken with the form of covariates supplied. All covariates are standardized (to lie in [0,1]) within the ppcca.metabol function for stability reasons. Hence continuous covariates and binary valued categorical covariates are easily handled. For a categorical covariate with V levels, the equivalent V-1 dummy variables representation should be passed as an argument to ppcca.metabol.
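A covariate matrix of the kind described above can be assembled with base R. The treatment labels and weights below are made up for illustration; model.matrix with its default treatment contrasts produces the V-1 dummy variables once the intercept column is dropped.

```r
# Made-up covariates for six observations; for illustration only.
treatment <- factor(c("A", "B", "C", "A", "B", "C"))   # categorical, V = 3 levels
weight <- c(21.3, 24.8, 19.5, 22.1, 25.0, 20.2)        # continuous covariate

# V - 1 = 2 dummy variables: drop the intercept column of the design matrix
dummies <- model.matrix(~ treatment)[, -1, drop = FALSE]

# N x L covariate matrix, one row per observation
Covars <- cbind(weight, dummies)
Covars
```

A matrix built this way has one continuous column and two 0/1 columns, all of which remain meaningful after being rescaled to [0,1].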
Value
A list containing:
q |
The number of principal components in the optimal PPCCA model, selected by the BIC. |
sig |
The posterior mode estimate of the variance of the error terms. |
scores |
An N x q matrix of estimates of the latent locations of each observation in the principal subspace. |
loadings |
The maximum likelihood estimate of the p x q loadings matrix. |
coefficients |
The maximum likelihood estimates of the regression coefficients associated with the covariates in the PPCCA model. |
BIC |
A vector containing the BIC values for the fitted models. |
AIC |
A vector containing the AIC values for the fitted models. |
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.
References
Nyamundanda G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.
See Also
ppcca.metabol.jack, ppcca.scores.plot, loadings.plot
Examples
data(UrineSpectra)
## Not run:
mdlfit<-ppcca.metabol(UrineSpectra[[1]], UrineSpectra[[2]][,2], minq=2, maxq=2)
loadings.plot(mdlfit)
ppcca.scores.plot(mdlfit, UrineSpectra[[2]][,2], group=UrineSpectra[[2]][,1], covarnames="Weight")
## End(Not run)
Fit a probabilistic principal components and covariates analysis model to a metabolomic data set, and assess uncertainty via the jackknife.
Description
Fit a probabilistic principal components and covariates analysis (PPCCA) model to a metabolomic data set via the EM algorithm, and assess uncertainty in the obtained loadings estimates and the regression coefficients via the jackknife.
Usage
ppcca.metabol.jack(Y, Covars, minq=1, maxq=2, scale="none", epsilon=0.1,
conflevel=0.95)
Arguments
Y |
An N x p data matrix in which each row is a spectrum. |
Covars |
An N x L covariate data matrix where each row is a set of covariates. |
minq |
The minimum number of principal components to be fit. By default minq is 1. |
maxq |
The maximum number of principal components to be fit. By default maxq is 2. |
scale |
Type of scaling of the data which is required. The default is "none". Options include "pareto" and "unit" scaling. See scaling for further details. |
epsilon |
Value on which the convergence assessment criterion is based. Set by default to 0.1. |
conflevel |
Level of confidence required for the loadings and regression coefficients confidence intervals. By default 0.95, i.e. 95% confidence intervals. |
Details
A range of PPCCA models is fitted and an optimal model (i.e. number of principal components, q) is selected. Confidence intervals for the loadings and regression coefficients are then obtained via the jackknife, i.e. a model with q principal components is fitted to the dataset N times, with a different observation removed from the dataset each time.
Care should be taken with the form of covariates supplied. All covariates are standardized (to lie in [0,1]) within the ppcca.metabol.jack function for stability reasons. Hence continuous covariates and binary valued categorical covariates are easily handled. For a categorical covariate with V levels, the equivalent V-1 dummy variables representation should be passed as an argument to ppcca.metabol.jack.
On convergence of the algorithm, the number of loadings significantly different from zero is printed on screen. The user may then further examine the significant loadings when prompted by selecting a cutoff value from the table printed on screen. Bar plots detailing the resulting significantly high loadings are provided.
Value
A list containing:
q |
The number of principal components in the optimal PPCCA model, selected by the BIC. |
sig |
The posterior mode estimate of the variance of the error terms. |
scores |
An N x q matrix of estimates of the latent locations of each observation in the principal subspace. |
loadings |
The maximum likelihood estimate of the p x q loadings matrix. |
SignifW |
The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero. |
SignifHighW |
The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero and above the user selected cutoff point. |
LowerCI_W |
The lower limit of the confidence interval for those loadings significantly different from zero. |
UpperCI_W |
The upper limit of the confidence interval for those loadings significantly different from zero. |
coefficients |
The maximum likelihood estimates of the regression coefficients. |
coeffCI |
A matrix detailing the upper and lower limits of the confidence intervals for the regression parameters. |
Cutoffs |
A table detailing a range of cutoff points and the associated number of selected spectral bins. |
number |
The number of spectral bins selected by the user. |
cutoff |
The cutoff value selected by the user. |
BIC |
A vector containing the BIC values for the fitted models. |
AIC |
A vector containing the AIC values for the fitted models. |
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.
References
Nyamundanda G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.
See Also
ppcca.metabol, ppcca.scores.plot, loadings.jack.plot
Examples
data(UrineSpectra)
## Not run:
mdlfit<-ppcca.metabol.jack(UrineSpectra[[1]], UrineSpectra[[2]][,2], minq=2, maxq=2)
loadings.jack.plot(mdlfit)
ppcca.scores.plot(mdlfit, UrineSpectra[[2]][,2], group=UrineSpectra[[2]][,1], covarnames="Weight")
## End(Not run)
Plot scores from a fitted PPCCA model.
Description
A function to plot the scores resulting from fitting a PPCCA model to metabolomic data.
Usage
ppcca.scores.plot(output, Covars, group = FALSE, covarnames=NULL)
Arguments
output |
An object resulting from fitting a PPCCA model. |
Covars |
An N x L covariate data matrix where each row is a set of covariates. |
group |
Should it be relevant, a vector indicating the known treatment group membership of each observation. |
covarnames |
Should it be relevant, a character vector giving the names of the covariates. |
Details
This function produces a series of scatterplots, each illustrating the estimated score for each observation within the reduced q dimensional space. The uncertainty associated with each score estimate is also illustrated at the 95% level.
It is often the case that observations are known to belong to treatment groups; the treatment group membership of each observation can be illustrated on the plots produced by utilizing the ‘group’ argument.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan
References
Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report. University College Dublin, Ireland.
See Also
ppcca.metabol, ppcca.metabol.jack
Function to scale metabolomic spectral data.
Description
This function provides the options of Pareto scaling, unit scaling or no scaling of metabolomic data.
Usage
scaling(Y, type = "none")
Arguments
Y |
An N x p matrix of metabolomic spectra. Each row of Y is an observation's spectrum. |
type |
Default is "none" meaning the data are not altered. If "pareto", the data are Pareto scaled. If "unit", the data are unit scaled. |
Details
Pareto scaling, frequently utilised in metabolomic analyses, scales data by dividing each variable by the square root of the standard deviation. Unit scaling divides each variable by the standard deviation so that each variable has variance equal to 1.
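The two scalings described above can be sketched in a few lines. The helper name scale_sketch is hypothetical, and the sketch only divides by the stated quantities; whether the package's scaling function also centres the data is not stated here, so no centring is applied.

```r
# Illustrative only: hypothetical helper, not the package's scaling() function.
scale_sketch <- function(Y, type = c("none", "pareto", "unit")) {
  type <- match.arg(type)
  s <- apply(Y, 2, sd)                           # standard deviation of each spectral bin
  switch(type,
         none   = Y,                             # data returned unaltered
         pareto = sweep(Y, 2, sqrt(s), "/"),     # divide by sqrt of the standard deviation
         unit   = sweep(Y, 2, s, "/"))           # divide by the standard deviation
}

# Simulated spectra with very different variability across bins
set.seed(3)
Y <- cbind(rnorm(30, sd = 1), rnorm(30, sd = 10), rnorm(30, sd = 100))
Yp <- scale_sketch(Y, "pareto")
Yu <- scale_sketch(Y, "unit")
apply(Yu, 2, var)
```

After Pareto scaling each variable's variance equals its original standard deviation, so high-variance bins are damped without being forced to carry the same weight as low-variance ones.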
Value
The function returns the requested scaled version of the input matrix Y.
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.
References
van den Berg, R.A., Hoefsloot, H.C.J., Westerhuis, J.A., Smilde, A.K. and van der Werf, M.J. (2006) Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7, 1, 142.
Nyamundanda G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.
Function to scale covariates.
Description
A function to scale covariates so that they lie in [0,1] for reasons of stability and convergence of the EM algorithm.
Usage
standardize(Covars)
Arguments
Covars |
An N x L matrix containing the L covariates of each of N observations. |
Details
A function to scale covariates so that they lie in [0,1] for reasons of stability and convergence of the EM algorithm. Care must be taken with categorical covariates: see ppcca.metabol for further information.
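One way to map covariates onto [0,1] is column-wise min-max scaling, sketched below. This is an assumption about how such a rescaling could be done; the helper name is hypothetical and the package's standardize function may differ in detail.

```r
# Illustrative only: hypothetical helper, not the package's standardize() function.
standardize_sketch <- function(Covars) {
  Covars <- as.matrix(Covars)
  rng <- apply(Covars, 2, range)                 # row 1: minima, row 2: maxima
  # Subtract each column's minimum, then divide by its range.
  # A constant column has zero range and would give NaN; handle it separately if needed.
  sweep(sweep(Covars, 2, rng[1, ], "-"), 2, rng[2, ] - rng[1, ], "/")
}

# A continuous covariate and a binary dummy variable (made-up values)
Covars <- cbind(weight = c(21.3, 24.8, 19.5, 22.1),
                treated = c(0, 1, 0, 1))
Z <- standardize_sketch(Covars)
Z
```

Binary 0/1 columns are left unchanged by this rescaling, which is consistent with the advice to supply categorical covariates as dummy variables.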
Value
Covars |
A standardized version of the input matrix of covariates. |
Author(s)
Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan