| Type: | Package | 
| Title: | Covariate-Augmented Generalized Factor Model | 
| Version: | 1.1 | 
| Date: | 2024-06-21 | 
| Author: | Wei Liu [aut, cre], Jiakun Jiang [aut], Dewei Xiang [aut], Xuancheng Zhou [aut] | 
| Maintainer: | Wei Liu <LiuWeideng@gmail.com> | 
| Description: | The covariate-augmented generalized factor model is designed to account for cross-modal heterogeneity, capture nonlinear dependencies among the data, incorporate additional information, and provide excellent interpretability while maintaining high computational efficiency. | 
| BugReports: | https://github.com/feiyoung/CMGFM/issues | 
| License: | GPL-3 | 
| Depends: | irlba, R (≥ 3.5.0) | 
| Imports: | MASS, stats, GFM, Rcpp (≥ 1.0.10) | 
| Suggests: | knitr, rmarkdown | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| VignetteBuilder: | knitr | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.1 | 
| NeedsCompilation: | yes | 
| Packaged: | 2024-06-25 04:40:10 UTC; 10297 | 
| Repository: | CRAN | 
| Date/Publication: | 2024-06-25 15:00:05 UTC | 
Fit the CMGFM model
Description
Fit the covariate-augmented generalized factor model.
Usage
CMGFM(
  XList,
  Z,
  types,
  numvarmat,
  q = 15,
  Alist = NULL,
  init = c("LFM", "GFM", "random"),
  maxIter = 30,
  epsELBO = 1e-08,
  verbose = TRUE,
  add_IC_iter = FALSE,
  seed = 1
)
Arguments
| XList | a list consisting of multiple matrices in which each matrix has the same type of values, i.e., continuous, or count, or binomial/binary values. | 
| Z | a matrix, the fixed-dimensional covariate matrix with control variables. | 
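| types | a string vector, specify the variable type in each matrix in XList. | 
| numvarmat | a length(types)-by-d matrix, the number of variables in modalities that belong to the same type. | 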
| q | an optional positive integer, specify the number of factors; defaults to 15. | 
| Alist | an optional vector, the offset for each unit; defaults to an all-zero vector. | 
| init | an optional character string, specify the initialization method; one of "LFM", "GFM", or "random". | 
| maxIter | the maximum number of iterations of the VEM algorithm; defaults to 30. | 
| epsELBO | an optional positive value, the tolerance on the relative change of the evidence lower bound (ELBO) between iterations; defaults to 1e-8. | 
| verbose | a logical value, whether to print progress information during the iterations. | 
| add_IC_iter | a logical value, whether to impose the identifiability condition within the iterative algorithm (TRUE) or only after the algorithm converges (FALSE); defaults to FALSE. | 
| seed | an integer, the random seed used in initialization; defaults to 1. | 
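To make the relationship between types and numvarmat concrete, the following base-R sketch builds a numvarmat-style matrix from a named list of per-modality dimensions. It assumes one row per variable type and one column per modality, with unused cells padded by zero; this layout is inferred from the argument description and is not taken from the package source.

```r
# Per-type modality dimensions, as in the package examples
pveclist <- list(gaussian = c(50, 150), poisson = c(50), binomial = c(100, 60))
types <- names(pveclist)

# Assumed layout: rows index variable types, columns index modalities,
# and types with fewer modalities are padded with zeros
max_mod <- max(lengths(pveclist))
numvarmat <- matrix(0, nrow = length(types), ncol = max_mod,
                    dimnames = list(types, NULL))
for (i in seq_along(pveclist)) {
  numvarmat[i, seq_along(pveclist[[i]])] <- pveclist[[i]]
}
print(numvarmat)
```

Under this assumption, the row sums give the total number of columns expected in the matrix of each type.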
Value
Returns a list with the following components:
- betaf - the estimated regression coefficient vector for each modality;
- Bf - the estimated loading matrix for each modality;
- M - the estimated modality-shared factor matrix;
- Xif - the estimated modality-specific factor vector;
- S - the estimated covariance matrix of the modality-shared latent factors;
- Om - the posterior variance of the modality-specific latent factors;
- muf - the estimated intercept vector for each modality;
- Sigmam - the variance of the modality-specific factors;
- invLambdaf - the inverse of the estimated error variances for each modality;
- ELBO - the ELBO value when the algorithm stops;
- ELBO_seq - the sequence of ELBO values;
- time_use - the running time of model fitting.
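The epsELBO tolerance and the ELBO sequence in the returned list can be related through a relative-change stopping rule. The sketch below is an illustration in base R only: the helper rel_change and the ELBO values are hypothetical, and the exact formula used inside the package is an assumption here.

```r
# Relative change between consecutive ELBO values (hypothetical helper)
rel_change <- function(elbo_seq) {
  abs(diff(elbo_seq)) / abs(head(elbo_seq, -1))
}

# Made-up ELBO sequence standing in for a fitted object's ELBO sequence
elbo_seq <- c(-1500, -1020, -1001, -1000.000001, -1000.000000)
rc <- rel_change(elbo_seq)

# Declare convergence once the latest relative change drops below the tolerance
epsELBO <- 1e-8
converged <- tail(rc, 1) < epsELBO
first_hit <- which(rc < epsELBO)[1]
```

Inspecting a sequence in this way can help decide whether maxIter was large enough for a given fit.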
Examples
pveclist <- list('gaussian'=c(50, 150),'poisson'=c(50, 150),
   'binomial'=c(100,60))
q <- 6
sigmavec <- rep(1,3)
pvec <- unlist(pveclist)
datlist <- gendata_cmgfm(pveclist = pveclist, seed = 1, n = 300,d = 3,
                         q = q, rho = rep(1,length(pveclist)), rho_z=0.2,
                         sigmavec=sigmavec, sigma_eps=1)
XList <- datlist$XList
Z <- datlist$Z
numvarmat <- datlist$numvarmat
types <- datlist$types
rlist <- CMGFM(XList, Z, types=types, numvarmat, q=q)
str(rlist)
Select the number of factors
Description
Select the number of factors using the maximum singular value ratio (MSVR) method.
Usage
MSVR(
  XList,
  Z,
  types,
  numvarmat,
  Alist = NULL,
  q_max = 20,
  threshold = 1e-05,
  ...
)
Arguments
| XList | a list consisting of multiple matrices in which each matrix has the same type of values, i.e., continuous, or count, or binomial/binary values. | 
| Z | a matrix, the fixed-dimensional covariate matrix with control variables. | 
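| types | a string vector, specify the variable type in each matrix in XList. | 
| numvarmat | a length(types)-by-d matrix, the number of variables in modalities that belong to the same type. | 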
| Alist | an optional vector, the offset for each unit; defaults to an all-zero vector. | 
| q_max | an optional positive integer, specify the maximum number of factors; defaults to 20. | 
| threshold | an optional positive value, the cutoff below which singular values are discarded; defaults to 1e-05. | 
| ... | other arguments passed to CMGFM. | 
Value
Returns the estimated number of factors.
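The general idea of a singular value ratio rule can be sketched in base R: compute the leading singular values, discard those below a cutoff, and pick the index where the ratio of consecutive singular values is largest. The function svr_select and the simulated matrix below are hypothetical; which matrix the package actually decomposes is not stated here, so this is an illustration of the technique rather than the MSVR implementation.

```r
set.seed(1)
n <- 200; p <- 50; q_true <- 3

# Low-rank signal plus small noise (toy data, not from gendata_cmgfm)
Fm <- matrix(rnorm(n * q_true), n, q_true)
B <- matrix(rnorm(p * q_true), p, q_true)
X <- Fm %*% t(B) + matrix(rnorm(n * p, sd = 0.1), n, p)

# Hypothetical helper: argmax of consecutive singular value ratios
svr_select <- function(X, q_max = 20, threshold = 1e-5) {
  d <- svd(X, nu = 0, nv = 0)$d
  d <- d[seq_len(min(q_max, length(d)))]  # keep at most q_max values
  d <- d[d > threshold]                   # drop negligible singular values
  if (length(d) < 2) return(length(d))
  ratios <- d[-length(d)] / d[-1]         # d_k / d_{k+1}
  which.max(ratios)
}

q_hat <- svr_select(X)
print(c(q_true = q_true, q_est = q_hat))
```

The threshold filter matters because ratios between near-zero singular values can be arbitrarily large and would otherwise dominate the maximum.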
Examples
pveclist <- list('gaussian'=c(50, 150),'poisson'=c(50, 150),
   'binomial'=c(100,60))
q <- 6
sigmavec <- rep(1,3)
pvec <- unlist(pveclist)
datlist <- gendata_cmgfm(pveclist = pveclist, seed = 1, n = 300,d = 3,
                         q = q, rho = rep(1,length(pveclist)), rho_z=0.2,
                         sigmavec=sigmavec, sigma_eps=1)
XList <- datlist$XList
Z <- datlist$Z
numvarmat <- datlist$numvarmat
types <- datlist$types
hq <- MSVR(XList, Z, types=types, numvarmat, q_max=20)
print(c(q_true=q, q_est=hq))
Generate simulated data
Description
Generate simulated data from the covariate-augmented generalized factor model.
Usage
gendata_cmgfm(
  seed = 1,
  n = 300,
  pveclist = list(gaussian = c(50, 150), poisson = c(50), binomial = c(100, 60)),
  q = 6,
  d = 3,
  rho = rep(1, length(pveclist)),
  rho_z = 1,
  sigmavec = rep(0.5, length(pveclist)),
  n_bin = 1,
  sigma_eps = 1,
  seed.para = 1
)
Arguments
| seed | a positive integer, the random seed for reproducibility of data generation process. | 
| n | a positive integer, specify the sample size. | 
| pveclist | a named list, specify the number of modalities for each variable type and dimension of variables in each modality. | 
| q | a positive integer, specify the number of modality-shared factors. | 
| d | a positive integer, specify the dimension of covariate matrix. | 
| rho | a numeric vector with length length(pveclist); defaults to rep(1, length(pveclist)). | 
| rho_z | a positive real, specify the signal strength of covariates. | 
| sigmavec | a positive vector with length length(pveclist); defaults to rep(0.5, length(pveclist)). | 
| n_bin | a positive integer, specify the number of trials in the Binomial distribution. | 
| sigma_eps | a positive real, the variance of overdispersion error. | 
| seed.para | a positive integer, the random seed for reproducibility of data generation process by fixing the regression coefficient vector and loading matrices. | 
Value
Returns a list with the following components:
- XList - a list consisting of multiple matrices in which each matrix has the same type of values, i.e., continuous, or count, or binomial/binary values;
- Z - a matrix, the fixed-dimensional covariate matrix with control variables;
- Alist - the offset vector for each modality;
- B0list - the true loading matrix for each modality;
- mu0 - the true intercept vector for each modality;
- U0 - the modality-specific factor vector;
- F0 - the modality-shared factor matrix;
- Uplist - the true intercept-loading matrix for each modality;
- beta - the true regression coefficient vector for each modality;
- sigma_eps - the standard deviation of error term;
- numvarmat - a length(types)-by-d matrix, the number of variables in modalities that belong to the same type.
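For intuition about what the simulated matrices of each type look like, the following base-R sketch draws Gaussian, Poisson, and binary data from a shared linear predictor made of covariate effects and latent factors. It is a simplified illustration: the link functions, scales, and variable names are assumptions, and it omits the modality-specific factors and overdispersion term that gendata_cmgfm is described as using.

```r
set.seed(1)
n <- 300; d <- 3; q <- 6; p <- 50

Z <- matrix(rnorm(n * d), n, d)              # covariates
Fm <- matrix(rnorm(n * q), n, q)             # shared latent factors
beta <- matrix(rnorm(d * p, sd = 0.2), d, p) # regression coefficients
B <- matrix(rnorm(p * q, sd = 0.3), p, q)    # loading matrix

# n x p linear predictor: covariate effects plus factor structure
eta <- Z %*% beta + Fm %*% t(B)

# One matrix per variable type (identity, log, and logit links assumed)
X_gauss <- eta + matrix(rnorm(n * p), n, p)
X_pois <- matrix(rpois(n * p, lambda = exp(eta)), n, p)
X_binom <- matrix(rbinom(n * p, size = 1, prob = plogis(eta)), n, p)

XList_toy <- list(X_gauss, X_pois, X_binom)
str(XList_toy)
```

Setting size to a value larger than 1 in rbinom would correspond to the role that n_bin plays in the argument list above.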
Examples
n <- 300
pveclist <- list('gaussian'=c(50, 150),'poisson'=c(50),'binomial'=c(100,60))
d <- 20
q <- 6
datlist <- gendata_cmgfm(n=n, pveclist=pveclist, q=q, d=d)
str(datlist)