# Overview of MixtComp Object

#### 2021-03-29

mixtCompLearn returns an object of class MixtCompLearn and MixtComp, whereas mixtCompPredict returns an object of class MixtComp.

## MixtComp Object

The output object is described below for variables named categorical, gaussian, rank, functional, poisson, nBinom and weibull, modeled respectively by Multinomial, Gaussian, Rank_ISR, Func_CS (or Func_SharedAlpha_CS), Poisson, NegativeBinomial and Weibull. In case of a successful run, the output object is a list of lists organized as follows:

output
|_______ algo __ nbBurnInIter
|             |_ nbIter
|             |_ nbGibbsBurnInIter
|             |_ nbGibbsIter
|             |_ nInitPerClass
|             |_ nSemTry
|             |_ mode
|             |_ nInd
|             |_ confidenceLevel
|             |_ nClass
|             |_ ratioStableCriterion
|             |_ nStableCriterion
|             |_ basicMode
|             |_ hierarchicalMode
|
|_______ mixture __ BIC
|                |_ ICL
|                |_ lnCompletedLikelihood
|                |_ lnObservedLikelihood
|                |_ IDClass
|                |_ IDClassBar
|                |_ delta
|                |_ runTime
|                |_ nbFreeParameters
|                |_ completedProbabilityLogBurnIn
|                |_ completedProbabilityLogRun
|                |_ lnProbaGivenClass
|
|_______ variable __ type __ z_class
                  |       |_ categorical
                  |       |_ gaussian
                  |       |_ ...
                  |
                  |_ data __ z_class __ completed
                  |       |          |_ stat
                  |       |_ categorical __ completed
                  |       |              |_ stat
                  |       |_ ...
                  |       |_ functional __ data
                  |                     |_ time
                  |
                  |_ param __ z_class __ stat
                           |          |_ log
                           |          |_ paramStr
                           |_ functional __ alpha __ stat
                           |             |        |_ log
                           |             |_ beta __ stat
                           |             |       |_ log
                           |             |_ sd __ stat
                           |             |     |_ log
                           |             |_ paramStr
                           |_ rank __ mu __ stat
                           |       |     |_ log
                           |       |_ pi __ stat
                           |       |     |_ log
                           |       |_ paramStr
                           |
                           |_ gaussian __ stat
                           |           |_ log
                           |           |_ paramStr
                           |_ poisson __ stat
                           |          |_ log
                           |          |_ paramStr
                           |_ ...

### warnLog

In case of an unsuccessful run, the output object is a list containing an element warnLog with all the warnings returned by MixtComp.
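A caller can tell the two cases apart by testing for the warnLog element. The sketch below uses hand-built lists standing in for real outputs: the objects `okRun` and `failedRun`, the helper `hasFailed` and all values are hypothetical and not part of the package. Only the presence of `warnLog` after a failed run is taken from the description above.

```r
# Mock outputs: hand-built lists, not results of a real MixtComp run.
okRun <- list(algo = list(nClass = 2), mixture = list(BIC = -1234.5))
failedRun <- list(warnLog = "hypothetical warning message")

# Hypothetical helper: a run is treated as failed when warnLog is present.
hasFailed <- function(res) !is.null(res$warnLog)

for (res in list(okRun, failedRun)) {
  if (hasFailed(res)) {
    message("Run failed: ", res$warnLog)
  } else {
    message("Run succeeded, BIC = ", res$mixture$BIC)
  }
}
```

Checking `warnLog` before reading `mixture` or `variable` avoids working with elements that do not exist after a failed run.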

### algo

A copy of the algo parameter.

• nbBurnInIter Number of iterations of the burn-in part of the SEM algorithm.
• nbIter Number of iterations of the SEM algorithm.
• nbGibbsBurnInIter Number of iterations of the burn-in part of the Gibbs algorithm.
• nbGibbsIter Number of iterations of the Gibbs algorithm.
• nInitPerClass Number of individuals used to initialize each cluster.
• nSemTry Number of attempts of the algorithm, used to avoid an error.
• confidenceLevel Confidence level for the confidence bounds of the parameter estimates.
• ratioStableCriterion Partition stability ratio required to stop the SEM algorithm early.
• nStableCriterion Number of iterations of partition stability required to stop the SEM algorithm early.
• nInd Number of samples in the dataset.
• nClass Number of classes of the mixture.
• mode “predict” for mixtCompPredict or “learn” for mixtCompLearn.
• basicMode If TRUE, mixtCompLearn has run in basic mode (a mode using classic R formatting for missing data and automatic detection of the model).
• hierarchicalMode If TRUE, mixtCompLearn has run in hierarchical mode (learn a model with two classes, then split each class in two, and so on).

### mixture

• BIC value of BIC
• ICL value of ICL
• nbFreeParameters number of free parameters of the mixture model
• lnObservedLikelihood observed log-likelihood
• lnCompletedLikelihood completed log-likelihood
• IDClass entropy used to compute the discriminative power (see computeDiscrimPowerVar function)
• IDClassBar entropy used to compute the discriminative power (see computeDiscrimPowerVar function)
• delta entropy used to compute the similarities between variables (see heatmapVar function)
• completedProbabilityLogBurnIn evolution of the completed log-probability during the burn-in period (can be used to check the convergence and determine the ideal number of iterations)
• completedProbabilityLogRun evolution of the completed log-probability after the burn-in period (can be used to check the convergence and determine the ideal number of iterations)
• runTime a list containing the execution time in seconds of the different parts of the algorithm
• lnProbaGivenClass log of the probability of each sample given each class, times the class proportion: $$\log(\pi_k)+\log(P(X_i|z_i=k))$$
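Since lnProbaGivenClass stores $$\log(\pi_k)+\log(P(X_i|z_i=k))$$, normalizing each sample over the classes gives posterior membership probabilities. The sketch below does this with a log-sum-exp on a made-up matrix. It assumes samples are in rows and classes in columns, which should be checked on a real object; all variable names other than `lnProbaGivenClass` are hypothetical.

```r
# Mock lnProbaGivenClass: 3 samples (rows) x 2 classes (columns).
# Each entry is log(pi_k) + log(P(X_i | z_i = k)); values are made up.
lnProbaGivenClass <- matrix(c( -2.0, -5.0,
                               -4.0, -3.5,
                              -10.0, -1.0),
                            nrow = 3, byrow = TRUE)

# Log-sum-exp per row, shifted by the row maximum for numerical stability.
rowMax <- apply(lnProbaGivenClass, 1, max)
logSum <- rowMax + log(rowSums(exp(lnProbaGivenClass - rowMax)))

# Posterior membership probabilities; each row sums to 1.
tik <- exp(lnProbaGivenClass - logSum)

# Most probable class per sample.
partition <- apply(tik, 1, which.max)

# Sum over samples of log(sum_k pi_k P(X_i | z_i = k)), i.e. the usual
# mixture log-likelihood implied by this matrix.
lnMixture <- sum(logSum)
```

Subtracting the row maximum before exponentiating avoids underflow when the log-probabilities are strongly negative, which is common with many variables.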

### variable

#### type

Named list (according to variable names) containing the model used for each variable (e.g. “Gaussian”).

#### data

Except for functional models and LatentClass, data contains, for each variable, two elements: completed and stat. completed contains the completed data and stat contains statistics about completed data. The format is detailed below according to the model.

• LatentClass

Two elements: completed and stat. completed contains the completed data. stat is a matrix with one column per class. For each sample, it contains the $$t_{ik}$$ (probability that $$x_i$$ belongs to class $$k$$), estimated from the values imputed by the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm.

• Gaussian/Poisson/NegativeBinomial/Weibull

stat is a matrix where each row corresponds to a missing value and contains 4 elements: the index of the missing value, then the median, 2.5% quantile and 97.5% quantile (if the confidenceLevel parameter is set to 0.95) of the values imputed by the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm.
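To make the four elements concrete, the following sketch builds a matrix of that shape from simulated draws. It is not the package's internal code: the draws, the sample indices and the column names `index`, `median`, `lower` and `upper` are hypothetical. It only illustrates how a confidenceLevel of 0.95 maps to the 2.5% and 97.5% quantiles.

```r
set.seed(1)
confidenceLevel <- 0.95

# Mock Gibbs draws for two missing entries (hypothetical sample indices 4 and 9).
draws <- list(`4` = rnorm(200, mean = 1), `9` = rnorm(200, mean = -2))

# Tail probability on each side: 0.025 for a 0.95 confidence level.
alpha <- (1 - confidenceLevel) / 2

# One row per missing entry: index, median, lower and upper quantiles.
stat <- t(sapply(names(draws), function(i) {
  q <- quantile(draws[[i]], probs = c(0.5, alpha, 1 - alpha), names = FALSE)
  c(index = as.numeric(i), median = q[1], lower = q[2], upper = q[3])
}))
```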

• Multinomial

stat is a named list where each element corresponds to a missing value and is named after the index of that value. Each element is a matrix containing the values imputed by the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm, together with their frequencies.

• Rank_ISR

stat is a named list where each element corresponds to a missing value and is named after the index of that value. Each element is a matrix containing the values imputed by the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm, together with their frequencies.
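For discrete models such as Multinomial, the per-entry matrix of imputed values and frequencies can be pictured with the sketch below. The imputed values are made up, and the column names and the most-frequent-first ordering are assumptions for illustration, not the documented layout.

```r
# Mock imputed modalities for one missing entry across Gibbs iterations.
imputed <- c(2, 2, 3, 2, 1, 2, 3, 2, 2, 2)

# Relative frequency of each imputed value.
freq <- table(imputed) / length(imputed)

# Two-column matrix: imputed value and its relative frequency,
# sorted with the most frequent value first (an assumed ordering).
statOne <- cbind(value = as.numeric(names(freq)), frequency = as.numeric(freq))
statOne <- statOne[order(-statOne[, "frequency"]), , drop = FALSE]

# Named list keyed by the (hypothetical) index of the missing entry.
stat <- list(`12` = statOne)
```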

• Func_CS and Func_SharedAlpha_CS

Two elements: data and time. time (resp. data) is a list containing the time (resp. value) vector of the functional for each sample.

• Other Models

One element: completed, a matrix/vector containing the completed version of the dataset.

#### param

For one variable, it contains a list with the estimated parameters (stat), the log recorded during the SEM (log) and hyperparameters if any (paramStr). The output format depends on the model, but in most cases stat is a matrix with 3 columns containing the median values of the estimated parameters and the quantiles at the desired confidence level, log is a matrix containing the parameters estimated during the M step of each iteration of the algorithm after the burn-in phase, and paramStr is a string. For the meaning of the parameters, refer to the data format documentation.

• LatentClass

A list of 3 elements: stat, log, paramStr. log is a matrix containing the proportions estimated during the M step of each iteration of the algorithm after the burn-in phase. stat is a matrix containing the median (and the quantiles corresponding to the confidenceLevel parameter) of the estimated proportions. The median proportions are the returned proportions. paramStr contains "".

• Gaussian

The stat matrix has 2*nClass rows. For a class $$k$$, parameters are mean ($$\mu_k$$) and sd ($$\sigma_k$$).
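A stat matrix with 2*nClass rows can be rearranged into one row per class, as sketched below on made-up values. The row ordering (class by class, mean then sd) and the column names are assumptions; the row names of a real object should be inspected before relying on them.

```r
nClass <- 3

# Mock stat matrix with 2 * nClass rows and 3 columns (median and two quantiles).
# ASSUMPTION: rows are ordered class by class, mean first and then sd.
stat <- cbind(median = c(0.1, 1.0, 5.2, 0.8, -3.0, 2.1),
              lower  = c(0.0, 0.9, 5.0, 0.7, -3.3, 1.9),
              upper  = c(0.2, 1.1, 5.4, 0.9, -2.7, 2.3))
stopifnot(nrow(stat) == 2 * nClass)

# One row per class, one column per parameter, using the median estimates.
gaussParam <- matrix(stat[, "median"], nrow = nClass, byrow = TRUE,
                     dimnames = list(paste0("k", seq_len(nClass)),
                                     c("mean", "sd")))
```

The same reshaping idea applies to the other two-parameter models (NegativeBinomial, Weibull) under the same ordering assumption.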

• Poisson

The stat matrix has nClass rows. For a class $$k$$, the parameter is lambda ($$\lambda_k$$).

• NegativeBinomial

The stat matrix has 2*nClass rows. For a class $$k$$, parameters are n ($$n_k$$) and p ($$p_k$$).

• Weibull

The stat matrix has 2*nClass rows. For a class $$j$$, parameters are k (shape) ($$k_j$$) and lambda (scale) ($$\lambda_j$$).

• Multinomial

paramStr contains "nModality: J" where $$J$$ is the number of modalities.

The stat matrix has J*nClass rows. For a class $$k$$, parameters are the probabilities of each of the $$J$$ modalities.

• Rank_ISR

paramStr contains "nModality: J" where $$J$$ is the length of the rank (number of sorted objects).

Two lists (named mu and pi) of 2 elements: stat, log.

For pi, stat is a matrix with nClass rows. For a class $$k$$, the parameter is pi ($$\pi_k$$).

For mu, stat is a list with nClass elements. For a class $$k$$, a list is returned with the mode of the parameter ($$\mu_k$$), and the frequency of the mode during the SEM algorithm after the burn-in phase.

• Func_CS and Func_SharedAlpha_CS

paramStr contains "nSub: S, nCoeff: C" where $$S$$ is the number of subregressions and $$C$$ the number of coefficients of each regression.

Three lists (named alpha, beta and sd) of 2 elements: stat, log.

For alpha, stat is a matrix with 2*S*nClass rows. For a class $$k$$ and a subregression $$s$$, parameters are the estimated coefficients of a logistic regression controlling the transition between subregressions.

For beta, stat is a matrix with S*C*nClass rows. For a class $$k$$ and a subregression $$s$$, parameters are the estimated coefficients of the regression.

For sd, stat is a matrix with S*nClass rows. For a class $$k$$ and a subregression $$s$$, the parameter is the standard deviation of the residuals of the regression.
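The values of $$S$$ and $$C$$ can be recovered from paramStr and used to check the row counts given above. The sketch below parses a string in the documented "nSub: S, nCoeff: C" format; the helper `parseKey` and the value of `nClass` are hypothetical.

```r
paramStr <- "nSub: 2, nCoeff: 3"   # example string in the documented format

# Hypothetical helper: extract the integer following a key such as "nSub".
parseKey <- function(str, key) {
  m <- regmatches(str, regexec(paste0(key, ":\\s*([0-9]+)"), str))[[1]]
  if (length(m) < 2) NA_integer_ else as.integer(m[2])
}

S <- parseKey(paramStr, "nSub")
C <- parseKey(paramStr, "nCoeff")
nClass <- 4   # hypothetical number of classes

# Expected row counts of the stat matrices described for alpha, beta and sd.
expectedRows <- c(alpha = 2 * S * nClass, beta = S * C * nClass, sd = S * nClass)
```

Comparing `expectedRows` with the actual `nrow()` of each stat matrix is a cheap sanity check before indexing parameters by class and subregression.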

## MixtCompLearn Object

A MixtCompLearn object is the output of the mixtCompLearn function. It contains one or several MixtComp objects.

• nClass A vector containing the number of classes tested
• crit ICL and BIC values for each value of nClass
• criterion “BIC” or “ICL”, the criterion used to choose the number of classes
• algo, mixture, variable, warnLog Elements of the MixtComp object associated with the best number of classes
• res A list containing one MixtComp object per number of classes. The first element (res[[1]]) corresponds to the MixtComp object for a number of classes of nClass[1]
• nRun Number of runs for each number of classes
• totalTime Total running time
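The link between nClass, crit, criterion and res can be sketched as follows, using made-up values rather than a real MixtCompLearn object. Two points are assumptions to verify against the package documentation: that crit is laid out with one row per criterion and one column per value of nClass, and that larger criterion values are better.

```r
# Mock fields of a MixtCompLearn object; values are made up.
nClass <- 2:4
crit <- rbind(BIC = c(-1520.3, -1488.9, -1495.2),
              ICL = c(-1531.0, -1502.4, -1490.7))
criterion <- "BIC"

# ASSUMPTION: larger criterion values are better.
best <- which.max(crit[criterion, ])
bestNClass <- nClass[best]

# With a real object, res[[best]] would be the MixtComp object for bestNClass,
# since res[[1]] corresponds to nClass[1].
```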