Multiple Partition Clustering

Date/time
18 February 2021
10:45 - 11:45

Venue
Virtual probability and statistics room

Speaker
Vincent Vandewalle (Université de Lille)

Event category
Probability and Statistics Seminar


Abstract

This talk deals with clustering when several latent class variables are considered (multiple partition clustering). Assuming that all heterogeneity in the data can be explained by a single latent class variable is a very strong assumption, and it can be useful to consider that several blocks (or linear combinations) of variables provide different partitions of the individuals. This can reveal new lines of analysis in the data. In this framework, we present two approaches.

The first approach assumes the existence of several groups of variables, each leading to a different partition of the individuals [1]. It makes it possible to classify the variables into blocks, each producing a specific grouping of individuals. The model assumes independence between blocks of variables and, within each block, independence of the variables given the cluster. An efficient procedure is proposed to search for the blocks of variables while estimating the different partitions of the individuals.

The second approach assumes the existence of several classifying projections in the data [2]. It makes it possible to obtain different classifying projections and the associated partitions. The model assumes that the data are obtained as linear combinations of classifying and non-classifying variables, where each classifying variable follows a specific mixture distribution. The model parameters are estimated with a generalized EM algorithm.

The behavior of these models will be illustrated on simulated and real data. We will discuss how such models can give new insight from a data analysis point of view and can be considered for further investigation.

References:
[1] Marbac, M. and Vandewalle, V. (2019). “A tractable multi-partitions clustering”. In: Computational Statistics & Data Analysis 132, pp. 167–179.
[2] Vandewalle, V. (2020). “Multi-Partitions Subspace Clustering”. In: Mathematics 8.4, p. 597.
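
Note on the first approach: the two independence assumptions stated in the abstract (independence between blocks, and independence of the variables within a block given the cluster) suggest a density of the following form. This is only a sketch. The notation is an assumption made here for illustration and is not taken from the abstract: \omega_j is the block of variable j, B is the number of blocks, G_b is the number of clusters in block b, \pi_{bg} are the mixing proportions, and f_j is the univariate density of variable j with parameter \alpha_{jg}.

```latex
% Sketch of the multiple partition density for one individual x_i.
% Outer product: blocks of variables are independent of each other.
% Sum over g: each block b carries its own mixture, hence its own partition.
% Inner product: variables of block b are independent given the cluster g.
\[
  p(x_i \mid \omega, \theta)
  = \prod_{b=1}^{B} \; \sum_{g=1}^{G_b} \pi_{bg}
    \prod_{j \,:\, \omega_j = b} f_j\!\left(x_{ij} \mid \alpha_{jg}\right),
  \qquad \sum_{g=1}^{G_b} \pi_{bg} = 1 \ \text{for each } b .
\]
```

With B = 1 this reduces to a standard latent class model with a single partition, which is the single-variable assumption the abstract describes as very strong.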