Seminar: Ioanna Manolopoulou
STATISTICS SEMINAR SERIES, APRIL 2021
Ioanna Manolopoulou, Department of Statistical Science, UCL, UK
Joint work with: Mariflor Vega and Mirco Musolesi
Posterior summaries of topic models: an example from grocery retail baskets
ABSTRACT
Understanding the shopping motivations behind market baskets is an important goal in the grocery retail industry. Analysing shopping transactions demands techniques that can cope with the volume and complicated dependencies of grocery transactional data while keeping the outcomes interpretable. Latent Dirichlet Allocation (LDA) provides a natural framework to process grocery transactions and to discover a broad representation of customers' shopping motivations. However, summarising the posterior distribution of an LDA model is challenging, because LDA is inherently a mixture model and can exhibit substantial label-switching. Averaging across posterior draws (even after resolving label-switching) inevitably merges semantically different topics, which may appear or disappear across draws and whose average may be semantically meaningless. Moreover, a summary of the corresponding uncertainty is not straightforwardly available. In this paper, we introduce a clustering methodology that post-processes posterior LDA draws to summarise the entire posterior distribution and identify semantic modes represented as recurrent topics. We illustrate our methods on an example from a large UK supermarket chain.
Teams link: https://bit.ly/3opz2kK