For this new workshop, we propose an introduction to unsupervised models.
Structured data used in data science generally takes the form of tables in which each row represents an individual in the statistical sense (a person, a machine, an event, ...) and each column represents a characteristic of that individual (age, date, height, etc.).
Segmentation is a branch of unsupervised learning in which these characteristics are used to group these individuals by similarity.
The difficulty is to strike the right balance: segments (clusters) should be well separated from each other, and each segment should be as homogeneous as possible.
In doing so, unexpected structures are sometimes discovered in a dataset.
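As a rough illustration of this trade-off, the sketch below scores several candidate numbers of segments on synthetic data. It rests on assumptions that are not part of the workshop description: the data comes from scikit-learn's make_blobs, homogeneity is measured by K-means inertia (within-cluster sum of squared distances), and the combination of separation and homogeneity is measured by the silhouette score. The tutorial itself may well use other criteria.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic table: 300 individuals (rows), 5 characteristics (columns).
# The dataset is an assumption made purely for illustration.
X, _ = make_blobs(n_samples=300, n_features=5, centers=4, random_state=0)

results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(X)
    results[k] = {
        # Lower inertia means more homogeneous segments.
        "inertia": model.inertia_,
        # Silhouette lies in [-1, 1]; higher means segments are both
        # compact and well separated from each other.
        "silhouette": silhouette_score(X, labels),
    }

for k, scores in results.items():
    print(f"k={k}  inertia={scores['inertia']:.1f}  "
          f"silhouette={scores['silhouette']:.3f}")
```

Inertia tends to keep decreasing as segments are added, so it cannot be used alone to choose their number; a score such as the silhouette, which also penalises poorly separated segments, is one possible way to balance the two goals.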
2. Segmentation tutorial
This tutorial is based on a Jupyter/Python notebook that uses the scikit-learn library.
It will show how to reduce the number of dimensions (number of columns) of a dataset using Principal Component Analysis (PCA) and then segment the population described by the dataset using the K-means algorithm.
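The two steps can be sketched as follows. This is a minimal example and not the workshop notebook: the synthetic dataset, the standardisation step, the choice of 2 principal components and the choice of 3 segments are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic table: 300 individuals (rows), 6 characteristics (columns).
# Assumption: the real tutorial uses its own dataset.
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)

# Standardise the columns so that no characteristic dominates the PCA
# only because of its unit or scale (an assumed preprocessing step).
X_scaled = StandardScaler().fit_transform(X)

# Step 1: reduce the 6 columns to 2 principal components.
pca = PCA(n_components=2, random_state=0)
X_reduced = pca.fit_transform(X_scaled)
print("Share of variance kept by each component:",
      np.round(pca.explained_variance_ratio_, 3))

# Step 2: segment the individuals in the reduced space with K-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)
print("Segment sizes:", np.bincount(labels))
```

Reducing to two components also makes it possible to plot the individuals and colour them by segment, which is often how unexpected structure in a dataset becomes visible.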
This tutorial will be led by Georges-Henri Moll (https://www.linkedin.com/in/georgeshenrimoll/).