A 50-dimensional embedding space trained on listening data from Last.fm, using only user-artist pairs without genre labels or metadata. By observing the listening habits of thousands of people, the model discovered musical genres entirely on its own. Artists that clustered together in the embedding space turned out to share genres, and the clusters mapped cleanly onto categories like country, metalcore, ambient, NYC rap, etc. The model was so good at picking up genre nuances that I ended up learning about sub-genres of music I’d never heard of while trying to debug! Built with Python and PyTorch.
Below is a T-SNE plot of the embedding space. Each dot is an artist, colored by genre.
