As a young adult interested in Lego, I’ve always wanted to analyze the Lego dataset to understand how trends, themes, and sets have changed over time. This analysis is inspired by this Mode Analytics post.
The dataset is from the website Rebrickable, which provides a database of LEGO items: sets, parts, and minifigs. Daily data is accessible through either a CSV download or an API.
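As a rough sketch of what working with the CSV download could look like, the snippet below counts sets per release year. The column names (`set_num`, `name`, `year`, `theme_id`, `num_parts`) are my assumption about the layout of the sets file, the three rows are made-up placeholders rather than real Rebrickable data, and `sets_per_year` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows; the column names are an assumed layout
# for the sets CSV, not a guaranteed schema.
SAMPLE_CSV = """set_num,name,year,theme_id,num_parts
demo-1,Example Set A,1999,1,120
demo-2,Example Set B,1999,2,45
demo-3,Example Set C,2005,1,300
"""


def sets_per_year(csv_file):
    """Count how many sets appear for each release year."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["year"]) for row in reader)


# With a downloaded file, pass open("sets.csv", newline="") instead.
counts = sets_per_year(io.StringIO(SAMPLE_CSV))
for year in sorted(counts):
    print(year, counts[year])
```

Reading from a file-like object keeps the helper the same whether the data comes from an in-memory sample or a downloaded file.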
As a recent graduate with a Data Science & Analytics degree, I want to start a journal to write about the machine learning concepts and methodology I learned at school.
Clustering is an unsupervised machine learning technique. Without predefined labels on a dataset, a clustering algorithm can be used to find patterns in the data and separate it into multiple subgroups based on similarity. Here I will introduce one major clustering algorithm: K-Means Clustering.
K represents the number of clusters we aim to group the data into. We need to decide on it ahead of time.
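To make the role of K concrete, here is a minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. The function names, the random initialization, the fixed seed, and the toy points are all my own illustrative choices; in practice a library implementation would usually be the better option.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Cluster points into k groups; k must be chosen up front."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return labels, centroids


# Toy data made up for illustration: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Note that the result can depend on the initial centroids, which is one reason the choice of K and of the starting points both matter.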
How to decide K?
Subject matter/expertise could be leveraged to…