As a young adult interested in Lego, I’ve always wanted to analyze the Lego dataset to understand how trends, themes, and sets have changed over time. This analysis is inspired by this Mode Analytics post.

  • Data Source

The dataset comes from the website Rebrickable, which maintains a database of LEGO items: Sets, Parts, and Minifigs. Daily data is accessible either through CSV download or through an API.
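As a rough illustration of working with the CSV download, the sketch below counts sets per release year using only the Python standard library. The column names (`set_num`, `name`, `year`, `theme_id`, `num_parts`) are an assumption about the layout of the sets file, and the three rows are made-up placeholders rather than real Rebrickable records. The helper name `sets_per_year` is hypothetical as well.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for a downloaded sets file.
# Column names are assumed; the values are illustrative, not real data.
SAMPLE_CSV = """set_num,name,year,theme_id,num_parts
0001-1,Example Set A,1999,1,120
0002-1,Example Set B,1999,2,45
0003-1,Example Set C,2005,1,310
"""


def sets_per_year(csv_file):
    """Count how many sets were released in each year."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["year"]) for row in reader)


counts = sets_per_year(io.StringIO(SAMPLE_CSV))
for year in sorted(counts):
    print(year, counts[year])
```

With a real download, the `io.StringIO(SAMPLE_CSV)` argument could be replaced by an open file handle, for example `open("sets.csv", newline="", encoding="utf-8")`, assuming the file is saved under that name.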

As a recent graduate with a Data Science & Analytics degree, I want to start a journal about the machine learning concepts and methodology I learned at school.

Clustering is an unsupervised machine learning technique. It works on data without predefined labels, finding patterns and separating the observations into subgroups based on their similarity. Here I will introduce one major clustering algorithm: K-Means Clustering.

K-Means Clustering

K is the number of clusters we want to group the data into. It must be chosen ahead of time, before the algorithm runs.
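To make the role of K concrete, here is a minimal sketch of the standard K-Means loop (often called Lloyd's algorithm) in plain Python. It is an illustration under simple assumptions, not a production implementation: the function names, the toy points, and the choice to start from K randomly sampled data points are all mine, and an empty cluster simply keeps its previous centroid.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Group points into k clusters; k is fixed before the loop starts."""
    rng = random.Random(seed)
    # Assumption: initialise centroids with k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return labels, centroids


# Toy data, made up for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Changing `k` changes how many groups come back, which is why the value has to be decided up front. In practice a library implementation such as scikit-learn's `KMeans` would usually be preferred over a hand-written loop.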

How to decide K?

Subject matter/expertise could be leveraged to…

Youfang Zhang

Data Enthusiast
