https://images.unsplash.com/photo-1587654780291-39c9404d746b?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1050&q=80

As a young adult interested in Lego, I’ve always wanted to analyze the Lego dataset to understand how its trends, themes, and sets have changed over time. This analysis is inspired by this Mode Analytics post.

  • Data Source

The dataset comes from the website Rebrickable, which provides a database of LEGO items: sets, parts, and minifigs. Daily data is available either as a CSV download or through an API.
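As a rough illustration of working with a CSV download, here is a minimal sketch that counts sets per release year using only the Python standard library. The column names (set_num, name, year, theme_id, num_parts) are assumptions about the file layout, and the rows below are made-up placeholders, not real Rebrickable data.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows, NOT real Rebrickable data. The column names
# are assumed for illustration and should be checked against the real file.
SAMPLE_CSV = """set_num,name,year,theme_id,num_parts
0001-1,Example Set A,1999,1,120
0002-1,Example Set B,1999,2,45
0003-1,Example Set C,2005,1,300
"""


def sets_per_year(csv_file):
    """Count rows per value of the assumed 'year' column."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["year"]) for row in reader)


# With a downloaded file, open(path, newline="") would replace io.StringIO.
counts = sets_per_year(io.StringIO(SAMPLE_CSV))
for year in sorted(counts):
    print(year, counts[year])
```

The same function should work on a real download as long as the file has a header row containing a year column.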


As a recent graduate with a Data Science & Analytics degree, I want to start a journal to write about the machine learning concepts and methodology I learned at school.

Clustering is an unsupervised machine learning technique. It needs no predefined labels on a dataset: it finds patterns in the data and separates the observations into subgroups based on their similarity. Here I will introduce one major clustering algorithm: K-Means Clustering.

K-Means Clustering

K represents the number of clusters we aim to group the data into. We need to decide on K ahead of time.
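To make the role of K concrete, here is a minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. The function names, the random initialization, and the toy points are assumptions chosen for illustration, not something taken from a particular library.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Return (labels, centroids) after running the basic K-Means loop."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Toy points forming two visually separate groups (made up for illustration).
toy_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
              (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(toy_points, k=2)
print(labels)
print(centroids)
```

Here K is passed in up front as `k=2`; the algorithm only decides which points belong together, not how many groups there should be.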

How to decide K?

Subject matter/expertise could be leveraged to…

Youfang Zhang

Data Enthusiast
