How We Research LEGO Sets

Our data sources, verification process, and editorial standards — so you can trust the information on every set page.

Our Primary Data Sources

Rebrickable

Most of our set catalogue (16,800+ sets) comes from Rebrickable, a community-maintained database of LEGO sets and parts. Rebrickable is one of the most comprehensive publicly available LEGO datasets and is widely used by LEGO tools, apps, and communities. Each set entry includes the set number, name, release year, piece count, theme, and official image URL.

LEGO.com Official Product Pages

Building instruction links on each set page route to official LEGO.com PDFs, which are provided free of charge by the LEGO Group. We link directly to the official source rather than hosting copies. When a set's instructions aren't on LEGO.com (common for older or discontinued sets), we note this and explain alternative sources.

Curated Set Profiles

For our most popular or notable sets, we maintain hand-curated profiles that include descriptive text, themed image links, related sets, and additional editorial context. These are researched and written by the LegoFinder editorial team and cross-referenced against official LEGO documentation, collector resources, and community knowledge.

How Set Content Is Generated

Each of the 16,800+ set pages on LegoFinder contains both sourced data and generated editorial content:

Sourced data

Set number, name, year, piece count, theme, and image URL come directly from Rebrickable or official LEGO sources. This data is updated when we refresh the Rebrickable dataset.
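
As a rough illustration of what one sourced record holds, the sketch below loads the six fields listed above from CSV text. The column names (`set_num`, `name`, `year`, `num_parts`, `theme`, `img_url`), the `SetRecord` class, and the sample rows are assumptions made for this example, not the actual import format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SetRecord:
    """One sourced set entry (field names are hypothetical)."""
    set_num: str
    name: str
    year: int
    num_parts: int
    theme: str
    img_url: str


def parse_sets(csv_text: str) -> list[SetRecord]:
    """Parse CSV text into records, skipping rows that cannot be read cleanly."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            records.append(SetRecord(
                set_num=row["set_num"].strip(),
                name=row["name"].strip(),
                year=int(row["year"]),
                num_parts=int(row["num_parts"]),
                theme=row["theme"].strip(),
                img_url=row["img_url"].strip(),
            ))
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # malformed row: leave it out rather than guess a value
    return records


# Made-up sample data; the second row has a non-numeric year and is skipped.
sample = (
    "set_num,name,year,num_parts,theme,img_url\n"
    "0001-1,Example Set,1985,120,Space,https://example.com/0001-1.jpg\n"
    "0002-1,Broken Row,unknown,50,Space,https://example.com/0002-1.jpg\n"
)
sets = parse_sets(sample)
print(sets[0].name, sets[0].year, len(sets))
```

Skipping unreadable rows instead of filling in defaults keeps guessed values out of the published piece counts and years.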

Generated editorial content

The About paragraph, Build Profile, FAQ, Use Cases, Build Tips, and Set Highlights are generated programmatically using a rules-based system we built. The system combines the set's sourced data (piece count, year, theme) with a knowledge base of 100+ LEGO themes that encodes each theme's typical audience, primary use (display vs. play), era, and licensing status.

The output varies across the catalogue according to each set's data: a vintage 1985 Space set produces different content than a 2023 Star Wars UCS set or a Duplo toddler set. The system is deterministic: the same set data always produces the same output, and no content is randomly generated or AI-hallucinated.
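
A minimal sketch of how a deterministic, rules-based generator of this kind can work is shown below. It is not the production system: the `ThemeInfo` fields, the two knowledge-base entries, the size thresholds, and the year-2000 cutoff for "vintage" are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThemeInfo:
    audience: str      # who the theme is typically aimed at
    primary_use: str   # "display" or "play"
    licensed: bool     # whether the theme is based on a licensed property


# Hypothetical knowledge-base entries, for illustration only.
THEMES = {
    "Space": ThemeInfo("children and nostalgic collectors", "play", False),
    "Star Wars": ThemeInfo("fans of the films", "display", True),
}
DEFAULT_THEME = ThemeInfo("general builders", "play", False)


def size_label(pieces: int) -> str:
    """Map a piece count to a size word (thresholds are illustrative)."""
    if pieces < 200:
        return "small"
    if pieces < 1000:
        return "mid-sized"
    return "large"


def about_paragraph(name: str, year: int, pieces: int, theme: str) -> str:
    """Build an About paragraph from set data using fixed rules only."""
    info = THEMES.get(theme, DEFAULT_THEME)
    era = "vintage" if year < 2000 else "modern"
    licence = "licensed" if info.licensed else "in-house"
    return (
        f"{name} is a {size_label(pieces)} {era} set from {year} in the "
        f"{licence} {theme} theme. With {pieces} pieces, it is aimed at "
        f"{info.audience} and is built mainly for {info.primary_use}."
    )


first = about_paragraph("Example Shuttle", 1985, 150, "Space")
second = about_paragraph("Example Shuttle", 1985, 150, "Space")
print(first)
print(first == second)  # same input, same output
```

Because the function reads only its arguments and a fixed lookup table, and never calls a random number generator or a language model, repeated runs on the same set data yield identical text.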

Accuracy and Limitations

We publish piece counts, years, and theme classifications as provided by Rebrickable. Rebrickable is community-maintained, which means occasional errors exist, particularly for:

  • Very old sets (pre-1980) where official records are incomplete
  • Regional variants and re-releases of the same set
  • Promotional or limited-distribution sets
  • Sets with variant part counts across production runs

If you notice an error in a set's data, use the contact form to report it. We review corrections and update our dataset accordingly.

AI Identification Methodology

The LegoFinder AI identification tool uses large multimodal vision models to analyze uploaded photos. The identification process has four steps:

1. Image preprocessing

Your photo is compressed and normalized before analysis to reduce noise from different devices and lighting conditions.

2. Visual feature extraction

The model analyzes distinctive visual elements: brick colors, printed surfaces, minifigure details, vehicle shapes, and characteristic structures.

3. Set matching

Features are matched against our catalogue of 16,800+ sets. The system returns the top prediction(s) with a confidence score.

4. Instructions lookup

The matched set number is used to retrieve the official LEGO.com instructions URL for immediate access.
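
The set-matching step can be pictured with the sketch below, which ranks catalogue entries by similarity to a photo's features and turns the scores into confidences. This is an assumption-laden illustration, not a description of the actual model: representing features as small numeric vectors, comparing them with cosine similarity, the softmax `temperature`, and the three catalogue entries are all hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query, catalogue, k=3, temperature=0.1):
    """Return up to k (set_num, confidence) pairs, best match first."""
    if not catalogue:
        return []
    scores = {num: cosine(query, vec) for num, vec in catalogue.items()}
    # Softmax converts similarities into confidences that sum to 1
    # across the whole catalogue.
    peak = max(scores.values())
    weights = {n: math.exp((s - peak) / temperature) for n, s in scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(num, w / total) for num, w in ranked[:k]]


# Made-up feature vectors keyed by made-up set numbers.
catalogue = {
    "0001-1": [0.9, 0.1, 0.0],
    "0002-1": [0.1, 0.9, 0.1],
    "0003-1": [0.0, 0.2, 0.9],
}
matches = top_matches([0.8, 0.2, 0.1], catalogue, k=2)
for set_num, confidence in matches:
    print(f"{set_num}: {confidence:.2f}")
```

Returning several ranked candidates rather than a single answer makes it easier to check the result against the set number printed on the box or instruction booklet.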

AI identification accuracy varies by photo quality, set size, and how much of the set is visible. We recommend verifying identifications against the set number printed on the box or instruction booklet.

Independence and Affiliations

LegoFinder is an independent project. We are not affiliated with, sponsored by, or endorsed by the LEGO Group. LEGO® is a registered trademark of the LEGO Group of companies. All LEGO set images are owned by the LEGO Group and are displayed or linked from official sources.

The site carries advertising (Google AdSense). Advertising does not influence which sets we feature, how we present data, or what we write in our guides. Buying guides and recommendations reflect our genuine editorial judgment.

Keeping the Database Current

We refresh our copy of the Rebrickable dataset periodically to add newly released sets and pick up corrections to existing entries. Set pages include a “verified” year reflecting when the data was last reviewed by the editorial team.

We review and update editorial content (guides, year pages, age pages) periodically as new sets are released or as community knowledge about older sets improves.

Questions about our methodology?

We welcome feedback, corrections, and methodology questions.

Get in Touch