Programmatic SEO for Niche Marketplaces: Generating 100k+ Pages That Actually Rank

How we built a pipeline that generates 100k+ localized wiki pages for a European car parts marketplace — and what actually moves the needle for programmatic SEO in 2026.

Building a marketplace that serves multiple countries means dealing with a problem that most tutorials skip over: how do you create meaningful, indexable content at scale — without producing the kind of thin, templated garbage that Google learned to ignore years ago?

I've been working on hank.parts, a European marketplace for classic and vintage car parts. The platform connects buyers who need specific parts (think: a water pump for a 1978 Porsche 911) with sellers across Europe. It supports seven languages and covers thousands of car models from the past 60+ years.

One of the more interesting technical challenges: generating wiki-style reference pages for every supported car model, in every language, without ending up with a pile of boilerplate that does nothing for users or search engines. This post covers the approach and some things that actually worked.

The Problem With Templated Pages

The naive approach to programmatic SEO is simple: take a database of entities, plug values into a template, generate HTML. You see it everywhere — {Make} {Model} parts for sale. Buy {Make} {Model} components online. Repeat for 5,000 models.

Google got wise to this pattern a long time ago. Pages like that are thin content by definition: they carry no unique information, give no one a reason to link to them, and give users no reason to stay on the page. The helpful content update specifically targets this kind of output.

So the question becomes: how do you produce genuinely useful pages at scale, for thousands of entities, without manually writing each one?

The General Approach

At a high level, the pipeline works like this:

  1. For each car model in the database, search the web for real information — production history, notable variants, common issues
  2. Feed those search results into an LLM as context, so it synthesizes real facts rather than hallucinating them
  3. Translate the content into all supported languages
  4. Render static HTML with proper SEO markup and upload to a CDN

The result is a page like hank.parts/wiki/en/volkswagen/golf-mk1 — genuine fun facts about the Golf Mk1, its production history, the origin of the GTI, alongside a catalog of common replacement parts with context about why they typically need replacing.

The key distinction from typical AI-generated SEO content: the LLM never writes from its own "knowledge." It always works from fresh search results. This matters because car enthusiasts will fact-check your content, and they will bounce if your page confidently states that the Fiat 500 was first produced in 1989.
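
A minimal sketch of that grounding step in TypeScript. searchWeb and generateWithContext are hypothetical stand-ins for whatever search API and LLM client you use; the shape of the flow is what matters, not these names:

interface ModelFacts {
  title: string;
  summary: string;
  facts: string[];      // statements synthesized only from the sources below
  sourceUrls: string[]; // kept around for later fact-checking
}

// Hypothetical helpers; declare-only so the sketch type-checks.
declare function searchWeb(query: string): Promise<string[]>;
declare function generateWithContext(args: {
  sources: string[];
  instruction: string;
}): Promise<ModelFacts>;

async function generateModelFacts(make: string, model: string): Promise<ModelFacts> {
  // 1. Fresh search results become the only knowledge the LLM may use.
  const sources = await searchWeb(
    `${make} ${model} production history variants common issues`
  );

  // 2. The prompt forbids anything not present in the sources.
  return generateWithContext({
    sources,
    instruction:
      "Summarize only what these sources state. If the sources are " +
      "silent or disagree on a fact, omit it. Output JSON matching ModelFacts.",
  });
}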

Why Static HTML Still Wins for SEO

The wiki pages are not part of our main frontend application (which is a React SPA). They're plain static HTML files served directly from a CDN. No JavaScript framework, no hydration, no client-side routing.

This sounds boring, and it is. That's the point:

  • Performance: Static HTML loads fast everywhere. No JS bundle to download and parse. Core Web Vitals are basically free.
  • Crawlability: Search engine bots get complete content immediately. No waiting for client-side rendering, no hoping Googlebot executes your JavaScript correctly.
  • Cost: CDN serving static files is essentially free at any scale. We're talking fractions of a cent per thousand requests.
  • Independence: The wiki can be updated without deploying the main application. Different release cycles, different failure domains.

If you're building a modern SPA with TanStack Router or Next.js or whatever the framework of the month is, you might be tempted to render your SEO pages through the same system. Don't. Keep them separate. Your marketing pages and your app have fundamentally different requirements.

The Multilingual Challenge

Here's where it gets more interesting. hank.parts serves seven language markets: English, German, Spanish, French, Italian, Dutch, and Polish. That means every wiki page needs to exist in all seven languages — and search engines need to understand the relationship between them.

There are two approaches to multilingual programmatic content:

Translate the finished HTML. Simple, but you lose control over formatting and the translation service might mangle your markup. Paragraph boundaries shift, lists reorder, code snippets get translated (yes, I've seen this happen).

Translate the structured content, then render. Generate your content as structured data first — JSON with clear fields for titles, descriptions, facts. Translate those fields. Then render each language from its own translated data, keeping the HTML template, navigation, and structural elements localized separately.
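
Here is a minimal sketch of that second approach; translateText and renderTemplate stand in for your translation API and template engine, and the field names are illustrative:

declare function translateText(text: string, locale: string): Promise<string>;
declare function renderTemplate(name: string, data: object): string;

interface WikiContent {
  title: string;
  intro: string;
  facts: string[];
}

// Translate the data, not the markup: each field goes through the
// translator on its own, so document structure can't be mangled.
async function localizeContent(content: WikiContent, locale: string): Promise<WikiContent> {
  return {
    title: await translateText(content.title, locale),
    intro: await translateText(content.intro, locale),
    facts: await Promise.all(content.facts.map((f) => translateText(f, locale))),
  };
}

// The HTML template stays fixed; only translated data flows through it.
function renderWikiPage(content: WikiContent, locale: string): string {
  return renderTemplate("wiki-page", { locale, ...content });
}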

We went with the second approach. It's more work upfront, but the output is significantly cleaner. Every language version gets its own canonical URL and proper hreflang tags:

<link rel="alternate" hreflang="en" href="https://hank.parts/wiki/en/porsche/911-993" />
<link rel="alternate" hreflang="de" href="https://hank.parts/wiki/de/porsche/911-993" />
<link rel="alternate" hreflang="es" href="https://hank.parts/wiki/es/porsche/911-993" />
<link rel="alternate" hreflang="fr" href="https://hank.parts/wiki/fr/porsche/911-993" />
<!-- ... all 7 languages -->

If you're doing multilingual SEO, hreflang is non-negotiable. Without it, Google doesn't know which version to serve to which users, and you end up competing with yourself across language variants. I've seen sites cannibalize their own traffic by having near-identical pages in different languages without proper hreflang — Google picks one, suppresses the rest, and it's usually not the one you wanted.
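
Since every language version has to carry the identical set of tags, including a self-reference (one-way hreflang annotations may be ignored), it pays to generate them mechanically. A small sketch; the x-default fallback pointing at English is a common convention, not a requirement:

const LOCALES = ["en", "de", "es", "fr", "it", "nl", "pl"];

function hreflangTags(make: string, model: string): string {
  const url = (l: string) => `https://hank.parts/wiki/${l}/${make}/${model}`;
  const tags = LOCALES.map(
    (l) => `<link rel="alternate" hreflang="${l}" href="${url(l)}" />`
  );
  // x-default covers visitors outside the listed locales.
  tags.push(`<link rel="alternate" hreflang="x-default" href="${url("en")}" />`);
  return tags.join("\n");
}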

Also worth noting: Schema.org BreadcrumbList structured data matters here. A four-level breadcrumb (Home → Wiki → Make → Model) gives search engines a clear hierarchy and can show up as rich results. It costs almost nothing to implement and helps crawlers understand your site structure.
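
As a sketch, the JSON-LD for that four-level breadcrumb can be built straight from the URL segments (the helper name is illustrative):

function breadcrumbJsonLd(locale: string, make: string, model: string): string {
  const wiki = `https://hank.parts/wiki/${locale}`;
  const crumbs = [
    { name: "Home", item: "https://hank.parts" },
    { name: "Wiki", item: wiki },
    { name: make, item: `${wiki}/${make}` },
    { name: model, item: `${wiki}/${make}/${model}` },
  ];
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((c, i) => ({
      "@type": "ListItem",
      position: i + 1, // BreadcrumbList positions are 1-based
      name: c.name,
      item: c.item,
    })),
  });
}

// Emitted into each page's <head> inside
// <script type="application/ld+json"> ... </script>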

Content That Makes Sense Per Entity

A detail that turned out to matter more than expected: not every page template makes sense for every entity.

An air-cooled Porsche 911 doesn't have a radiator or water pump. A Tesla doesn't have an exhaust system. A three-wheeler kit car doesn't have door handles.

If you're generating pages for "Find a radiator for your Porsche 356" — a car that famously doesn't have one — you've just told every Porsche enthusiast who lands on your page that you don't know what you're talking about. They'll leave, they'll never come back, and pages that visitors consistently abandon are exactly the pattern Google's quality systems are built to demote.

The fix is straightforward: classify your entities before generating content. For cars, a few boolean flags (electric/combustion, water/air-cooled, has doors/doesn't) are enough to filter your part catalog per model. The point isn't the specific classification — it's the principle that programmatic pages need conditional logic, not just template substitution.
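
A sketch of that filter, with illustrative flag and part names:

interface ModelTraits {
  powertrain: "combustion" | "electric";
  cooling: "water" | "air" | null; // null when there's no engine coolant circuit
  hasDoors: boolean;
}

interface PartTemplate {
  slug: string;
  // A part page is generated only when the predicate holds for the model.
  appliesTo: (t: ModelTraits) => boolean;
}

const PART_TEMPLATES: PartTemplate[] = [
  { slug: "radiator",       appliesTo: (t) => t.cooling === "water" },
  { slug: "water-pump",     appliesTo: (t) => t.cooling === "water" },
  { slug: "exhaust-system", appliesTo: (t) => t.powertrain === "combustion" },
  { slug: "door-handle",    appliesTo: (t) => t.hasDoors },
];

// An air-cooled 911 keeps its exhaust pages but never gets a radiator page.
function partsFor(traits: ModelTraits): PartTemplate[] {
  return PART_TEMPLATES.filter((p) => p.appliesTo(traits));
}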

Keeping Pages Alive

Static generation doesn't mean generate-once-and-forget. The pipeline runs daily, doing two things:

  • Full generation for newly added models
  • Incremental updates for existing pages — injecting recent marketplace activity (actual part requests from real users) into the existing HTML

This means a wiki page for the VW Golf Mk1 might show recently posted part requests, updated daily. It bridges the gap between "static reference page" and "living marketplace content," and the freshness signal helps with crawl frequency.
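
One way to implement the incremental half without re-running generation is a marked region in the static HTML that the daily job splices into. A sketch, assuming comment markers baked into the page template:

const START = "<!-- recent-requests:start -->";
const END = "<!-- recent-requests:end -->";

// Replace only the marked region; the generated reference content
// around it stays byte-for-byte identical.
function injectRecentRequests(pageHtml: string, requestsHtml: string): string {
  const start = pageHtml.indexOf(START);
  const end = pageHtml.indexOf(END);
  if (start === -1 || end === -1 || end < start) return pageHtml;
  return (
    pageHtml.slice(0, start + START.length) +
    "\n" + requestsHtml + "\n" +
    pageHtml.slice(end)
  );
}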

After each cycle, we regenerate the XML sitemap and submit updated URLs via IndexNow. Google doesn't officially support IndexNow, but Bing and Yandex pick up changes within hours. For Google, a well-structured sitemap with <lastmod> timestamps is still the way to go.
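
The IndexNow call itself is one POST per batch of changed URLs. This sketch follows the documented protocol; the key value is whatever you've published as a plain-text file at your domain root:

async function submitIndexNow(changedUrls: string[]): Promise<void> {
  // The protocol verifies ownership by fetching https://hank.parts/{key}.txt,
  // which must contain the same key submitted here.
  const res = await fetch("https://api.indexnow.org/indexnow", {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify({
      host: "hank.parts",
      key: "<your-indexnow-key>",
      urlList: changedUrls, // the spec allows up to 10,000 URLs per submission
    }),
  });
  if (!res.ok) throw new Error(`IndexNow submission failed: ${res.status}`);
}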

What Actually Moves the Needle

After running this system for a few months, here's what I've observed:

What works:

  • Unique, grounded content — pages with real, verified facts about specific car models perform noticeably better than templates with swapped-in model names
  • Proper hreflang implementation — search engines correctly serve language-appropriate versions, and we see organic traffic from all seven markets
  • Dense internal linking — each model page links to its parts, each part links back to the marketplace. Search engines follow these links and the pages reinforce each other
  • Freshness signals — pages that show recent marketplace activity get crawled more frequently

What doesn't work (or at least hasn't yet):

  • Volume alone — having 100k pages means nothing if they're thin. We had better results from 500 high-quality pages than from 5,000 templated ones. Quality per page matters more than total count
  • Expecting overnight results — programmatic pages take months to build authority, especially for a new domain. Plan for 3-6 months before you see meaningful organic traffic
  • Skipping user intent — a page about "Porsche 356 brakes" needs to be useful to someone actually looking for brake parts, not just keyword-stuffed for a crawler. If someone lands on your page and doesn't find what they need, no amount of technical SEO will save you

Takeaways

If you're building programmatic SEO for a niche marketplace or content site:

  1. Ground your content in real data. Use web search results or authoritative sources as LLM context. Don't let the model free-associate.
  2. Structure before rendering. Generate structured data (JSON), translate the data, then render to HTML. Never translate finished HTML.
  3. Classify your entities. Not every template makes sense for every entity. A classification step prevents generating nonsensical pages.
  4. Serve static HTML from a CDN. It's boring, it's fast, it works. Keep your SEO pages separate from your SPA.
  5. Keep pages alive. Daily updates with live data keep pages relevant and signal freshness to crawlers.
  6. Respect the user. If your programmatic page wouldn't be useful to a human visitor, it won't be useful to a search engine either.

The hank.parts wiki currently covers around 2,000 car models across 90+ makes, each with localized content in seven languages. It's a lot of pages, but the pipeline makes it manageable — adding a new model to the database is all it takes to trigger generation.


I'm building hank.parts, a European marketplace where classic car enthusiasts post what parts they need and sellers across Europe respond with offers. If you're into old cars and need a hard-to-find part, check it out.