Three Ways to Cluster Keywords. Stop Trying to Pick One.

Most articles about keyword clustering argue one method is the right one. SERP-based is "the most accurate". Semantic is "the only one that understands meaning". Stems-based is "all you really need for most cases". The truth is duller and more useful: each method catches something the others miss, and the practical workflow uses all three at different stages of the same project.

This article walks through what each method actually does, when to reach for which, and why combining them produces better content plans than picking one and running with it. It maps onto the Keyword Clustering tool we built at Algorithm — three methods in one editor — but the logic applies regardless of which tool you end up using.

What clustering is, briefly

Clustering takes a flat list of search queries and turns it into a content map: which keywords belong on the same page, which need separate pages, which are silently competing with each other right now. The decision matters because both extremes hurt. Cluster too loosely and you build bloated pages that rank for nothing specific. Cluster too tightly and you produce thin near-duplicate URLs that cannibalise each other in the same SERPs.

A good cluster solves one identifiable user need. One primary keyword, a set of secondary ones — synonyms, long-tails, related questions — all pointing at the same intent. When a page targets a well-built cluster, it can rank for hundreds of queries simultaneously, because Google already understands they want the same answer. When the cluster is wrong, none of that compounding happens.

Three methods, three different things they catch

Stems / pattern-based. Groups keywords by shared root words or word stems. "Best running shoes for flat feet" and "best running shoes for marathons" go into the same group because they share three words. Fast, free, no LLM cost, no SERP fetch. Useful as a first pass on a messy list of several thousand keywords — it reduces chaos faster than any other method.
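The grouping logic is simple enough to sketch in a few lines. This is an illustrative toy, not the tool's actual implementation: the `stem` function is a crude suffix-stripper standing in for a real stemmer, and the greedy bucketing compares each keyword only against a bucket's first member.

```python
def stem(word: str) -> str:
    # Crude suffix-stripping stand-in for a real stemmer (illustrative only).
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stems_clusters(keywords, min_shared=2):
    # Greedy bucketing: a keyword joins the first bucket whose seed
    # shares at least `min_shared` stems with it, else starts a new one.
    buckets = []  # list of (seed_stems, [keywords])
    for kw in keywords:
        kw_stems = {stem(token) for token in kw.lower().split()}
        for seed, members in buckets:
            if len(kw_stems & seed) >= min_shared:
                members.append(kw)
                break
        else:
            buckets.append((kw_stems, [kw]))
    return [members for _, members in buckets]

kws = [
    "best running shoes for flat feet",
    "best running shoes for marathons",
    "cheap flights to rome",
]
```

On this list, the two running-shoe queries share four stems and land in one bucket; the flights query starts its own. Note what the sketch can't see: nothing in it knows the two shoe queries have different SERPs.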

The catch is that stems-based grouping has no concept of intent. The two running-shoe examples above probably have completely different SERPs (one dominated by orthopaedic content, the other by athletic reviews) and should usually live on separate pages. Stems can't see that. Anyone using stems-based clustering as the final word in their content plan ends up with bloated pages that confuse Google about what each URL is really for.

AI semantic. Uses a language model to group keywords by meaning rather than by shared words. It catches that "cheap flights to Rome" and "affordable airfare Italy" are conceptually similar even though they share no exact terms. This is a real upgrade over stems for any list with synonym variation, brand-name variants, or queries translated from non-English origin terms.

The limitation is more subtle than people realise. Semantic clustering reflects a language model's understanding of similarity, not Google's. Two keywords can be semantically near-identical and still return completely different SERPs. "Online marketing" and "digital marketing" are near-synonyms in any embedding model, but the SERP for each often surfaces different page types — agencies for one, beginner guides for the other. A semantic tool will put them in the same cluster. Google has already decided they're separate intents. If you target both with one page, you'll under-rank for at least one of them.
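A minimal sketch of meaning-based grouping, with hand-written toy vectors standing in for real model embeddings (in practice you'd call an embedding model or API here — the vectors and threshold below are invented for illustration):

```python
import math

# Toy "embeddings" standing in for a real model's output.
TOY_VECTORS = {
    "cheap flights to rome":    [0.90, 0.10, 0.00],
    "affordable airfare italy": [0.85, 0.20, 0.05],
    "best running shoes":       [0.00, 0.10, 0.95],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, 0.0 unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_clusters(keywords, threshold=0.9):
    # Greedy grouping: a keyword joins the first cluster whose
    # representative it is similar enough to, else starts a new cluster.
    clusters = []  # list of (representative, [members])
    for kw in keywords:
        for rep, members in clusters:
            if cosine(TOY_VECTORS[kw], TOY_VECTORS[rep]) >= threshold:
                members.append(kw)
                break
        else:
            clusters.append((kw, [kw]))
    return [members for _, members in clusters]
```

Here "cheap flights to rome" and "affordable airfare italy" cluster together despite sharing no words — exactly what stems misses. But the similarity score is the model's opinion, not Google's, which is the limitation described above.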

SERP-based. Checks the actual top-10 for each keyword and clusters keywords that share ranking URLs. The logic: if keyword A and keyword B return three or more of the same URLs in their top 10, Google itself treats them as the same intent. They belong on one page.

This is the most accurate method available, because it's not predicting what Google means — it's reading what Google has already decided. The trade-off is cost and speed. Every keyword needs a live SERP fetch. Running 5,000 keywords through SERP-based clustering takes time and burns budget. That's why it's almost never the right method to run on a raw, unfiltered list. Apply stems and semantic first, get to the keywords that actually matter, then run SERP on the refined set.
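The shared-URL rule can be sketched directly. The SERP data here is hard-coded for illustration; in a real workflow each keyword's URL list comes from a live top-10 fetch, and the threshold of three shared URLs matches the rule described above:

```python
def serp_clusters(serps, min_shared=3):
    # Group keywords whose top-10 SERPs share >= min_shared URLs,
    # comparing each keyword against a cluster's first member.
    clusters = []  # list of [keywords]
    for kw, urls in serps.items():
        url_set = set(urls)
        for members in clusters:
            seed_urls = set(serps[members[0]])
            if len(url_set & seed_urls) >= min_shared:
                members.append(kw)
                break
        else:
            clusters.append([kw])
    return clusters

# Illustrative SERP snippets (invented URLs, truncated top-10s).
serps = {
    "back pain exercises":       ["a.com/1", "b.com/2", "c.com/3", "d.com/4"],
    "stretches for lumbar pain": ["a.com/1", "b.com/2", "c.com/3", "e.com/5"],
    "buy laptop online":         ["shop1.com", "shop2.com", "shop3.com"],
}
```

The two back-pain queries share three URLs and merge into one cluster, even though they share no meaningful words — the consolidation case discussed later in this article.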

Why combining them is the practical workflow

A content planning project usually starts with a messy list. Two thousand keywords gathered from keyword research exports, Search Console, competitor scraping, and internal site search logs. Some are duplicates with different word orders. Some are near-synonyms. Some are off-topic. Some are genuinely unique queries you'll need to think about.

Running SERP-based clustering on that list directly is wasteful. You'd burn hundreds of SERP fetches on keywords that will get pruned in the next pass anyway. The right shape of the workflow is wider-to-narrower.

The first pass uses stems. Free, fast, gets you from 2,000 chaotic rows to maybe 1,200 sensible buckets. You're not trying to build content clusters here. You're trying to remove noise — exact duplicates, obvious spelling variants, queries that share a clear stem and belong together at the bucket level.

The second pass uses semantic clustering on the deduplicated list to catch the synonym overlaps stems missed. Two keywords with no shared words but the same meaning will get joined here. By the end of this step you have a cleaner list, maybe 800 keywords, organised into rough thematic groups.

SERP-based clustering goes last, and only on the keywords you're actually planning content for. Now you're paying for SERP fetches on queries that survived the earlier filters, which is a much better use of budget. The output of this step is what you turn into your content brief.

This is the difference between a content plan that costs $80 to produce and one that costs $400 — and the cheaper one is often more accurate, because the earlier stages catch problems that would otherwise distort the SERP-based step.

What SERP-based actually shows you

Worth pausing on this because it's the method most teams underuse, and it's the one that catches things nothing else can.

Two examples I see often in audits.

The intent split that semantic models miss. "Buy laptop online" and "best laptop reviews" feel similar enough that any semantic model will cluster them. The SERPs are completely different — the first is an e-commerce SERP dominated by retailer listings, the second is a review-content SERP dominated by tech publications. Targeting both with one page produces a worse outcome than two pages, and only SERP-based clustering reveals that without testing.

The consolidation opportunity that looks like two different topics. "Back pain exercises" and "stretches for lumbar pain" share zero meaningful words. A stems clusterer keeps them apart. A semantic clusterer might put them in the same cluster but might not, depending on training data. SERP-based clustering looks at the actual ranking results — and finds eight of the same top-10 URLs across both queries. Google has already decided they're the same intent. They belong on one page, and you'd never have known that without a SERP check.

The point isn't that SERP-based is always right. It's that SERP-based shows you Google's interpretation of intent overlap, and that interpretation is what you're actually trying to plan content around. Anything else is a model of what Google might do.

The clustering output is a starting point, not an answer

This is the part most clustering tools get wrong, including the ones charging $99 a month. They produce clusters and treat the output as final. The user gets a CSV. The user has no way to fix the obvious cases where the algorithm got it wrong, no way to merge two clusters that should be one, no way to split a bloated cluster into two cleaner ones.

Algorithms cluster. Humans decide.

We built the editor around this assumption. Run any of the three methods, get the output, then drag and drop to fix what's wrong. Move keywords between clusters. Merge two clusters that the SERP method joined too aggressively. Split a stems-based cluster that conflated two intents. Export the cleaned result.

The reason this matters: clustering algorithms optimise for accuracy on average, not accuracy on every individual cluster. A SERP-based run with a threshold of three shared URLs will produce some clusters that are clearly correct and some that are borderline. The borderline ones need human judgement — and most tools don't even surface that they're borderline. They print a cluster and call it final.

Cannibalisation shows up here

Cannibalisation is one of those things that's much easier to spot in a clustering output than in a Search Console report.

When you cluster a domain's keyword list — your existing target keywords plus any queries you're considering targeting — the clusters reveal which of your current pages compete with each other. If two of your URLs appear ranking inside the same cluster, Google is splitting its quality signal between them. One of those pages needs to be merged, deleted, or re-targeted to a different intent.
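The check itself is mechanical once you have clusters with ranking URLs attached. A sketch, assuming a cluster output shaped as cluster name to ranking URLs (this shape and the function name are illustrative, not a specific tool's format):

```python
from urllib.parse import urlparse

def cannibalised_clusters(clusters, your_domain):
    # Flag any cluster where two or more distinct URLs from your own
    # domain rank — Google is splitting the quality signal between them.
    flagged = {}
    for name, urls in clusters.items():
        own = {u for u in urls if urlparse(u).netloc == your_domain}
        if len(own) >= 2:
            flagged[name] = sorted(own)
    return flagged

# Illustrative cluster output with invented URLs.
clusters = {
    "crm software": [
        "https://example.com/crm-guide",
        "https://example.com/best-crm-tools",
        "https://rival.com/crm",
    ],
    "email marketing": [
        "https://example.com/email-marketing",
        "https://other.com/email",
    ],
}
```

Running the check flags "crm software" — two example.com URLs ranking in one intent cluster — while "email marketing" passes, since only one of its URLs is yours.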

SERP-based clustering catches this most reliably because it uses live ranking data — the same data Google's making cannibalisation decisions on. Stems and semantic methods can produce the same finding, but with more false positives. If you're auditing an existing site for cannibalisation specifically, SERP is the method to use, even on a moderate-sized keyword list.

This is also why we run cannibalisation detection inside the SEO Audit tool's content audit module — same underlying logic, integrated with content quality scoring on the same URLs.

What this tool deliberately doesn't do

A few deliberate choices about how the tool is positioned.

No locking you into one method. The editor lets you run stems on the same list, then re-run with AI semantic, then re-run with SERP-based — and compare the outputs side by side in the same view. Most clustering tools force you into one approach by pricing the others into a higher plan, or hiding them entirely.

No monthly subscription for occasional use. Clustering isn't something most teams do every week. It's a project-stage activity — when planning a new section, refreshing a content strategy, auditing an existing site for cannibalisation. So we charge per use rather than monthly, which means you don't pay for capacity sitting idle three weeks out of four.

No pretending stems are semantic. A surprising number of competing tools advertise "semantic clustering" while actually running pattern-based or hybrid stem matching under a friendlier name. The output is the same as basic stems clustering, just labelled differently. We label what each method actually does, including the free one, because mislabelling makes the workflow worse for everyone.

No SERP caching. SERPs change daily. If you're clustering today, you should cluster on today's SERP, not on last month's cached snapshot from when somebody else ran the same keyword. Live fetch every time. That's part of why SERP-based costs more — and part of why it's worth it.

What to do in the next hour

Pick a keyword list you're working on right now. Ideally between 200 and 1,000 keywords — that's the range where method choice actually matters.

Run stems-based clustering first. Free. Look at what comes out. Note which clusters obviously make sense, which ones are obviously wrong (different intents jammed together), and which are ambiguous.

Now take 50 of the ambiguous keywords. Run them through SERP-based clustering. Compare what changes. The keywords that move between clusters are the ones where stems was misleading you, and those are the cases that would have caused content cannibalisation if you'd built pages from the stems output directly.

Two clustering passes on the same list, comparing where they agree and where they don't, will teach you more about how Google sees your queries than any guide. The disagreements are the data.
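Finding the disagreements between two runs is a small diff exercise. A sketch, with invented sample data: a keyword "moved" if the set of keywords sharing its cluster differs between the two runs.

```python
def cluster_mates(clusters):
    # Map each keyword to the set of other keywords in its cluster.
    return {kw: frozenset(c) - {kw} for c in clusters for kw in c}

def moved_keywords(run_a, run_b):
    # Keywords whose cluster-mates differ between two clustering runs —
    # the cases where the two methods disagree.
    mates_a, mates_b = cluster_mates(run_a), cluster_mates(run_b)
    return sorted(kw for kw in mates_a
                  if kw in mates_b and mates_a[kw] != mates_b[kw])

# Illustrative: stems joined the laptop queries, SERP split them.
stems_run = [["buy laptop online", "best laptop reviews"], ["back pain exercises"]]
serp_run = [["buy laptop online"], ["best laptop reviews"], ["back pain exercises"]]
```

The two laptop queries come back as moved; "back pain exercises" doesn't, because both runs agree on it. The moved list is your shortlist of keywords worth a manual SERP look.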

The hard part of clustering isn't running the algorithm. The hard part is being willing to let the output overrule your gut feeling about which keywords go together.
