Measure how semantically aligned your backlinks really are. Powered by the same AI model we use in our audits — deterministic, transparent, free.
Google doesn't just count links; it evaluates whether the linking page is contextually relevant to your content. We model this with three semantic signals.
The paragraph surrounding your backlink is compared against the target page body. This is the strongest signal, worth 65% of the base score, and it models Google's context2 (the hash of terms near the anchor).
The full referring page topic is compared to the target page. This captures topical alignment even when the specific paragraph is weak. It is worth 35% and models Google's siteFocusScore.
The anchor is not a weighted bonus; it is a context-gated ±δ modifier. It adds up to +0.10 only when a relevant anchor is corroborated by strong context, and it penalises a mismatch (−0.15) or an over-optimised exact-match anchor in weak context (−0.10). Generic anchors are neutral. This mirrors Google's anchorMismatchDemotion: a penalty, never a free boost.
Model: all-MiniLM-L6-v2 (384-dim) via ONNX Runtime. Deterministic — same inputs always produce the same score.
Paste two URLs — we extract context, anchor, and body text automatically. No keywords needed. Or use text/weighted mode for manual control.
siteFocusScore), and scores against that instead of one thin page. Slower (multiple fetches); cached per domain.
Upload a CSV with URLs or text. Columns: url_a, url_b (fetches & extracts) or text_a, text_b (direct text).
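A minimal sketch of how a batch file in either layout might be read and routed before scoring. Only the four column names come from the tool; the `parse_batch_csv` helper, the `BatchRow` type, and the rule that URL columns win when both layouts are present are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class BatchRow:
    mode: str  # "url": fetch & extract both pages; "text": score the text directly
    a: str     # value from url_a or text_a
    b: str     # value from url_b or text_b


def parse_batch_csv(raw: str) -> list[BatchRow]:
    """Route each row by the file's column layout (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(raw))
    headers = set(reader.fieldnames or [])
    # Assumption: if both layouts appear, the URL columns take precedence.
    if {"url_a", "url_b"} <= headers:
        mode, col_a, col_b = "url", "url_a", "url_b"
    elif {"text_a", "text_b"} <= headers:
        mode, col_a, col_b = "text", "text_a", "text_b"
    else:
        raise ValueError(f"need url_a,url_b or text_a,text_b; got {sorted(headers)}")

    rows = []
    for record in reader:
        a = (record.get(col_a) or "").strip()
        b = (record.get(col_b) or "").strip()
        if a and b:  # skip blank or half-filled rows
            rows.append(BatchRow(mode, a, b))
    return rows


sample = "url_a,url_b\nhttps://blog.example/post,https://shop.example/gold\n"
print(parse_batch_csv(sample))
```

In this sketch, rows with a missing value on either side are skipped rather than scored.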
Google's Reasonable Surfer patent (US 7,716,225, filed 2004, granted 2010) assigns different weights to links based on their probability of being clicked. A link in a contextually relevant paragraph carries more weight than one in a footer or sidebar. Our formula models this: the context signals carry 100% of the base score (65% paragraph + 35% page), and the anchor only adjusts it by a small signed δ.
The 2024 Google Content Warehouse API leak (documented by iPullRank and SparkToro) exposed the names of real production attributes, including these link- and site-level fields:
context2 — hash of terms near the anchor (paragraph-level context, NOT full page body)
fullLeftContext / fullRightContext — extended text window around the link
anchorMismatchDemotion — penalty when anchor topic doesn't match destination
sourceType — quality tier of the linking page (HIGH/MEDIUM/LOW)
siteFocusScore — how topically focused a site is on a single topic
siteRadius — how far individual pages deviate from the site's topic centroid
Thin-page enrichment: for pages with little on-page text (JavaScript apps, streaming sites, other single-page apps), we also fold in the metadata Google reads to understand a page's topic: the title (titlematchScore), meta description, Open Graph tags, and JSON-LD structured-data entities. This metadata is bounded and deduplicated so it never dominates a content-rich page.
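A rough sketch of thin-page enrichment using only Python's standard `html.parser` and `json`. The page says only that the metadata is bounded and deduplicated, not how, so the fields read here (title, meta description, og:title, og:description, and the name/headline/description of JSON-LD objects), the class and function names, and the 60-word cap are all assumptions chosen for illustration.

```python
import json
from html.parser import HTMLParser

META_KEYS = ("description", "og:title", "og:description")
JSONLD_KEYS = ("name", "headline", "description")


class TopicMetaParser(HTMLParser):
    """Collects <title>, selected <meta> content and JSON-LD text fields."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._in_title = False
        self._in_jsonld = False
        self._jsonld_buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = (attrs.get("name") or attrs.get("property") or "").lower()
            if key in META_KEYS:
                self.parts.append(attrs.get("content") or "")
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld, self._jsonld_buf = True, []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self._collect_jsonld("".join(self._jsonld_buf))

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)
        elif self._in_jsonld:
            self._jsonld_buf.append(data)

    def _collect_jsonld(self, raw):
        try:
            stack = [json.loads(raw)]
        except json.JSONDecodeError:
            return  # malformed structured data is ignored
        while stack:  # walk nested objects and lists
            node = stack.pop()
            if isinstance(node, dict):
                self.parts.extend(node[k] for k in JSONLD_KEYS if isinstance(node.get(k), str))
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)


def topic_metadata(html, max_words=60):
    """Deduplicated topic text from page metadata, capped at max_words."""
    parser = TopicMetaParser()
    parser.feed(html)
    parser.close()
    seen, words = set(), []
    for part in parser.parts:
        text = " ".join(part.split())
        if text and text.lower() not in seen:  # og:title often repeats <title>
            seen.add(text.lower())
            words.extend(text.split())
    return " ".join(words[:max_words])
```

Under this design, a fixed word cap means the metadata is a large share of a near-empty page but a small share of a 2,000-word article.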
Both levels count, and they complement each other. The paragraph around the link (context2, fullLeftContext/fullRightContext) is the strongest signal, so it carries the larger weight (65%). But the topicality of the whole referring page or site (siteFocusScore, siteRadius) still applies (35%): a 2,000-word marketing page with one relevant paragraph has good local context, yet its low overall topic focus discounts the link. Mathematically: score = clamp(0.65·cos(context, target) + 0.35·cos(refPage, target) + δanchor, −1, +1), where δanchor is the signed anchor modifier. The leak exposes that these fields exist, not the weights Google assigns to them, so "only the paragraph matters" overstates the case; both are inputs.
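A worked illustration of the base score with hypothetical cosines (not measured data), holding the anchor neutral (δ = 0): the same relevant paragraph, c = 0.60, placed on an unfocused marketing page (r = 0.10) versus a focused page (r = 0.50).

```latex
\begin{aligned}
\text{base}_{\text{marketing}} &= 0.65(0.60) + 0.35(0.10) = 0.390 + 0.035 = 0.425 \quad (\text{Moderate})\\
\text{base}_{\text{focused}}   &= 0.65(0.60) + 0.35(0.50) = 0.390 + 0.175 = 0.565 \quad (\text{Strong})
\end{aligned}
```

The paragraph contributes the same 0.390 in both cases; the referring page's topic focus alone moves the link across the Strong threshold.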
In the leak, context2 describes paragraph-level relevance and siteFocusScore describes site-level topical focus; both are genuine semantic matches, so both belong in a cosine. Anchor text, by contrast, is modelled here on anchorMismatchDemotion, a penalty rather than a positive ranking term. So the base score is a weighted sum of two cosines: context paragraph 65% + referring page 35%. The anchor then applies a small signed δ, gated on context:
Each text x → a 384-d embedding v(x) (mean-pooled, L2-normalised, so ∥v∥=1). Cosine = dot product:
cos(u, w) = (u·w) / (∥u∥∥w∥) = u·w (since ∥u∥ = ∥w∥ = 1)
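The embedding step itself needs the ONNX model, but the pooling and normalisation around it can be shown on toy vectors. This sketch assumes standard attention-mask mean pooling (the usual sentence-transformers recipe for MiniLM) and uses 3-d vectors in place of 384-d ones; the function names are illustrative.

```python
import math

Vector = list[float]


def mean_pool(token_vectors: list[Vector], attention_mask: list[int]) -> Vector:
    """Average the token embeddings whose mask is 1, ignoring padding."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return [sum(col) / len(kept) for col in zip(*kept)]


def l2_normalise(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def dot(u: Vector, w: Vector) -> float:
    return sum(x * y for x, y in zip(u, w))


def cosine(u: Vector, w: Vector) -> float:
    """Full cosine formula, to check the dot-product shortcut."""
    return dot(u, w) / math.sqrt(dot(u, u) * dot(w, w))


# Toy 3-d token vectors standing in for MiniLM's 384-d output; the last token is padding.
u = l2_normalise(mean_pool([[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [9.0, 9.0, 9.0]], [1, 1, 0]))
w = l2_normalise(mean_pool([[0.5, 1.0, 2.0]], [1]))
print(dot(u, w), cosine(u, w))  # equal up to floating-point rounding
```

Normalising once at embedding time is what lets every later comparison be a plain dot product.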
Let c = cos(context, target), r = cos(refPage, target), a = cos(anchor, target).
base = 0.65·c + 0.35·r
δ = −0.15 if a < 0.15 (mismatch → anchorMismatchDemotion)
δ = min(0.10, 0.20·a) if c ≥ 0.40 (relevant anchor + strong context → adds)
δ = −0.10 if a > 0.45 and c < 0.30 (exact-match in weak context → over-optimised)
δ = 0 otherwise (or generic anchor)
score = clamp(base + δ, −1, +1)
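A minimal Python sketch of the formula above, taking the three cosines as already computed. Two assumptions: the rules are tried top to bottom and the first match wins (this only matters when a < 0.15 and c ≥ 0.40, where it gives the mismatch penalty priority), and generic anchors are flagged by a hypothetical `is_generic` argument, since the page does not say how they are detected.

```python
W_CONTEXT, W_REFPAGE = 0.65, 0.35


def anchor_delta(a: float, c: float, is_generic: bool = False) -> float:
    """Context-gated anchor modifier; rules are tried in the order listed, first match wins."""
    if is_generic:                 # hypothetical flag: generic anchors stay neutral
        return 0.0
    if a < 0.15:                   # mismatch -> anchorMismatchDemotion
        return -0.15
    if c >= 0.40:                  # relevant anchor corroborated by strong context
        return min(0.10, 0.20 * a)
    if a > 0.45 and c < 0.30:      # exact-match anchor in weak context: over-optimised
        return -0.10
    return 0.0


def link_score(c: float, r: float, a: float, is_generic: bool = False) -> float:
    """c, r, a are the context, referring-page and anchor cosines against the target."""
    base = W_CONTEXT * c + W_REFPAGE * r
    return max(-1.0, min(1.0, base + anchor_delta(a, c, is_generic)))


print(round(link_score(c=0.62, r=0.48, a=0.30), 3))  # base 0.571, delta +0.06
```

Because the bonus branch requires c ≥ 0.40 and the over-optimisation branch requires c < 0.30, those two can never both fire; the 0.30 to 0.40 band of c is deliberately neutral.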
Why context gates the anchor: a keyword-rich anchor like "buy gold online" will always score a high a against a "buy gold" page. That overlap is tautological, not evidence of a good link. The discriminator is whether the surrounding paragraph (c) supports it. Strong c means the match is a natural descriptive link (small bonus). Weak c means the same match is an over-optimisation signal (penalty). The asymmetry (max penalty 0.15 > max bonus 0.10) reflects the view that Google's link system is risk-averse: a bad anchor hurts more than a good one helps.
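A worked example of the gating with hypothetical values: two links share the same anchor similarity a = 0.70 and referring-page similarity r = 0.30, and differ only in paragraph context c.

```latex
\begin{aligned}
\text{Link A } (c = 0.55):\quad \text{base} &= 0.65(0.55) + 0.35(0.30) = 0.3575 + 0.105 = 0.4625\\
\delta &= +\min(0.10,\ 0.20 \times 0.70) = +0.10\\
\text{score} &= 0.5625 \;\Rightarrow\; \text{Strong}\\[4pt]
\text{Link B } (c = 0.20):\quad \text{base} &= 0.65(0.20) + 0.35(0.30) = 0.130 + 0.105 = 0.235\\
\delta &= -0.10 \quad (a > 0.45,\ c < 0.30)\\
\text{score} &= 0.135 \;\Rightarrow\; \text{Weak}
\end{aligned}
```

The anchor is identical in both links; only the paragraph decides whether it earns a bonus or a penalty.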
all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings and scores 82.03 Spearman on the STS Benchmark. We tested Nomic (768-dim): it compressed all scores into 0.42–0.84, making tier differentiation impossible. MiniLM's wider spread maps naturally onto meaningful quality tiers. ONNX Runtime produces deterministic FP32 output on a given build and hardware: same inputs, same score, every time.
Strong (≥0.45) requires genuine contextual alignment; it is not achievable by anchor match alone, because the anchor bonus only applies once c ≥ 0.40. Moderate (0.25 up to 0.45) indicates a topical connection with room to improve. Weak (0.10 up to 0.25) means minimal overlap. A score below 0.10 (including negative scores) flags an Irrelevant / Risk link, often driven by anchor mismatch or over-optimisation. Thresholds and the anchor constants (0.15 / 0.45 / 0.30 / 0.40 and the δ penalties) are starting estimates, pending calibration against labelled production links.
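A sketch of the tier mapping. Treating each threshold as an inclusive lower bound (≥) is an assumption that gives every score exactly one tier, including values that fall between two rounded boundaries, such as 0.447.

```python
def classify(score: float) -> str:
    """Map a clamped link score to its tier (thresholds are the starting estimates)."""
    if score >= 0.45:
        return "Strong"
    if score >= 0.25:
        return "Moderate"
    if score >= 0.10:
        return "Weak"
    return "Irrelevant / Risk"  # includes negative scores


for s in (0.5625, 0.447, 0.135, -0.05):
    print(s, classify(s))
```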
WLDM Cosine Scoring Pipeline
─────────────────────────────
Step 1: Extract
  URL → fetch HTML → strip nav/footer → clean main body text
  + title / meta desc / OG / JSON-LD entities ← topic signal for thin / JS pages
Step 2: Embed (chunk + centroid)
  Long docs → ~150-word chunks → all-MiniLM-L6-v2 (ONNX) → 384-d each → averaged into a whole-doc centroid
Step 3: Compare
  Cosine similarity = dot product of L2-normalised vectors
  Cosine range: −1.0 → +1.0
Step 4: Score (base cosine)
  base = 0.65 × cos(context ↔ target) ← context2
       + 0.35 × cos(refPage ↔ target) ← siteFocus
Step 5: Anchor δ (context-gated)
  a = cos(anchor ↔ target), c = cos(context ↔ target)
  a < 0.15 → δ = −0.15 (mismatch)
  c ≥ 0.40 → δ = +min(0.10, 0.20a) (natural)
  a > 0.45 & c < 0.30 → δ = −0.10 (over-optimised)
  else / generic → δ = 0
  score = clamp(base + δ, −1, +1)
Step 6: Classify
  ● Strong ≥ 0.45
  ● Moderate 0.25 – <0.45
  ● Weak 0.10 – <0.25
  ● Irrelevant / Risk < 0.10
Deterministic Guarantee
  Same inputs → same ONNX graph → same FP32 result every time
  No randomness. No sampling.
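A sketch of Step 2's chunk-and-centroid embedding. `embed` is a hypothetical stand-in for the ONNX MiniLM call, assumed to return an L2-normalised vector. The ~150-word chunk size comes from the pipeline; a plausible reason for it is that all-MiniLM-L6-v2 truncates inputs beyond 256 word pieces. Re-normalising the averaged centroid is an assumption, but without it a dot product would no longer equal the cosine, because the mean of unit vectors is generally shorter than unit length.

```python
import math
from typing import Callable

Vector = list[float]


def chunk_words(text: str, size: int = 150) -> list[str]:
    """Split text into consecutive chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def doc_centroid(text: str, embed: Callable[[str], Vector], size: int = 150) -> Vector:
    """Embed each chunk, average the vectors, then re-normalise to unit length."""
    chunks = chunk_words(text, size)
    if not chunks:
        raise ValueError("empty document")
    vectors = [embed(chunk) for chunk in chunks]
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    # The mean of unit vectors is usually shorter than 1, so rescale it;
    # otherwise a dot product with it would no longer equal the cosine.
    norm = math.sqrt(sum(x * x for x in centroid))
    return [x / norm for x in centroid] if norm > 0 else centroid


# Toy embedder (hypothetical): maps a chunk to one of two unit vectors.
def toy_embed(chunk: str) -> Vector:
    return [1.0, 0.0] if chunk.startswith("gold") else [0.0, 1.0]


doc = " ".join(["gold"] * 150 + ["silver"] * 150)
print(doc_centroid(doc, toy_embed))
```

A document split evenly between two topics ends up with a centroid midway between them, which is how a diffuse page earns a lower cosine against a focused target.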
Our free backlink audit scores every link, maps competitors, and identifies the gaps holding you back.
Book a Free Audit Call