Cosine Relevance Scorer

How It Works

Three signals. One score.

Google doesn't just count links; it evaluates whether the linking page is contextually relevant to your content. We model this with three semantic signals.

Context ↔ Target Page

The paragraph surrounding your backlink is compared against the target page body. This is the strongest signal — worth 65% of the base score. Models Google's context2 — the hash of terms near the anchor.

Referring Page ↔ Target Page

The full referring page topic is compared to the target page. Captures topical alignment even when the specific paragraph is weak. Worth 35% — models Google's siteFocusScore.

Anchor ↔ Target Page

The anchor is not a weighted term; it is a context-gated signed modifier δ. It adds a bonus (at most +0.10) only when a relevant anchor is corroborated by strong context, and it penalises mismatch (−0.15) and over-optimised exact-match anchors in weak context (−0.10). Generic anchors are neutral. This mirrors Google's anchorMismatchDemotion: a penalty, never a free boost.

score = 0.65 × cos(context, target) + 0.35 × cos(refPage, target) + δ_anchor,  where δ_anchor ∈ [−0.15, +0.10]

Strong ≥ 0.45

Moderate 0.25 – 0.44

Weak 0.10 – 0.24

Irrelevant / Risk < 0.10

Model: all-MiniLM-L6-v2 (384-dim) via ONNX Runtime. Deterministic — same inputs always produce the same score.

Calculator

Score your backlinks

Paste two URLs — we extract context, anchor, and body text automatically. No keywords needed. Or use text/weighted mode for manual control.

Referring Page URL

The page that links to you — we'll find the link and extract anchor + surrounding paragraph

Detected Link

Target Page URL

Your page — we'll extract the body content

Extracted Body

Automatic 3-signal analysis

Fetch & Score: link context paragraph (65%) + referring page topical relevance (35%), with anchor applied as a signed ±δ (Google’s anchorMismatchDemotion). No manual keywords needed.

Site Focus mode — use for thin / JS-heavy target sites (streaming, casino, SPAs) whose homepage has little text. It crawls up to 8 pages of the target site (sitemap → internal links), builds a topic centroid (Google’s siteFocusScore), and scores against that instead of one thin page. Slower (multiple fetches); cached per domain.
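The page-selection and caching part of Site Focus mode can be sketched as follows. This is a minimal illustration, not the tool's actual crawler: the function names, the exact-host match, and the in-memory dict cache are all assumptions. Only the 8-page cap, the sitemap-before-internal-links order, and per-domain caching come from the description above.

```python
from urllib.parse import urlparse

MAX_PAGES = 8            # cap stated in the description above
_centroid_cache = {}     # hypothetical per-domain cache (in-memory only)

def pick_site_pages(domain, sitemap_urls, internal_links, limit=MAX_PAGES):
    """Pick up to `limit` same-host URLs, sitemap entries first, then internal links."""
    picked, seen = [], set()
    for url in list(sitemap_urls) + list(internal_links):
        host = urlparse(url).netloc.lower()
        key = url.rstrip("/")            # treat /page and /page/ as the same URL
        if host != domain.lower() or key in seen:
            continue                      # skip off-site links and duplicates
        seen.add(key)
        picked.append(url)
        if len(picked) == limit:
            break
    return picked

def site_centroid(domain, build):
    """Return the cached centroid for `domain`, calling `build(domain)` only once."""
    if domain not in _centroid_cache:
        _centroid_cache[domain] = build(domain)
    return _centroid_cache[domain]
```

Here `build` stands in for the slow part (fetching and embedding the selected pages), so repeated scores against the same domain reuse one centroid. The host comparison is exact, so a sketch like this would treat www.example.com and example.com as different sites unless the caller normalises them first.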

-

Text A

e.g. the paragraph surrounding your backlink

Text B

e.g. your target page body or keyword cluster

-

Referring Page Context

The surrounding paragraph - the most important signal (50%)

Target Page Body

Your page's main content - what the link points to

Anchor Text

The clickable link text (20%)

Target Keywords

Comma-separated keyword cluster — context↔keywords is 30%
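One way to read the weighted mode is as a plain weighted sum of three cosines. The sketch below assumes the pairings: context↔target body at 50%, context↔keywords at 30% (the only pairing stated explicitly above), and anchor↔target body at 20%. The function and key names are illustrative.

```python
# Weights as listed in the form fields above; the pairings are assumptions.
WEIGHTS = {
    "context_target": 0.50,    # surrounding paragraph vs. target page body
    "context_keywords": 0.30,  # surrounding paragraph vs. keyword cluster
    "anchor_target": 0.20,     # anchor text vs. target page body
}

def weighted_score(context_target, context_keywords, anchor_target):
    """Combine three precomputed cosine similarities into one weighted score."""
    return (WEIGHTS["context_target"] * context_target
            + WEIGHTS["context_keywords"] * context_keywords
            + WEIGHTS["anchor_target"] * anchor_target)

example = weighted_score(0.6, 0.5, 0.4)
```

Unlike the automatic mode, this reading treats the anchor as a positive weighted term rather than a signed modifier, which is why the two modes can disagree on the same link.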

-

Upload a CSV with URLs or text. Columns: url_a, url_b (fetches & extracts) or text_a, text_b (direct text).

Drop your CSV here or click to browse

Accepts .csv - max 200 rows for URL mode, 500 for text mode

Download text template · Download URL template
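For anyone preparing a batch file, the column and row rules above can be checked before uploading. This is a hedged sketch using Python's standard csv module; the function name, the error messages, and the choice to prefer URL columns when both sets are present are assumptions, not documented behaviour of the uploader.

```python
import csv
import io

ROW_LIMITS = {"url": 200, "text": 500}   # limits stated above

def read_batch(csv_text):
    """Detect the batch mode from the header and return (mode, list of pairs)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = set(reader.fieldnames or [])
    if {"url_a", "url_b"} <= fields:
        mode, cols = "url", ("url_a", "url_b")
    elif {"text_a", "text_b"} <= fields:
        mode, cols = "text", ("text_a", "text_b")
    else:
        raise ValueError("expected columns url_a,url_b or text_a,text_b")
    rows = [(row[cols[0]], row[cols[1]]) for row in reader]
    if len(rows) > ROW_LIMITS[mode]:
        raise ValueError(f"{mode} mode accepts at most {ROW_LIMITS[mode]} rows")
    return mode, rows

mode, rows = read_batch("text_a,text_b\nsome paragraph,some target body\n")
```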

Methodology

Why cosine similarity matters for backlinks

The Reasonable Surfer Model

Google's Reasonable Surfer patent (US 7,716,225, filed 2004, granted 2010) assigns different weights to links based on their probability of being clicked. A link in a contextually relevant paragraph carries more weight than one in a footer or sidebar. Our formula models this: the context signals carry 100% of the base score (65% paragraph + 35% page); the anchor only adjusts it by a small signed δ.

Confirmed by the 2024 API Leak

The 2024 Google Content Warehouse API leak (documented by iPullRank, SparkToro) exposed real production ranking fields:

context2 — hash of terms near the anchor (paragraph-level context, NOT full page body)
fullLeftContext / fullRightContext — extended text window around the link
anchorMismatchDemotion — penalty when anchor topic doesn't match destination
sourceType — quality tier of the linking page (HIGH/MEDIUM/LOW)
siteFocusScore — how topically focused the target site is
siteRadius — how far individual pages deviate from the site's topic centroid

Thin-page enrichment: for pages with little on-page text (JS apps, streaming/SPAs), we also fold in what Google reads to understand a page's topic — title (titlematchScore), meta description, Open Graph, and JSON-LD structured-data entities — bounded and deduped so it never dominates a content-rich page.
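The "bounded and deduped" idea can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the function name, the 25% word budget, and the 40-word floor are hypothetical values, not the tool's real settings.

```python
def enrich(body, title="", meta_description="", og="", entities=(),
           max_ratio=0.25, min_words=40):
    """Append deduplicated metadata to the body text, capped by a word budget.

    The budget is max(min_words, max_ratio * body length), so metadata can
    matter on a thin page but stays a minority of a content-rich one.
    """
    body_words = body.split()
    body_norm = " ".join(body.lower().split())
    seen, extras = set(), []
    for piece in (title, meta_description, og, *entities):
        norm = " ".join(piece.lower().split())
        # Skip empty pieces, repeats, and text the body already contains.
        if not norm or norm in seen or norm in body_norm:
            continue
        seen.add(norm)
        extras.append(piece.strip())
    budget = max(min_words, int(len(body_words) * max_ratio))
    extra_words = " ".join(extras).split()[:budget]
    return " ".join(body_words + extra_words)

thin = enrich("Play now", title="Live football streaming",
              og="live  football streaming", entities=["SportsEvent"])
```

In this example the Open Graph title duplicates the page title after case and whitespace normalisation, so it is added only once.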

Both levels count, and they complement each other. The paragraph around the link (context2, fullLeftContext/fullRightContext) is the strongest signal, so it carries the larger weight (65%). But the topicality of the whole referring page/site (siteFocusScore, siteRadius) still applies (35%): a 2,000-word marketing page with one relevant paragraph passes good local context, yet its low overall topic focus discounts the link. Mathematically: score = 0.65·cos(context, target) + 0.35·cos(refPage, target) + δ_anchor, with δ_anchor signed. The leak shows that these fields exist, not the exact weights Google assigns, so "only the paragraph matters" overstates it; both are inputs.

Why 65/35 + anchor demotion?

The leak shows context2 is the primary relevance signal and siteFocusScore measures topical alignment; both are genuine semantic matches that belong in a cosine. Anchor text, however, is exposed only as anchorMismatchDemotion: a penalty, not a positive ranking term. So the base score is a true weighted cosine: context paragraph 65% + referring page 35%. The anchor then applies a small signed δ, gated on context.

The math, precisely

Each text x → a 384-d embedding v(x) (mean-pooled, L2-normalised, so ∥v∥=1). Cosine = dot product:

cos(a,b) = (a·b) / (∥a∥∥b∥) = a·b (since ∥v∥=1)
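The identity above is easy to check numerically. The sketch below uses toy 3-d vectors in place of real 384-d embeddings and plain Python instead of a numerics library; the helper names are illustrative.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Full cosine formula, valid for vectors of any length."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0

# Toy vectors standing in for embeddings.
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]

full = cosine(a, b)                               # divides by both norms
short = dot(l2_normalise(a), l2_normalise(b))     # norms are already 1
```

Both `full` and `short` come out at 0.96 up to floating-point rounding, which is why the pipeline can skip the division once vectors are normalised.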

Let c = cos(context, target), r = cos(refPage, target), a = cos(anchor, target).

base = 0.65·c + 0.35·r

δ = −0.15 if a < 0.15  (mismatch → anchorMismatchDemotion)
δ = min(0.10, 0.20·a) if c ≥ 0.40  (relevant anchor + strong context → adds)
δ = −0.10 if a > 0.45 and c < 0.30  (exact-match in weak context → over-optimised)
δ = 0 otherwise (or generic anchor)

score = clamp(base + δ, −1, +1)
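The rules above translate almost line for line into code. This sketch assumes the rules are evaluated top to bottom with the first match winning, and that a generic anchor is flagged by the caller; how generic anchors are actually detected is not described here, so the `generic` parameter is hypothetical.

```python
def anchor_delta(a, c, generic=False):
    """Signed anchor modifier. a = cos(anchor, target), c = cos(context, target)."""
    if generic:
        return 0.0                     # generic anchors are neutral
    if a < 0.15:
        return -0.15                   # mismatch
    if c >= 0.40:
        return min(0.10, 0.20 * a)     # relevant anchor backed by strong context
    if a > 0.45 and c < 0.30:
        return -0.10                   # exact-match anchor in weak context
    return 0.0

def relevance_score(c, r, a, generic=False):
    """base = 0.65*c + 0.35*r, plus the anchor modifier, clamped to [-1, 1]."""
    base = 0.65 * c + 0.35 * r
    return max(-1.0, min(1.0, base + anchor_delta(a, c, generic)))

# Same keyword-rich anchor (a = 0.70) in two different contexts.
strong_context = relevance_score(c=0.55, r=0.40, a=0.70)   # roughly 0.60
weak_context = relevance_score(c=0.20, r=0.25, a=0.70)     # roughly 0.12
```

The two example calls use an identical anchor similarity: with strong context the anchor adds the full +0.10, while with weak context the same anchor costs 0.10.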

Why context gates the anchor: a keyword-rich anchor like "buy gold online" always yields a high a against a "buy gold" page. That overlap is tautological, not evidence of a good link. The discriminator is whether the surrounding paragraph (c) supports it. Strong c → the match is a natural descriptive link (small bonus). Weak c → the same match is an over-optimisation signal (penalty). The asymmetry (max penalty 0.15 > max bonus 0.10) reflects that Google's link system is risk-averse: a bad anchor hurts more than a good one helps.

Why all-MiniLM-L6-v2?

384-dimensional sentence embeddings, 82.03 Spearman on STS Benchmark. We tested Nomic (768-dim); it compressed all scores into the 0.42–0.84 range, making tier differentiation impossible. MiniLM's wider spread maps naturally to meaningful quality tiers. ONNX Runtime ensures deterministic FP32 output: same inputs, same score, every time.

Threshold Calibration

Strong (≥0.45) requires genuine contextual alignment — not achievable by anchor match alone. Moderate (0.25–0.44) indicates topical connection with room to improve. Weak (0.10–0.24) means minimal overlap. Below 0.10 (incl. negative) flags an Irrelevant / Risk link — anchor mismatch or over-optimisation. Thresholds and the anchor constants (0.15 / 0.45 / 0.30 / 0.40 and the δ penalties) are starting estimates — pending calibration against labelled production links.
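The tiers can be expressed as a small lookup. The sketch treats each band as a half-open interval (so 0.445 counts as Moderate), which is an assumption: the published bands are given to two decimals. Keeping the cut-offs in one list makes them easy to replace after calibration.

```python
# (lower bound, label), checked from highest to lowest.
TIERS = [
    (0.45, "Strong"),
    (0.25, "Moderate"),
    (0.10, "Weak"),
]

def classify(score):
    """Map a final score to its tier; anything below 0.10 is Irrelevant / Risk."""
    for lower, label in TIERS:
        if score >= lower:
            return label
    return "Irrelevant / Risk"

labels = [classify(s) for s in (0.62, 0.31, 0.12, -0.05)]
```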

WLDM Cosine Scoring Pipeline
─────────────────────────────

Step 1: Extract
  URL → fetch HTML → strip nav/footer
  → clean main body text
  + title / meta desc / OG / JSON-LD entities
    ← topic signal for thin / JS pages

Step 2: Embed (chunk + centroid)
  Long docs → ~150-word chunks
  each → all-MiniLM-L6-v2 (ONNX) → 384-d
  → averaged into whole-doc centroid

Step 3: Compare
  Cosine similarity = dot product
  of L2-normalized vectors

  Score range: −1.0 → 1.0

Step 4: Score (base cosine)
  base = 0.65 × cos(context ↔ target)   ← context2
       + 0.35 × cos(refPage ↔ target)   ← siteFocus

Step 5: Anchor δ (context-gated)
  a = cos(anchor ↔ target),  c = cos(context ↔ target)
  a < 0.15             → δ = −0.15  mismatch
  c ≥ 0.40             → δ = +min(0.10, 0.20a)  natural
  a > 0.45 & c < 0.30  → δ = −0.10  over-opt
  else / generic       → δ = 0
  score = clamp(base + δ, −1, +1)

Step 6: Classify
  ● Strong            ≥ 0.45
  ● Moderate          0.25 – 0.44
  ● Weak              0.10 – 0.24
  ● Irrelevant / Risk < 0.10

Deterministic Guarantee
  Same inputs → same ONNX graph
  → same FP32 result every time
  No randomness. No sampling.
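Step 2 of the pipeline (chunk, embed, average) can be sketched without the model itself. In the example below `embed` is a parameter standing in for the real all-MiniLM-L6-v2 call, and `toy_embed` is a deliberately meaningless placeholder so the code runs on its own. Re-normalising the averaged vector is an assumption, added so that the dot product in Step 3 is still a true cosine.

```python
import math

def chunk_words(text, size=150):
    """Split text into consecutive chunks of about `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def centroid(vectors):
    """Average equal-length vectors, then rescale the mean to unit length."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean

def embed_document(text, embed, size=150):
    """Embed each chunk with `embed` and return the whole-document centroid."""
    chunks = chunk_words(text, size) or [""]   # empty text still yields one vector
    return centroid([embed(chunk) for chunk in chunks])

def toy_embed(chunk):
    """Placeholder embedder (NOT a real model): 2-d vector from the word count."""
    return [float(len(chunk.split())), 1.0]

doc_vector = embed_document(" ".join(["word"] * 400), toy_embed)
```

A 400-word document yields chunks of 150, 150, and 100 words, and the resulting centroid has length 1 regardless of which embedder is plugged in.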
