Most people evaluating visitor identification tools conflate three different metrics into one number, then wonder why the vendor claims don’t match what they see in the CRM. Match rate, accuracy, and completeness are not the same thing, and you need all three to make a real decision.
I am George, founder of Leadpipe. We scored 8.7/10 in a third-party accuracy audit against RB2B (5.2) and Warmly (4.0). This post is the methodology behind that test, the failure modes it catches, and how to run the same evaluation on your own traffic without trusting any vendor’s word, including ours.
The three metrics buyers confuse
Ask a vendor “what’s your accuracy” and you’ll get a number. Ask them how they measured it and the answer falls apart. Here’s the clean breakdown.
| Metric | What it measures | How it fails you |
|---|---|---|
| Match rate | Percentage of unique visitors the tool returned a record for | Inflates easily with probabilistic matching and session-level counting |
| Accuracy | Percentage of returned records that are correct for the actual visitor | Requires ground truth to verify, so nobody publishes it honestly |
| Completeness | Percentage of expected fields that are populated per record | A record with only a company name is technically a “match” but useless |
A tool claiming “35% match rate” might mean:
- 35% of unique visitors matched at person level with verified data (the honest version).
- 35% of sessions matched at any level including company only (the inflated version).
- 35% of visitors were guessed at via IP-to-company lookup (the probabilistic version).
Three tools can all claim “35% match rate” and mean three entirely different things. This is why the category is so hard to shop for without running the test yourself.
How to measure match rate honestly
The honest definition is: the percentage of unique US-based B2B visitors for whom the tool returned a person-level record with contact data, over a fixed time window, on a fixed set of pixel installations.
Each phrase does work:
- Unique visitors. Count people, not sessions. Counting sessions inflates the number because returning visitors are counted multiple times.
- US-based. Match rates vary wildly by geography. Mixing in international traffic makes the number incomparable.
- B2B. Office IPs, business emails, and corporate networks match at higher rates than residential mobile.
- Person-level record. A company name without a person is not a match at the person level. The difference matters.
- Contact data. A record with a name but no email or phone is barely a lead.
- Fixed time window. One week is not long enough to smooth out daily variance. Measure for at least two weeks.
- Fixed pixel installations. Traffic composition drives match rate. Comparing two tools on two different sites is comparing two different things.
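The definition above reduces to a filter-then-count: deduplicate to unique visitors, restrict to the comparable population, then require a person-level record with contact data. A minimal sketch, assuming a simple visit-log shape (the field names `visitor_id`, `country`, `is_b2b`, and `person_record` are illustrative, not any tool's actual export format):

```python
def honest_match_rate(visits):
    """Compute match rate over unique US B2B visitors, not sessions.

    `visits` is a list of dicts with hypothetical fields:
    visitor_id, country, is_b2b, person_record (dict or None).
    """
    # Deduplicate to unique visitors first: counting sessions
    # inflates the number via returning visitors.
    unique = {}
    for v in visits:
        unique.setdefault(v["visitor_id"], v)

    # Restrict to the comparable population: US-based B2B traffic.
    eligible = [v for v in unique.values()
                if v["country"] == "US" and v["is_b2b"]]

    # A match requires a person-level record WITH contact data,
    # not just a resolved company name.
    def is_match(v):
        rec = v.get("person_record")
        return bool(rec and rec.get("name")
                    and (rec.get("email") or rec.get("phone")))

    matched = sum(1 for v in eligible if is_match(v))
    return matched / len(eligible) if eligible else 0.0
```

Note that a record with a name but no email or phone fails `is_match` here: it counts toward the denominator but not the numerator, which is the honest treatment.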
At Leadpipe, our match rate on US B2B traffic is 30-40%+ measured this way. See the full methodology discussion for the long version.
How to measure accuracy honestly
Accuracy is harder because you need ground truth. You need to know who actually visited, so you can check whether the tool named the right person.
Two methods work in practice:
Method 1: Known-identity seeding
Have real people with known identities visit the site under controlled conditions. Capture the tool’s returned record. Compare.
- Strengths: clean ground truth, easy to interpret.
- Weaknesses: small sample, not representative of the full traffic mix.
Method 2: Manual verification against external sources
Take a random sample of the tool’s returned records. For each one, verify against LinkedIn, the company website, and public directory data. Score each record on identity match, employer match, title match, and contact completeness.
- Strengths: large, representative sample of real traffic.
- Weaknesses: depends on the quality of the verification sources, which are themselves imperfect.
The independent accuracy test used method 2, with a Gartner-experienced data quality analyst verifying a sample of records across Leadpipe, RB2B, and Warmly. The results:
Accuracy score (independent test, 75,000 visitors, 120 days):
Leadpipe ████████████████████ 8.7/10
RB2B ███████████ 5.2/10
Warmly ████████ 4.0/10
The 8.7 does not mean “87% of everyone who visited was identified.” It is an average quality score: the records Leadpipe did return scored 8.7 out of 10 across the identity, employer, title, and contact dimensions.
Failure modes the test catches
Accuracy audits catch specific failure modes that a match-rate-only comparison hides. The big ones:
Wrong person, right company
A tool returns “Acme Corp” correctly but names Sarah Chen when it was actually Marcus Lee. This happens when the tool resolves to the company’s most-seen employee or to a company-wide default, and labels the match as person-level. It looks like a match. It’s a hallucination.
Stale employer
The tool names the visitor correctly but shows their employer from 18 months ago. They’ve changed jobs. Your outbound goes to a defunct email and damages your domain reputation.
Name-and-company without contact
A “match” with a name and a company but no email or phone. Your rep has nothing to act on. The record fills a row in the CRM and nothing else.
Probabilistic inflation
The tool returns a match with 52% confidence. It looks like a match on the dashboard. The data is a coin flip. Deterministic vs probabilistic matching covers the architectural distinction.
Duplicate records
Same person identified twice with slightly different spellings or stale employers, each counted as a separate match. Inflates the match rate, degrades the CRM.
Each of these failure modes degrades your pipeline in a different way, and none of them show up if you only look at a match-rate number.
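The duplicate-record failure mode in particular is easy to check for yourself. A minimal sketch that collapses near-identical name/company pairs (the normalization is deliberately crude, and the `name`/`company` field names are hypothetical; real deduplication needs fuzzier matching):

```python
def find_duplicates(records):
    """Flag records that are likely the same person counted twice.

    `records` is a list of dicts with illustrative `name` and
    `company` fields.
    """
    seen = {}
    dupes = []
    for rec in records:
        # Normalize casing, whitespace, and punctuation so
        # "Sarah Chen / Acme Corp." and "sarah chen / ACME Corp"
        # collapse to the same key.
        key = (
            "".join(ch for ch in rec["name"].lower() if ch.isalnum()),
            "".join(ch for ch in rec["company"].lower() if ch.isalnum()),
        )
        if key in seen:
            dupes.append((seen[key], rec))
        else:
            seen[key] = rec
    return dupes
```

If two tools report similar match rates but one returns noticeably more duplicate pairs from this check, its headline number is inflated.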
The independent test, in plain terms
The independent accuracy test we cite is a third-party audit. The design:
- 75,000 unique visitors across a 120-day window.
- Three tools running simultaneously on the same pixel installation, so traffic composition is held constant.
- Manual verification of a random sample of returned records against LinkedIn, company websites, and contact directories.
- Scoring on a 0-10 scale across identity match, employer match, title match, and contact completeness.
The results showed:
| Tool | Score | Matching approach |
|---|---|---|
| Leadpipe | 8.7/10 | Deterministic, own identity graph |
| RB2B | 5.2/10 | Probabilistic, LinkedIn-only |
| Warmly | 4.0/10 | Probabilistic |
The spread between 8.7 and 4.0 is not a small difference. It’s the difference between outbound that works and outbound that damages your domain. The full breakdown is in the test results post.
How to run the test on your own traffic
You do not need a Gartner-trained analyst. You need a spreadsheet and one hour.
- Install two or three tools on the same site. Start with free trials. Most tools offer 500 leads or 14 days.
- Let them run for at least two weeks. Daily variance is real. A week is not enough.
- Export the returned records from each tool. Get name, email, company, title, date identified.
- Pick a random sample of 50 records per tool. Not the first 50. A real random sample. A spreadsheet RAND() sort is fine.
- Verify each record. For each sampled record, check:
- Does this person exist on LinkedIn?
- Do they currently work at the named company?
- Does the title roughly match their LinkedIn title?
- Is the email format consistent with the company’s email pattern?
- Score. Mark each record pass or fail on each dimension. Accuracy = pass rate.
- Compare. The tool with the higher accuracy score wins, even if it has a lower match rate.
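The sampling and scoring steps above fit in a few lines if you'd rather script it than sort a spreadsheet. A minimal sketch, where the check names (`exists`, `employer`, `title`, `email_pattern`) mirror the verification questions but are otherwise hypothetical:

```python
import random

def sample_records(records, n=50, seed=42):
    """Take a reproducible random sample -- never the first n rows."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))

def accuracy(verified):
    """Compute accuracy as the pass rate over verified records.

    `verified` maps record id -> dict of pass/fail checks, e.g.
    {"exists": True, "employer": True, "title": False,
     "email_pattern": True}.
    A record passes only if every dimension passes.
    """
    if not verified:
        return 0.0
    passed = sum(1 for checks in verified.values()
                 if all(checks.values()))
    return passed / len(verified)
```

Requiring every dimension to pass is the strict scoring choice; it penalizes stale-employer and wrong-person records equally, which is what you want when the records feed outbound.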
The math on why accuracy beats match rate:
Tool A: 30% match rate × 90% accuracy = 27% usable leads
Tool B: 40% match rate × 50% accuracy = 20% usable leads
Tool A wins, even though its headline number is lower. The cost of anonymous traffic is high enough that wrong matches waste more than they earn.
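The comparison above is a single multiplication, but it is worth encoding so the two numbers never get compared in isolation again:

```python
def usable_lead_rate(match_rate, accuracy):
    """Fraction of all unique visitors that become correct,
    actionable leads: match rate and accuracy multiply."""
    return match_rate * accuracy

tool_a = usable_lead_rate(0.30, 0.90)  # ~0.27
tool_b = usable_lead_rate(0.40, 0.50)  # ~0.20
```

Tool B's higher headline match rate loses to Tool A's accuracy; the product is the only number that maps to pipeline.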
What we measure internally at Leadpipe
Internally, we run four metrics on the graph every day.
| Metric | What it tracks | Why |
|---|---|---|
| Match rate | % of US B2B unique visitors returned at person level | Category-standard headline metric |
| Record accuracy | % of returned records verified correct on a rolling sample | Keeps us honest on deterministic matching |
| Field fill rate | % of records with email, phone, LinkedIn, title populated | Catches “matched but empty” regressions |
| Freshness | % of returned records last verified in the past 24 hours | Detects graph decay early |
None of these is “the” metric. All four together tell you whether the graph is healthy. A graph with great match rate and collapsing freshness will start producing outdated records in weeks.
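As one illustration, the freshness metric reduces to a windowed count over verification timestamps. A minimal sketch, assuming each record carries a hypothetical `last_verified` datetime (this is not Leadpipe's internal implementation, just the shape of the check):

```python
from datetime import datetime, timedelta, timezone

def freshness(records, now=None, window_hours=24):
    """Share of returned records last verified within the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=window_hours)
    if not records:
        return 0.0
    fresh = sum(1 for r in records if r["last_verified"] >= cutoff)
    return fresh / len(records)
```

Tracked daily, a declining freshness curve is the early warning for the stale-employer failure mode described earlier: the records still match, but the underlying facts have drifted.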
What this means for customers
When you pick a visitor identification tool, don’t pick on match rate alone. Run the three-metric test. Match rate tells you how many records you get. Accuracy tells you how many of those records are right. Completeness tells you whether each record is actionable.
Leadpipe is built to score well on all three. 30-40%+ match rates, 8.7/10 in the independent accuracy test, and 100+ data points per record including business and personal email, phone, LinkedIn, firmographics, and behavioral signals. Same graph, same data, on every plan from $147/mo Pro to enterprise.
Every plan ships with the same identity graph, 23 REST endpoints, webhooks, and a 27-tool MCP server. Start in 5 minutes.