Every visitor identification vendor on the planet loves talking about match rates. “We identify 40% of your visitors!” “Our identity graph covers 250 million profiles!” Big numbers. Impressive slides.
But here’s the question nobody wants you to ask: how many of those identifications are actually correct?
A 40% match rate means nothing if half those matches are the wrong person at the wrong company. You’re not just getting zero value from bad data - you’re actively damaging your pipeline. Wrong names in outreach emails. Wrong companies in your CRM. Wrong contacts fed to your AI SDR.
We wanted real numbers. So we commissioned an independent Gartner-certified auditor to test the six most popular visitor identification tools against known visitors - people whose identities we already confirmed. The results were revealing, uncomfortable for some vendors, and critical for anyone relying on visitor data to drive revenue.
Why Accuracy Matters More Than Match Rate
Let’s make this concrete.
Match rate = the percentage of website visitors a tool identifies.
Accuracy = the percentage of those identifications that are actually correct.
These are two completely different metrics, and most vendors only talk about the first one.
| Scenario | Match Rate | Accuracy | Correct IDs per 10K Visitors |
|---|---|---|---|
| Tool A | 40% | 80% | 3,200 |
| Tool B | 60% | 40% | 2,400 |
| Tool C | 30% | 90% | 2,700 |
Tool B has the highest match rate - and the fewest correct identifications. Tool A gives you 33% more usable identifications than Tool B despite identifying fewer visitors overall.
The takeaway: A tool that identifies fewer visitors but gets them right will always outperform a tool that identifies more visitors but gets them wrong. Accuracy is the multiplier. Match rate is just the input.
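If it helps to see the arithmetic spelled out, here's a minimal sketch of how the "Correct IDs per 10K Visitors" column is derived - the scenario numbers come straight from the table above:

```python
# Correct identifications = visitors x match rate x accuracy.
# Scenario figures are taken directly from the table above.
scenarios = {
    "Tool A": {"match_rate": 0.40, "accuracy": 0.80},
    "Tool B": {"match_rate": 0.60, "accuracy": 0.40},
    "Tool C": {"match_rate": 0.30, "accuracy": 0.90},
}

visitors = 10_000
for name, s in scenarios.items():
    correct = visitors * s["match_rate"] * s["accuracy"]
    print(f"{name}: {correct:,.0f} correct IDs per {visitors:,} visitors")
# Tool A: 3,200 | Tool B: 2,400 | Tool C: 2,700
```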
This matters even more when you’re feeding data into automated systems. If an AI sales agent sends a hyper-personalized email to the wrong person, you don’t just waste a send - you burn trust and risk getting your domain blacklisted.
Testing Methodology
We didn’t run a casual test. Here’s exactly what the independent auditor did.
The Auditor
A Gartner-certified data quality auditor with no financial relationship to any of the tools tested. They designed the methodology, executed the tests, and delivered the findings independently.
The Setup
- Controlled visitor pool - 500 known individuals across various industries, company sizes, and job functions. Real people, verified identities.
- Natural browsing conditions - Test visitors browsed from their normal devices, normal networks, normal browsers. No VPNs, no incognito. Just regular browsing behavior.
- Simultaneous tracking - All six tools ran on the same test sites at the same time. Every tool saw the same traffic.
- Blind evaluation - The auditor compared each tool’s output against the verified visitor list without knowing which tool produced which result until scoring was complete.
Scoring Criteria (10-Point Scale)
Each tool was evaluated on four dimensions:
| Criterion | Weight | What It Measures |
|---|---|---|
| Correct Person ID | 30% | Did the tool identify the right individual? |
| Correct Company | 25% | Was the company association accurate? |
| Contact Info Validity | 25% | Are the email addresses deliverable? Are phone numbers active? |
| Contact Relevance | 20% | Does the identified person actually work there and hold a relevant role? |
The overall score is a weighted composite of these four dimensions.
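The auditor's exact rollup isn't published, so treat the sketch below as an illustration of a standard weighted composite on a 10-point scale - the weights come from the table above, but the sample dimension scores are made up for the example:

```python
# Weights from the scoring criteria table above (they sum to 1.0).
WEIGHTS = {
    "correct_person_id": 0.30,
    "correct_company": 0.25,
    "contact_info_validity": 0.25,
    "contact_relevance": 0.20,
}

def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted composite of four 0-10 dimension scores."""
    return sum(WEIGHTS[dim] * score for dim, score in dimension_scores.items())

# Illustrative input only - not actual audit numbers.
print(composite_score({
    "correct_person_id": 8.5,
    "correct_company": 9.0,
    "contact_info_validity": 8.0,
    "contact_relevance": 8.5,
}))  # 8.5
```

With the scoring frame in place, let’s see how each tool performed.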
The Results
Here’s the full comparison table from the independent audit.
| Tool | Overall Score | Correct ID Rate | Contact Relevance | Matching Method |
|---|---|---|---|---|
| Leadpipe | 8.7/10 | 82% | High (8.5/10) | Deterministic |
| 6sense | 6.5/10 | ~65% | Moderate (6/10) | Probabilistic + ML |
| Leadfeeder | 6.2/10 | ~62% | Moderate (5.5/10) | IP-based |
| Clearbit | 5.8/10 | ~58% | Moderate (5/10) | Probabilistic |
| RB2B | 5.2/10 | ~52% | Low (4/10) | Probabilistic |
| Warmly | 4.0/10 | ~40% | Very Low (3/10) | Probabilistic |
Key finding from the auditor: “Deterministic matching produced significantly fewer false positives than probabilistic approaches. The gap in contact relevance was the most pronounced difference - tools using deterministic methods returned contacts that were verifiably associated with the visiting organization, while probabilistic tools frequently returned individuals with no clear connection to the visit.”
Let’s break down each tool.
Tool-by-Tool Breakdown
Leadpipe - 8.7/10
Correct ID rate: 82% | Contact relevance: High (8.5/10)
Leadpipe’s deterministic matching approach stood out immediately in the audit. Of all visitors identified by Leadpipe, 82% were confirmed as the correct person at the correct company. That’s not match rate - that’s accuracy among matched visitors.
The auditor noted three specific strengths:
- Lowest false positive rate across all tools tested
- Contact info validity scored highest - email addresses were deliverable, phone numbers were active and current
- Contact relevance was the biggest differentiator: identified individuals actually held roles relevant to buying decisions
Leadpipe builds its own identity graph rather than reselling third-party data, which the auditor cited as a likely factor in data freshness and accuracy. The deterministic approach means Leadpipe only returns an identification when it has a verified data match - not a statistical guess.
Auditor note: “Leadpipe’s approach sacrifices some match volume for substantially higher confidence per match. For teams that act on identified visitors - especially through automated outreach - this tradeoff favors Leadpipe significantly.”
6sense - 6.5/10
Correct ID rate: ~65% | Contact relevance: Moderate (6/10)
6sense performed well at the company level. Its machine learning models correctly associated visits with the right company roughly 78% of the time - a solid result.
The problem? Person-level identification was inconsistent.
When 6sense identified a specific individual, that identification was correct only about 65% of the time. The remaining 35% included:
- Right company, wrong person (most common)
- Right industry, wrong company entirely
- Outdated contact information (person had changed roles)
6sense is built for account-based marketing, and it shows. If you’re running ABM campaigns where company-level data is sufficient, it’s a reasonable choice. But if you need to know who specifically visited your site - for sales outreach or AI SDR workflows - the person-level accuracy gap is a real issue at enterprise pricing.
Leadfeeder - 6.2/10
Correct ID rate: ~62% | Contact relevance: Moderate (5.5/10)
Leadfeeder takes a fundamentally different approach: IP-based company identification. It doesn’t try to identify individual people - it tells you which companies are visiting your site.
At what it does, it’s decent. Company identification accuracy was around 62%, which is reasonable for IP resolution. The challenges:
- No person-level data - You know “someone from Acme Corp visited.” You don’t know who.
- Remote work blind spot - When employees browse from home (which is most of the time in 2026), IP resolution fails. The auditor flagged this as a growing problem.
- Shared IP confusion - Co-working spaces, large ISPs, and VPNs all create false associations.
Leadfeeder is honest about what it provides - company-level identification. But in a world where sales teams need names, emails, and phone numbers, company-level data is a starting point, not a solution.
Clearbit - 5.8/10
Correct ID rate: ~58% | Contact relevance: Moderate (5/10)
Clearbit has long been respected for data enrichment, and its company-level data remains solid. But since the HubSpot acquisition, the visitor identification capabilities have shifted.
The audit found:
- Company identification was accurate roughly 70% of the time - a few points behind 6sense’s 78% at the company level
- Person-level matching was limited, with the auditor noting that Clearbit’s strength lies in enriching known contacts rather than identifying unknown visitors
- Contact data quality was good when present, but coverage gaps meant many visitors returned no person-level match at all
If you’re already in the HubSpot ecosystem and primarily need company-level enrichment, Clearbit does that job. For standalone visitor identification at the person level, it’s no longer a top-tier option.
RB2B - 5.2/10
Correct ID rate: ~52% | Contact relevance: Low (4/10)
RB2B generated a lot of buzz when it launched. The match rate claims were aggressive, and the free tier brought in a flood of users. But the accuracy story is more concerning.
The auditor’s finding was blunt: RB2B “consistently identified irrelevant contacts.”
Here’s what that means in practice:
- 52% correct ID rate - Nearly half of all identifications were wrong
- Contact relevance scored 4/10 - Even when the right company was identified, the specific contact returned was often a junior employee or someone in an unrelated department
- LinkedIn dependency - RB2B’s probabilistic matching relies heavily on LinkedIn profile data. If a visitor doesn’t have a LinkedIn profile, they simply won’t be identified. The auditor noted this creates systematic gaps in coverage.
The RB2B alternatives post covers this in more detail, but the core issue is straightforward: a tool that’s wrong almost half the time creates more work than it saves. Your sales team spends time chasing bad leads instead of closing real ones.
From the audit: “RB2B’s probabilistic approach produced the widest variance in accuracy across test segments. B2B tech visitors were identified with reasonable accuracy, but visitors from other industries showed significantly higher error rates.”
Warmly - 4.0/10
Correct ID rate: ~40% | Contact relevance: Very Low (3/10)
Warmly scored lowest in the audit, and the auditor’s language was notably direct: the tool “returned entirely wrong individuals from unrelated companies” at the highest rate of any tool tested.
The breakdown:
- 40% correct ID rate - 6 out of 10 identifications were wrong
- Highest false positive rate - Warmly confidently returned data that was completely incorrect more often than any other tool
- Contact relevance of 3/10 - The identified contacts frequently had no connection to the visiting organization
Warmly positions itself as a real-time engagement platform with chat, video, and meeting scheduling. The visitor identification is one component of a broader feature set. But if the identification layer is producing data this unreliable, every downstream feature that depends on it - personalized chat greetings, automated meeting booking, intent scoring - is built on a shaky foundation.
At Warmly’s pricing ($900+/mo for their Data Agent plan), the accuracy gap is difficult to justify.
What “Accuracy” Actually Means in Practice
Let’s define the four dimensions the auditor measured, because they matter in different ways for different workflows.
1. Correct Person Identification
Did the tool identify the right human being? Not “someone who might work at a similar company” or “a person with a similar name.” The actual individual who visited your site.
This is the most fundamental test. If you get the person wrong, nothing else matters.
2. Correct Company Association
Is the visitor actually employed at the company the tool says they work for? This sounds basic, but probabilistic tools often associate visitors with companies based on IP ranges, cookie data, or behavioral patterns - all of which can produce wrong matches.
3. Contact Info Validity
Are the contact details usable?
- Is the email address deliverable (not bounced, not a catch-all)?
- Is the phone number active and current?
- Is the LinkedIn profile the right person (not someone with the same name)?
Stale data is almost as bad as wrong data. An email that bounces still hurts your sender reputation.
4. Contact Relevance
This is the dimension where tools diverged most dramatically. Contact relevance asks: does the identified person actually make sense as a lead?
A tool might correctly identify that someone from Acme Corp visited your page. But if it returns a receptionist when the actual visitor was the VP of Engineering, the data is technically “correct” at the company level but practically useless for sales outreach.
Leadpipe scored 8.5/10 on contact relevance. Warmly scored 3/10. That gap is the difference between your AI SDR reaching the right person and your AI SDR embarrassing your brand.
The Downstream Cost of Inaccuracy
Bad visitor identification data doesn’t just waste money. It actively damages your business in ways that compound over time.
Try Leadpipe free with 500 leads →
Domain Reputation Destruction
When your AI SDR or outbound system sends personalized emails to wrong contacts, those emails get ignored, marked as spam, or bounce. Each one chips away at your sender reputation. Enough bad sends and your emails start landing in spam for everyone - including legitimate prospects.
Wasted Sales Capacity
Your SDR calls the “identified” contact. Wrong person. Wrong company. That’s not just a wasted call - it’s an awkward conversation that reflects poorly on your brand. Multiply that across dozens of bad IDs per day, and you’re burning serious sales capacity.
CRM Data Pollution
Wrong visitor data flows into your CRM, where it corrupts every downstream metric:
- Lead scoring becomes unreliable (garbage in, garbage out)
- Attribution models assign credit to the wrong sources
- Forecasting breaks because pipeline is built on phantom opportunities
- Segmentation targets the wrong accounts
Once bad data enters your CRM, it’s extremely expensive to clean up. Most teams don’t even realize it’s happening until pipeline metrics stop making sense.
Wasted Ad Spend
Retargeting “identified visitors” who were misidentified means you’re serving ads to people who never visited your site. You’re paying for impressions and clicks from people who have zero interest in your product. At $15-50 CPM for B2B retargeting, a 40% misidentification rate means 40% of your retargeting budget is going straight into the garbage.
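A rough back-of-envelope version of that math, with a hypothetical budget and a mid-range CPM (swap in your own numbers):

```python
# Hypothetical monthly retargeting budget - replace with your own spend.
monthly_budget = 10_000            # dollars
cpm = 30                           # mid-range of the $15-50 B2B CPM cited above
misidentification_rate = 0.40      # the 40% scenario from above

impressions = monthly_budget / cpm * 1_000
wasted_spend = monthly_budget * misidentification_rate
wasted_impressions = impressions * misidentification_rate

print(f"${wasted_spend:,.0f}/month and {wasted_impressions:,.0f} impressions "
      f"served to people who never visited your site")
# ~$4,000/month and ~133,333 impressions wasted under these assumptions
```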
The Numbers Add Up Fast
| Impact Area | 50% Accuracy Tool | 80% Accuracy Tool |
|---|---|---|
| Correct leads (per 1,000 IDs) | 500 | 800 |
| Wasted SDR hours/month | ~40 hrs | ~15 hrs |
| Bounced emails (est.) | 15-25% | 3-7% |
| CRM cleanup cost/quarter | $2,000-5,000 | $200-500 |
| Retargeting waste | ~40% of spend | ~15% of spend |
The Compound Effect
Each of these costs is significant on its own. Together, they create a compounding problem: bad data leads to bad decisions, which lead to worse outcomes, which make the data look even less reliable, which erodes trust in the entire system.
Accuracy isn’t a nice-to-have. It’s the foundation everything else is built on.
Deterministic vs. Probabilistic Matching: Why the Gap Exists
The accuracy gap between tools isn’t random. It maps directly to their underlying matching methodology.
Deterministic Matching
Deterministic matching requires a verified data point to make an identification. The tool has specific evidence - a known device fingerprint, a confirmed email-to-device association, a verified login event - before it tells you who visited.
The tradeoff: deterministic matching may identify fewer visitors overall. But when it does identify someone, the confidence level is high.
Probabilistic Matching
Probabilistic matching uses statistical inference to guess who visited. The tool looks at IP address patterns, browser behavior, cookie data, LinkedIn activity, and other signals, then makes its best guess at who the visitor might be.
The tradeoff: probabilistic matching casts a wider net and may report higher match rates. But the guesses are often wrong, particularly for visitors outside the tool’s core data segments.
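A deliberately simplified sketch of the two decision models - the data structures, signals, and threshold here are illustrative only, not any vendor’s actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Visit:
    device_fingerprint: str
    ip_company_guess: Optional[str] = None   # company inferred from IP range
    behavior_similarity: float = 0.0         # similarity to known browsing patterns

# Deterministic: only return an identity when a verified record already exists.
VERIFIED_GRAPH = {"fp_abc123": "jane.doe@acme.com"}  # fingerprint -> confirmed identity

def deterministic_match(visit: Visit) -> Optional[str]:
    # No verified record -> no identification (lower volume, high confidence).
    return VERIFIED_GRAPH.get(visit.device_fingerprint)

# Probabilistic: score weak signals and return the best guess above a threshold.
def probabilistic_match(visit: Visit, threshold: float = 0.6) -> Optional[str]:
    score = 0.4 if visit.ip_company_guess else 0.0
    score += visit.behavior_similarity
    if score >= threshold:
        # Best statistical guess - may be the wrong person entirely.
        return f"best.guess@{visit.ip_company_guess or 'unknown'}"
    return None
```

The asymmetry is the whole story: the deterministic path can only say “I don’t know,” while the probabilistic path can say the wrong thing with confidence.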
What the Auditor Found
The audit results mapped cleanly to this distinction. The auditor’s conclusion:
“Tools employing deterministic matching methods produced significantly fewer false positives. The accuracy difference was most pronounced in contact relevance - deterministic tools returned verifiably relevant contacts, while probabilistic tools frequently returned contacts with no demonstrable connection to the visiting individual or organization.”
For a deeper dive into the technical differences, see our complete guide to deterministic vs. probabilistic matching.
How to Run Your Own Accuracy Test
Don’t take our word for it. You can validate any visitor identification tool’s accuracy yourself. Here’s a straightforward testing protocol.
Step 1: Build Your Known Visitor List
Create a list of 50-100 people whose identities you can confirm:
- Employees at your company
- Friends and colleagues at other companies
- Partners or vendors you work with
Record their full name, company, email address, and job title. This is your ground truth.
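If you want the list in a machine-readable form for the scoring step later, a simple CSV is enough. The sketch below writes one with the fields listed above - the file name and sample row are placeholders:

```python
import csv

# One row per confirmed test visitor - the sample row is a placeholder.
known_visitors = [
    {"full_name": "Jane Doe", "company": "Acme Corp",
     "email": "jane.doe@acme.com", "job_title": "VP of Engineering"},
]

with open("known_visitors.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["full_name", "company", "email", "job_title"])
    writer.writeheader()
    writer.writerows(known_visitors)
```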
Step 2: Have Them Visit Your Site
Ask each person to visit your website from their normal device on their normal network. No VPNs, no incognito mode, no shared computers. Just regular browsing.
Have each person visit at least 2-3 pages and spend at least 30 seconds on the site. This gives every tool a fair chance at identification.
Step 3: Collect Tool Output
Wait 24-48 hours (some tools batch their matching), then export the identified visitors from each tool you’re testing. Make sure to capture:
- Name returned
- Company returned
- Email/phone returned
- Job title returned
- Confidence score (if available)
Step 4: Calculate Accuracy
Compare each tool’s output against your known visitor list.
| Metric | Formula |
|---|---|
| Correct ID Rate | Correct identifications / Total identifications |
| Coverage Rate | Total identifications / Known visitors who browsed |
| Contact Accuracy | Valid contact info / Total contacts returned |
| Relevance Score | Relevant contacts / Total contacts returned |
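Here’s a minimal scoring sketch for those four formulas. It assumes you’ve already mapped each identification a tool returned back to the test visit it came from (by timestamp or page URL), so every record carries both the ground-truth identity and what the tool claimed - the field names are placeholders:

```python
# One record per identification the tool returned, mapped to a known test visit.
records = [
    {"actual_email": "jane.doe@acme.com",     # ground truth for that visit
     "returned_email": "jane.doe@acme.com",   # what the tool claimed
     "contact_deliverable": True,             # filled in during Step 5
     "contact_relevant": True},               # does the contact make sense as a lead?
    # ... one entry per identification
]

known_visitors_who_browsed = 87  # count from your Step 2 tracking

total = len(records)
correct = sum(r["returned_email"] == r["actual_email"] for r in records)

print(f"Correct ID rate:  {correct / total:.0%}")
print(f"Coverage rate:    {total / known_visitors_who_browsed:.0%}")
print(f"Contact accuracy: {sum(r['contact_deliverable'] for r in records) / total:.0%}")
print(f"Relevance score:  {sum(r['contact_relevant'] for r in records) / total:.0%}")
```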
Step 5: Validate Contact Info
For each returned contact:
- Email: Run through a verification service (NeverBounce, ZeroBounce). Is it deliverable?
- Phone: Call or text. Is the number active? Does it reach the right person?
- LinkedIn: Check the profile. Is this the same person? Do they still work at that company?
This step catches a sneaky failure mode: tools that return “correct” names but with outdated or wrong contact details. A correct name with a dead email address is still a bad lead.
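The email portion of this step is easy to script. The sketch below uses a placeholder verify_email function - NeverBounce, ZeroBounce, and similar services each have their own SDKs and endpoints, so check their docs for the real calls:

```python
def verify_email(address: str) -> str:
    """Placeholder - call your verification provider here.
    Expected to return a status such as 'deliverable', 'undeliverable', or 'catch_all'."""
    raise NotImplementedError

def flag_bad_contacts(records: list[dict]) -> list[dict]:
    flagged = []
    for r in records:
        status = verify_email(r["returned_email"])
        r["contact_deliverable"] = status == "deliverable"
        if not r["contact_deliverable"]:
            # A correct name with a dead email address is still a bad lead.
            flagged.append(r)
    return flagged
```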
What to Look For
If a tool returns an identification for one of your test visits that doesn’t match anyone on your list, that’s a false positive. If it returns the wrong identity for someone who was on your list, that’s a misattribution. Both matter. Track them separately.
Most teams who run this test find significant gaps between what vendors claim and what actually happens with real traffic. That gap is exactly why we commissioned the independent audit.
Frequently Asked Questions
How often should you test visitor identification accuracy?
At minimum, quarterly. Identity graphs change, matching algorithms update, and data sources shift. A tool that was accurate six months ago may have degraded - or improved. Regular testing catches drift before it corrupts your pipeline.
Do accuracy rates vary by industry or traffic source?
Yes, significantly. Most visitor identification tools perform best with B2B tech traffic and US-based visitors. If your audience skews international, non-tech, or heavily mobile, expect lower accuracy across the board. The audit found that accuracy gaps between tools widened for non-tech traffic segments.
Can you use multiple visitor identification tools simultaneously?
You can, and some teams do. The idea is to cross-reference identifications - if two tools agree on who visited, your confidence goes up. The downside: cost, complexity, and the need to build deduplication logic. For most teams, choosing one high-accuracy tool is simpler and more cost-effective than stitching together multiple mediocre ones.
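If you do run two tools in parallel, the cross-referencing itself is simple once both outputs are normalized - a sketch, assuming each tool’s identifications are keyed by visit and reduced to a lowercase email:

```python
# visit_id -> identified email, one mapping per tool (normalized to lowercase).
tool_a = {"visit_001": "jane.doe@acme.com", "visit_002": "bob@globex.example"}
tool_b = {"visit_001": "jane.doe@acme.com", "visit_002": "sue@initech.example"}

agreed = {v: email for v, email in tool_a.items() if tool_b.get(v) == email}
conflicts = {v for v in tool_a if v in tool_b and tool_b[v] != tool_a[v]}

print(agreed)     # identifications both tools agree on - higher confidence
print(conflicts)  # disagreements - treat these with caution
```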
What’s an acceptable accuracy rate for visitor identification?
It depends on your workflow. If you’re feeding data into automated outreach (AI SDRs, email sequences), you need 75%+ accuracy to avoid domain reputation damage. If you’re using data for manual sales research, you can tolerate lower accuracy because a human is validating before taking action. But below 50% accuracy, you’re essentially flipping a coin - and that’s not a data-driven strategy.
Stop Guessing. Test It Yourself.
Match rates get all the attention. Accuracy determines whether that attention translates into pipeline or problems.
The independent audit showed a clear hierarchy: deterministic matching consistently outperformed probabilistic approaches on every accuracy metric. Leadpipe’s 82% correct identification rate and 8.5/10 contact relevance score weren’t a close contest - the next-best tool scored 65% and 6/10.
But you don’t have to take an auditor’s word for it. Run the test yourself with the methodology above. See what your current tool actually delivers versus what it claims.
Get 500 free leads with Leadpipe - no credit card required. See the accuracy difference with your own traffic, your own visitors, your own data.
Related Articles
- Deterministic vs. Probabilistic Matching Explained
- Visitor Identification API: Complete Developer Guide
- Top 10 Visitor Identification Software in 2026
- RB2B Alternatives: Better Options for 2026
- The Data Layer AI Sales Agents Are Missing
- How to Choose a Data Provider for Your AI SDR
- The True Cost of Anonymous Website Traffic