I get the same question roughly once a week from prospects already running RB2B: “If I switch to Leadpipe, what actually changes?” The honest answer is: it depends on your traffic shape, your ICP, and what you do with the leads after identification. But “it depends” is a cop-out, so let me give you the framework instead: how to run your own 90-day A/B, exactly what to measure, and what the verified accuracy data says you should expect.
I am George, founder of Leadpipe. This is the test design I would run if I were sitting in your seat.
The short version
A 90-day A/B is the cleanest way to settle this, and it is not as hard to set up as it sounds. Both pixels can run on the same site simultaneously. The interesting comparison is not which tool returns the bigger top-line number. It is which tool returns identifications you can actually use, and which set of identifications produces the larger downstream pipeline.
The verified anchor for the comparison: an independent Gartner-certified audit of 75,000 visitors over 120 days scored Leadpipe 8.7/10, RB2B 5.2/10, Warmly 4.0/10. The accuracy gap is the biggest determinant of which tool wins your test downstream.
How to set up the test
This is the design that produces honest data. Skipping any of these steps invites confounders.
The pixels
Install both pixels on every page of your site. They each run their own identity logic against the same browsers. You do not need to reconcile them. At the end of the window, pull both datasets and compare.
The window
90 days minimum. 60 days is too short to see late-arriving conversions; 120+ is better, but most teams do not have the patience. Lock the window before you start.
The cohort
US B2B visitors only. Both tools are strongest on US B2B. Including international traffic adds noise and is unfair to both, since neither product is the right fit for EU/UK person-level identification under GDPR (Leadpipe defaults to company-level there; RB2B has its own constraints).
The downstream sequence
The most important design choice. Send identical outreach to both cohorts: same template, same sender domain, same send days. The only thing that differs is the source list. If you change the message between the two cohorts, you are testing copy, not tools.
What you log on day 0
Write down before starting:
- Total US B2B unique visitor count baseline (from your analytics tool)
- Cost of each plan over the 90 days
- Sender domain reputation snapshot (to check for drift)
- The exact template you will use downstream
- The pipeline attribution rule (we recommend 60-day attribution from first identified visit)
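If you want that day-0 log to be goalpost-proof, freeze it as data rather than prose. A minimal sketch in Python; every field name and value here is an illustrative placeholder, not an export format from either tool:

```python
import json
from datetime import date

# Day-0 log, frozen before any identifications arrive.
# All values are illustrative placeholders.
day0 = {
    "start_date": str(date.today()),
    "window_days": 90,
    "baseline_us_b2b_uniques": 12_000,                    # from your analytics tool
    "plan_cost_90d_usd": {"rb2b": 447, "leadpipe": 441},  # 3 months of each Pro plan
    "sender_domain_reputation": "snapshot saved separately",
    "outreach_template_id": "template_v1",                # the exact template, locked
    "attribution_rule_days": 60,                          # from first identified visit
}

with open("ab_day0.json", "w") as f:
    json.dump(day0, f, indent=2)
```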
What to measure
Six things. Write them down before starting so you do not move the goalposts.
1. Match rate
Identified contacts divided by unique US B2B visitors. This is the input metric. It is not the most important metric, but it caps everything else.
2. Unique people identified
Some tools identify the same person multiple times. Dedupe before counting. The honest number is unique identified humans.
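A sketch of metrics 1 and 2, assuming each tool's export is a list of identification events keyed by whatever stable person identifier you trust most (the field name below is hypothetical):

```python
def unique_people(events, person_key="linkedin_url"):
    """Metric 2: collapse identification events down to unique humans."""
    return {e[person_key] for e in events if e.get(person_key)}

def match_rate(events, unique_visitors):
    """Metric 1: unique identified people / unique US B2B visitors."""
    return len(unique_people(events)) / unique_visitors

# Illustrative: three events, two unique people, 1,000 visitors.
events = [{"linkedin_url": "in/jane"}, {"linkedin_url": "in/jane"},
          {"linkedin_url": "in/matt"}]
print(len(unique_people(events)), match_rate(events, 1_000))  # 2 0.002
```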
3. Overlap
The set of people identified by both tools. The size of the overlap tells you how much of your traffic is “easy to identify” versus how much each tool finds independently. The disagreement set inside the overlap (where both tools return a name, but not the same one) is the most interesting cell in the whole test.
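In code, the overlap cells are set arithmetic, with one assumption worth flagging: you need to join both exports on your own first-party visitor or session ID, which may mean stamping that ID into both tools' webhook payloads yourself. A sketch with hypothetical data:

```python
# visitor_id -> person returned, one dict per tool (illustrative values).
rb2b = {"v1": "in/jane", "v2": "in/matt", "v3": "in/ana"}
leadpipe = {"v1": "in/jane", "v2": "in/sarah", "v4": "in/lee"}

both = rb2b.keys() & leadpipe.keys()            # identified by both tools
rb2b_only = rb2b.keys() - leadpipe.keys()
leadpipe_only = leadpipe.keys() - rb2b.keys()

agree = {v for v in both if rb2b[v] == leadpipe[v]}
disagree = both - agree                          # the most interesting cell

print(len(rb2b_only), len(leadpipe_only), len(both))  # 1 1 2
print(sorted(disagree))  # ['v2']: both returned a name, not the same one
```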
4. Data completeness
For each identification, did the tool return a usable email, a phone, a title, a LinkedIn URL? An identification with no email and no phone is much harder to act on than one with both.
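Completeness is worth scoring per record rather than eyeballing. A minimal sketch; the field names are assumptions about your own export, not either tool's schema:

```python
ACTIONABLE_FIELDS = ("email", "phone", "title", "linkedin_url")

def completeness(record):
    """Metric 4: fraction of actionable fields present on one identification."""
    return sum(1 for f in ACTIONABLE_FIELDS if record.get(f)) / len(ACTIONABLE_FIELDS)

print(completeness({"linkedin_url": "in/jane"}))  # 0.25 -- hard to act on
print(completeness({"email": "jane@acme.com", "phone": "+1 555 0100",
                    "title": "VP Sales", "linkedin_url": "in/jane"}))  # 1.0
```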
5. Downstream reply rate
Send the identical sequence to a randomized split of each cohort (RB2B-only, Leadpipe-only, overlap). Measure reply rate, positive reply rate, bounce rate, and unsubscribe rate.
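Seed the split so nobody can quietly re-roll an unlucky draw. A minimal sketch that pulls an equal-sized, reproducible outreach sample from each cohort:

```python
import random

def sample_cohort(cohort, n, seed=42):
    """Seeded, reproducible sample: equal-sized outreach lists per cohort."""
    rng = random.Random(seed)
    pool = sorted(cohort)  # stable order before sampling
    return rng.sample(pool, min(n, len(pool)))

cohorts = {"rb2b_only": {"in/a", "in/b", "in/c"},
           "leadpipe_only": {"in/d", "in/e"},
           "overlap": {"in/f"}}
outreach = {name: sample_cohort(people, n=200) for name, people in cohorts.items()}
```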
6. Pipeline generated
Opportunities attributable to the identified visitor inside 60 days of the visit. This is the only metric that matters to a CRO.
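The 60-day attribution rule from the day-0 log translates into a two-line gate, assuming you can join each opportunity's created date to the contact's first identified visit:

```python
from datetime import date, timedelta

WINDOW = timedelta(days=60)  # the rule frozen on day 0

def attributed(first_visit: date, opp_created: date) -> bool:
    """Metric 6 gate: does this opportunity count toward the test?"""
    return first_visit <= opp_created <= first_visit + WINDOW

print(attributed(date(2025, 1, 10), date(2025, 3, 1)))   # True, day 50
print(attributed(date(2025, 1, 10), date(2025, 3, 20)))  # False, day 69
```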
| Layer | Metric | Why it matters |
|---|---|---|
| Top of funnel | Match rate | Caps the size of every downstream cohort |
| Top of funnel | Unique people identified | Honest unit of measurement |
| Mid funnel | Overlap and disagreement | Reveals identity graph quality |
| Mid funnel | Data completeness | Determines actionability |
| Bottom of funnel | Reply rate, bounce rate | Honest downstream signal |
| Bottom of funnel | Pipeline generated | The CRO number |
What the two tools are doing under the hood
Useful context before you run the test.
RB2B
RB2B uses LinkedIn profile matching. That is their documented methodology and they are honest about it. If a visitor has a LinkedIn cookie, is in a US business context, and the browser fingerprint maps cleanly, RB2B returns a name. If any of those conditions fail, no match.
Pricing: free tier, $79/mo Starter, $149/mo Pro. Slack delivery is well-executed. Email is not returned on every plan; you typically get the LinkedIn URL and a Slack ping.
Leadpipe
Leadpipe uses its own identity graph. Deterministic matching against cookies, first-party signals, and 280M verified profiles refreshed every 24 hours. There is no “if LinkedIn is cooperating today” dependency.
Pricing: $147/mo Pro (500 IDs, full contact data including business and personal email, phone, title, firmographics), $299/mo Growth, $599/mo Scale, $1,279/mo Agency, ~$8K/mo Enterprise.
The architectural difference is the source of the identity. RB2B leans on LinkedIn’s graph. Leadpipe owns its graph. That difference is what shows up in the accuracy gap.
What the verified data says you should expect
You should not have to take this on faith. The closest verified anchor is the independent Gartner-certified audit of 75,000 visitors over 120 days.
Identification accuracy (independent audit):
Leadpipe ████████████████████ 8.7/10
RB2B ███████████ 5.2/10
Warmly ████████ 4.0/10
| Tool | Overall accuracy | Method | Failure mode |
|---|---|---|---|
| Leadpipe | 8.7/10 | Deterministic, own graph | None highlighted in audit |
| RB2B | 5.2/10 | Probabilistic via LinkedIn | False positives on disputed identities |
| Warmly | 4.0/10 | Probabilistic | High false-positive rate |
The audit found that probabilistic matching produced significantly more false positives than deterministic. The “disputed identity” set is where probabilistic tools tend to lose: same browser, same session, but the tool returns a different person than the one actually visiting.
For your A/B, this implies: in the overlap set, expect a meaningful share of cases where the two tools agree on the company but disagree on the person. The audit’s accuracy ladder predicts which tool’s answer is more often the correct one. Spot-check by pulling the LinkedIn profile manually and comparing to what each tool returned. The pattern that holds across the audit is what you will see in your own test.
What this implies for your downstream sequence
A wrong name on a Slack alert is worse than no name. Your AE sends a personalized email opening with “Hi Sarah” when the visitor was actually Matthew, a peer at the same company. At best that email is embarrassing. At worst it goes to an address that bounces, and sender reputation damage is not something you recover from quickly.
The downstream cost of probabilistic false positives shows up in three places:
- Bounce rate climbs. Sending to wrong-person emails increases bounce volume, which damages sender reputation.
- Negative replies climb. People who get an email referencing pages they did not visit reply angrily, which the mailbox provider reads as a spam-pattern signal.
- CRM hygiene degrades. Your AE updates the wrong contact record, which then poisons the next outreach attempt.
Salesforce is full of bad data for this reason. Probabilistic identification does not just lose to deterministic on accuracy; it actively pollutes the CRM downstream of the test.
Where RB2B is legitimately fine
I want to be fair. RB2B is a reasonable starter product, and for a certain kind of team it is the right first step.
- The free plan is a real on-ramp. If you have never seen person-level visitor identification, installing RB2B is a quick way to find out it is real and your traffic has signal in it.
- Slack delivery is well-executed. If “one channel, one ping per visitor” is the workflow, RB2B ships that cleanly.
- Pricing is simple. No seat math.
Where it falls over is what happens after the identification. Limited contact data on lower plans means your workflow routes through LinkedIn outreach (which compounds the LinkedIn dependency) or through manual lookups (which do not scale). No phone returned, no intent topics, thin behavioral data, no suppression lists, no white-label. You can prove out the category on RB2B. You cannot build a serious revenue motion on top of it.
That mirrors what we see in our own pipeline: teams outgrow RB2B in 2-3 months, not 2-3 years. More in Leadpipe vs RB2B.
Where Leadpipe is structurally different
Five differentiators that show up in the test data.
| Capability | Why it matters in the A/B |
|---|---|
| Own identity graph | Match rate stable across traffic shape; less variance week to week |
| Deterministic matching | Lower false positive rate, fewer wrong-name sends |
| Full contact data on starter plan | Downstream sequence does not depend on extra enrichment |
| Suppression lists | Existing customers, competitors, and partners filtered before they hit Slack |
| 200+ integrations + 23-endpoint REST API | Pipeline routing into your existing CRM/sequencer is native, not a Zapier hack |
The suppression lists are the underrated one. RB2B does not ship them. In a 90-day test, your competitor’s RevOps lead will visit your site, get identified, hit the Slack channel, and your AE will spend cycles trying to reach them. Leadpipe filters that traffic before it ever reaches the workflow.
How to read the result
Three views matter, in this order:
- Pipeline per dollar spent. The bottom-line view. If one tool produces materially more pipeline per dollar of platform spend plus AE time, the rest of the conversation is decoration.
- Reply rate gap on identical outreach. The leading indicator. Same template, same sender, different source list. If one source produces materially higher reply rate, downstream pipeline will follow.
- Bounce and complaint trajectory. The early-warning view. If one source pushes your bounce rate up over the 90 days, you are watching identity-quality drift in real time.
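Numerically, the three views reduce to simple ratios per cohort. A sketch with illustrative inputs; the cost model (platform spend plus AE time) is an assumption to adapt to your own loaded costs:

```python
def read_result(pipeline_usd, platform_cost_usd, ae_hours, ae_rate_usd,
                sends, replies, bounces):
    """The three views, in priority order, for one tool's cohort."""
    total_cost = platform_cost_usd + ae_hours * ae_rate_usd
    return {
        "pipeline_per_dollar": round(pipeline_usd / total_cost, 2),
        "reply_rate": round(replies / sends, 4),
        "bounce_rate": round(bounces / sends, 4),  # watch the trend, not one point
    }

print(read_result(pipeline_usd=80_000, platform_cost_usd=441,
                  ae_hours=40, ae_rate_usd=60,
                  sends=600, replies=31, bounces=14))
```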
A common pattern in customer accounts that have run this comparison: the cost difference between the two plans is dwarfed by the difference in downstream productive work the AE can do with the output. The plan you buy matters less than what arrives in the inbox at the end of the pipeline.
When the headline match rate misleads
Two cases where match rate is a misleading single-number summary.
Case 1: identical match rates, different completeness
If both tools return the same number of unique people but one returns email and the other does not, the apparently equal match rate is meaningless. The cohort that does not include email cannot be sequenced without a separate enrichment step. Add the enrichment cost and time, and the apparent parity becomes a meaningful gap.
Case 2: high match rate, high false positive rate
A 25% match rate with 70% accuracy returns 17.5 correct identifications per 100 visitors, plus 7.5 false positives. A 15% match rate with 90% accuracy returns 13.5 correct identifications, with only 1.5 false positives. The headline favors the first, but it also pushes five times as many wrong-person sends into your sequence, and each of those is a bounce or an angry reply waiting to happen. The audit data is what makes this visible.
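The arithmetic generalizes to one line: the actionable cohort per 100 visitors is match rate times accuracy, and the rest of the matches are false positives.

```python
def per_100_visitors(match_rate, accuracy):
    """Split 100 visitors' worth of matches into correct IDs and false positives."""
    matched = 100 * match_rate
    correct = matched * accuracy
    return {"correct": round(correct, 1), "false_positives": round(matched - correct, 1)}

print(per_100_visitors(0.25, 0.70))  # {'correct': 17.5, 'false_positives': 7.5}
print(per_100_visitors(0.15, 0.90))  # {'correct': 13.5, 'false_positives': 1.5}
```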
The honest unit is “correctly-identified people whose contact data is usable in the workflow.” That is the cohort that drives downstream pipeline.
What to log if you actually run this
Build a simple pivot for the comparison. The columns:
| Cohort | Total uniques in cohort | Match rate | Avg data completeness | Bounce rate | Reply rate | Meetings booked | Pipeline (60d) |
|---|---|---|---|---|---|---|---|
| RB2B-only | | | | | | | |
| Leadpipe-only | | | | | | | |
| Overlap, agree | | | | | | | |
| Overlap, disagree | | | | | | | |
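If you keep one row per identified person in a CSV, the pivot is a groupby. A sketch in pandas; every column name is an assumption about your own export, not either tool's schema, and match rate needs the visitor denominator from your analytics tool separately:

```python
import pandas as pd

# One row per identified person, labeled with its cohort:
# cohort in {"rb2b_only", "leadpipe_only", "overlap_agree", "overlap_disagree"}
df = pd.read_csv("ab_test_people.csv")

pivot = df.groupby("cohort").agg(
    total_uniques=("person_key", "nunique"),
    avg_completeness=("completeness", "mean"),
    bounce_rate=("bounced", "mean"),    # bounced/replied/meeting_booked are 0/1 flags
    reply_rate=("replied", "mean"),
    meetings_booked=("meeting_booked", "sum"),
    pipeline_60d=("pipeline_usd", "sum"),
)
print(pivot)
```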
The most informative row is “overlap, disagree.” Spot-check 50 of those by pulling the LinkedIn profile manually. The agreement rate with each tool is your direct accuracy read on your own traffic.
What it means
Three things to take away from this design.
1. The gap in correctly identified people is the entire conversation. If one tool identifies fewer correct people, nothing you do with the outputs recovers the gap. You can run perfect outreach on an undersized list and still lose to a team running average outreach on a 2-3x larger correctly-identified list.
2. The plan you buy matters more than the brand you buy. A free or cheap plan that does not return email is not directly comparable to a plan that does. The Leadpipe Pro plan at $147/mo returns full contact data; that is the plan-to-plan comparison that mirrors a real motion.
3. The downstream numbers are the only numbers that matter. Match rate is a proxy. Replies and meetings are the thing. If you are evaluating visitor ID tools, set up an A/B where the tool does not get graded on its own metric. Grade it on your pipeline. Google Analytics is lying about your pipeline the same way; instrument the comparison honestly.
What we would do differently if we re-ran our own test
If I were running this comparison fresh today:
- Longer window. 90 days is enough to be directional. 180 days would let you attribute late-arriving deals.
- Control for traffic mix. Paid ad traffic shape changes week to week. Lock ad spend and channel mix for the duration of the test.
- Publish daily-level overlap. Anyone running the same test would benefit from seeing overlap stability day by day, not just at the end.
- Randomize outreach sender. Same sender for both cohorts confounds the AE-skill variable. A randomized 2x2 removes sender effects.
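The 2x2 itself is a seeded sender assignment crossed with the source list. A sketch:

```python
import itertools
import random

def assign_senders(people, senders=("ae_1", "ae_2"), seed=7):
    """Shuffle a cohort, then deal senders round-robin so sender skill
    is balanced within each source list."""
    rng = random.Random(seed)
    people = sorted(people)
    rng.shuffle(people)
    return dict(zip(people, itertools.cycle(senders)))

# Run once per source list; the cross of (list x sender) is the 2x2.
print(assign_senders({"in/a", "in/b", "in/c", "in/d"}))
```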
The decision framework
If you are currently running RB2B, three questions.
- Do your outbound motions depend on email delivery? If yes, the limited-contact-data workflow is a ceiling.
- Does your ICP skew away from pure tech/sales LinkedIn density (healthcare, financial services, manufacturing, agencies)? If yes, LinkedIn-only matching is going to leave a lot on the table.
- Do you want intent data beyond your own site, or suppression lists for existing customers? RB2B does not have either today.
Two or more yeses, and it is worth running the A/B. One yes or none, and RB2B’s free tier is a fine place to stay for now.
Leadpipe identifies 30-40%+ of your US B2B visitors with full contact data on the Pro plan at $147/mo. No credit card to start the 500-lead trial. Start identifying visitors →