
RB2B vs Leadpipe: How to Run a 90-Day A/B Test

If you ran RB2B and Leadpipe on the same traffic for 90 days, here is exactly what to measure, the framework, and what the verified accuracy data says.

George Gogidze · 11 min read

I get the same question roughly once a week from prospects already running RB2B: “If I switch to Leadpipe, what actually changes?” The honest answer is, it depends on your traffic shape, your ICP, and what you do with the leads after identification. But “it depends” is a cop-out, so let me give you the framework instead: how to run your own 90-day A/B, exactly what to measure, and what the verified accuracy data says you should expect.

I am George, founder of Leadpipe. This is the test design I would run if I were sitting in your seat.

The short version

A 90-day A/B is the cleanest way to settle this, and it is not as hard to set up as it sounds. Both pixels can run on the same site simultaneously. The interesting comparison is not which tool returns the bigger top-line number. It is which tool returns identifications you can actually use, and which set of identifications produces the larger downstream pipeline.

The verified anchor for the comparison: an independent Gartner-certified audit of 75,000 visitors over 120 days scored Leadpipe 8.7/10, RB2B 5.2/10, Warmly 4.0/10. The accuracy gap is the biggest determinant of which tool wins your test downstream.

How to set up the test

This is the design that produces honest data. Skipping any of these steps invites confounders.

The pixels

Install both pixels on every page of your site. They each run their own identity logic against the same browsers. You do not need to reconcile them. At the end of the window, pull both datasets and compare.

The window

90 days minimum. 60 days is too short to see late-arriving conversions; 120+ is better but most teams do not have the patience. Lock the window before you start.

The cohort

US B2B visitors only. Both tools are strongest on US B2B. Including international traffic adds noise and is unfair to both, since neither product is the right fit for EU/UK person-level identification under GDPR (Leadpipe defaults to company-level there; RB2B has its own constraints).

The downstream sequence

This is the most important design choice. Send identical outreach to both cohorts. Same template, same sender domain, same send schedule. The only thing that differs is the source list. If you change the message between the two cohorts, you are testing copy, not tools.

What you log on day 0

Write down before starting:

  • Total US B2B unique visitor count baseline (from your analytics tool)
  • Cost of each plan over the 90 days
  • Sender domain reputation snapshot (to check for drift)
  • The exact template you will use downstream
  • The pipeline attribution rule (we recommend 60-day attribution from first identified visit)
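If it helps to make the day-0 snapshot concrete, here is a minimal sketch of the log record as a plain Python dict. Every field name and value is illustrative and my own assumption, not something either tool's API returns:

```python
from datetime import date, timedelta

# Hypothetical day-0 log record; nothing here comes from either tool's API.
start = date(2025, 1, 6)
day0 = {
    "test_start": start.isoformat(),
    "test_end": (start + timedelta(days=90)).isoformat(),  # lock the window up front
    "baseline_us_b2b_uniques": 12_500,       # from your analytics tool
    "plan_cost_90d_usd": {"rb2b": 3 * 149, "leadpipe": 3 * 147},  # example: ~3 months of each Pro plan
    "sender_domain_reputation": "postmaster snapshot saved",       # to check for drift later
    "outreach_template_id": "tmpl-v1",       # the exact template used downstream
    "attribution_window_days": 60,           # the pipeline attribution rule
}
print(day0["test_end"])
```

Writing this down in one immutable record on day 0 is what keeps the goalposts from moving on day 60.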

What to measure

Six things. Write them down before starting so you do not move the goalposts.

1. Match rate

Identified contacts divided by unique US B2B visitors. This is the input metric. It is not the most important metric, but it caps everything else.

2. Unique people identified

Some tools identify the same person multiple times. Dedupe before counting. The honest number is unique identified humans.
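The dedupe pass can be as simple as normalizing the person key and counting the set. A sketch, assuming email is the person key (a common choice, but an assumption on my part):

```python
# Toy export: one row per identification event; the same person can appear repeatedly.
events = ["Sarah@acme.com", "sarah@acme.com ", "matt@beta.co"]

# Normalize case and whitespace before counting, otherwise trivial
# differences inflate the "unique humans" number.
unique_people = {e.strip().lower() for e in events}
print(len(unique_people))  # 2
```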

3. Overlap

The set of people identified by both tools. The size of the overlap tells you how much of your traffic is “easy to identify” versus how much each tool finds independently. The disagreement set inside the overlap (where both tools return a name but a different name) is the most interesting cell in the whole test.
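Set arithmetic over the two exports gives you all four cells. A minimal sketch, assuming you have already normalized each export to a `{visitor_key: identified_person}` mapping, where the visitor key is whatever stable anonymous ID you log (the shape is my assumption, not either tool's export format):

```python
# Toy normalized exports keyed by an anonymous visitor ID.
rb2b = {"v1": "sarah@acme.com", "v2": "matt@beta.co", "v3": "lee@gamma.io"}
leadpipe = {"v2": "matt@beta.co", "v3": "ana@gamma.io", "v4": "kim@delta.dev"}

both = rb2b.keys() & leadpipe.keys()                 # visitors identified by both tools
agree = {v for v in both if rb2b[v] == leadpipe[v]}  # same person returned by both
disagree = both - agree                              # the most interesting cell
rb2b_only = rb2b.keys() - leadpipe.keys()
leadpipe_only = leadpipe.keys() - rb2b.keys()

print(sorted(disagree))  # spot-check these against LinkedIn manually
```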

4. Data completeness

For each identification, did the tool return a usable email, a phone, a title, a LinkedIn URL? An identification with no email and no phone is much harder to act on than one with both.

5. Downstream reply rate

Send the identical sequence to a randomized split of each cohort (RB2B-only, Leadpipe-only, overlap). Measure reply rate, positive reply rate, bounce rate, and unsubscribe rate.

6. Pipeline generated

Opportunities attributable to the identified visitor inside 60 days of the visit. This is the only metric that matters to a CRO.

| Layer | Metric | Why it matters |
| --- | --- | --- |
| Top of funnel | Match rate | Caps the size of every downstream cohort |
| Top of funnel | Unique people identified | Honest unit of measurement |
| Mid funnel | Overlap and disagreement | Reveals identity graph quality |
| Mid funnel | Data completeness | Determines actionability |
| Bottom of funnel | Reply rate, bounce rate | Honest downstream signal |
| Bottom of funnel | Pipeline generated | The CRO number |

What the two tools are doing under the hood

Useful context before you run the test.

RB2B

RB2B uses LinkedIn profile matching. That is their documented methodology and they are honest about it. If a visitor has a LinkedIn cookie, is in a US business context, and the browser fingerprint maps cleanly, RB2B returns a name. If any of those conditions fail, no match.

Pricing: free tier, $79/mo Starter, $149/mo Pro. Slack delivery is well-executed. Email is not returned on every plan; you typically get the LinkedIn URL and a Slack ping.

Leadpipe

Leadpipe uses its own identity graph. Deterministic matching against cookies, first-party signals, and 280M verified profiles refreshed every 24 hours. There is no “if LinkedIn is cooperating today” dependency.

Pricing: $147/mo Pro (500 IDs, full contact data including business and personal email, phone, title, firmographics), $299/mo Growth, $599/mo Scale, $1,279/mo Agency, ~$8K/mo Enterprise.

The architectural difference is the source of the identity. RB2B leans on LinkedIn’s graph. Leadpipe owns its graph. That difference is what shows up in the accuracy gap.

What the verified data says you should expect

You should not have to take this on faith. The closest verified anchor is the independent Gartner-certified audit of 75,000 visitors over 120 days.

Identification accuracy (independent audit):
Leadpipe   ████████████████████ 8.7/10
RB2B       ███████████          5.2/10
Warmly     ████████             4.0/10
| Tool | Overall accuracy | Method | Weakness mode |
| --- | --- | --- | --- |
| Leadpipe | 8.7/10 | Deterministic, own graph | None highlighted in audit |
| RB2B | 5.2/10 | Probabilistic via LinkedIn | False positives on disputed identities |
| Warmly | 4.0/10 | Probabilistic | High false-positive rate |

The audit found that probabilistic matching produced significantly more false positives than deterministic. The “disputed identity” set is where probabilistic tools tend to lose: same browser, same session, but the tool returns a different person than the one actually visiting.

For your A/B, this implies: in the overlap set, expect a meaningful share of cases where the two tools agree on the company but disagree on the person. The audit’s accuracy ladder predicts which tool’s answer is more often the correct one. Spot-check by pulling the LinkedIn profile manually and comparing to what each tool returned. The pattern that holds across the audit is what you will see in your own test.

What this implies for your downstream sequence

A wrong name on a Slack alert is worse than no name. Your AE sends a personalized email to “Hi Sarah” when the visitor was actually Matthew, a peer at the same company. One of those is embarrassing. The other is a bounce you cannot recover from.

The downstream cost of probabilistic false positives shows up in three places:

  1. Bounce rate climbs. Sending to wrong-person emails increases bounce volume, which damages sender reputation.
  2. Negative replies climb. People who get an email referencing pages they did not visit reply angrily, which the mailbox provider reads as a spam-pattern signal.
  3. CRM hygiene degrades. Your AE updates the wrong contact record, which then poisons the next outreach attempt.

Salesforce is full of bad data for this reason. Probabilistic identification does not just lose to deterministic on accuracy; it actively pollutes the CRM downstream of the test.

Where RB2B is legitimately fine

I want to be fair. RB2B is a reasonable starter product, and for a certain kind of team it is the right first step.

  • The free plan is a real on-ramp. If you have never seen person-level visitor identification, installing RB2B is a quick way to find out it is real and your traffic has signal in it.
  • Slack delivery is well-executed. If “one channel, one ping per visitor” is the workflow, RB2B ships that cleanly.
  • Pricing is simple. No seat math.

Where it falls over is what happens after the identification. Limited contact data on lower plans means your workflow routes through LinkedIn outreach (which compounds the LinkedIn dependency) or through manual lookups (which do not scale). No phone returned, no intent topics, thin behavioral data, no suppression lists, no white-label. You can prove out the category on RB2B. You cannot build a serious revenue motion on top of it.

That mirrors what we see in our own pipeline: teams outgrow RB2B in 2-3 months, not 2-3 years. More in Leadpipe vs RB2B.

Where Leadpipe is structurally different

Five differentiators that show up in the test data.

| Capability | Why it matters in the A/B |
| --- | --- |
| Own identity graph | Match rate stable across traffic shape; less variance week to week |
| Deterministic matching | Lower false-positive rate, fewer wrong-name sends |
| Full contact data on starter plan | Downstream sequence does not depend on extra enrichment |
| Suppression lists | Existing customers, competitors, and partners filtered before they hit Slack |
| 200+ integrations + 23-endpoint REST API | Pipeline routing into your existing CRM/sequencer is native, not a Zapier hack |

The suppression lists are the underrated one. RB2B does not ship them. In a 90-day test, your competitor’s RevOps lead will visit your site, get identified, hit the Slack channel, and your AE will spend cycles trying to reach them. Leadpipe filters that traffic before it ever reaches the workflow.

How to read the result

Three views matter, in this order:

  1. Pipeline per dollar spent. The bottom-line view. If one tool produces materially more pipeline per dollar of platform spend plus AE time, the rest of the conversation is decoration.
  2. Reply rate gap on identical outreach. The leading indicator. Same template, same sender, different source list. If one source produces materially higher reply rate, downstream pipeline will follow.
  3. Bounce and complaint trajectory. The early-warning view. If one source pushes your bounce rate up over the 90 days, you are watching identity-quality drift in real time.

A common pattern in customer accounts that have run this comparison: the cost difference between the two plans is dwarfed by the difference in downstream productive work the AE can do with the output. The plan you buy matters less than what arrives in the inbox at the end of the pipeline.

When the headline match rate misleads

Two cases where match rate is a misleading single-number summary.

Case 1: identical match rates, different completeness

If both tools return the same number of unique people but one returns email and the other does not, the apparently-equal match rate is meaningless. The cohort that does not include email cannot be sequenced without a separate enrichment step. Add the enrichment cost and time, and the apparent parity becomes a meaningful gap.

Case 2: high match rate, high false positive rate

A 25% match rate with 70% accuracy returns 17.5 correct identifications per 100 visitors, plus 7.5 wrong-person identifications. A 15% match rate with 90% accuracy returns 13.5 correct identifications and only 1.5 wrong ones. The headline favors the first; once you price in the downstream cost of 7.5 false positives per 100 visitors (bounces, angry replies, polluted CRM records), the actionable cohort can favor the second. The audit data is what makes this visible.
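The arithmetic above generalizes to a one-liner worth keeping next to your pivot:

```python
# Correct identifications and false positives per 100 visitors, given a
# headline match rate and an accuracy rate (both expressed as fractions).
def per_100(match_rate: float, accuracy: float) -> tuple[float, float]:
    identified = 100 * match_rate
    correct = identified * accuracy
    return round(correct, 1), round(identified - correct, 1)

print(per_100(0.25, 0.70))  # (17.5, 7.5): more correct IDs, but 7.5 wrong-person sends
print(per_100(0.15, 0.90))  # (13.5, 1.5)
```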

The honest unit is “correctly-identified people whose contact data is usable in the workflow.” That is the cohort that drives downstream pipeline.

What to log if you actually run this

Build a simple pivot for the comparison. The columns:

| Cohort | Total uniques in cohort | Match rate | Avg data completeness | Bounce rate | Reply rate | Meetings booked | Pipeline (60d) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RB2B-only | | | | | | | |
| Leadpipe-only | | | | | | | |
| Overlap, agree | | | | | | | |
| Overlap, disagree | | | | | | | |
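If you track the pivot in a notebook rather than a spreadsheet, a plain dict-of-dicts is enough. The cohort and column names mirror the table above; the one filled-in value is a hypothetical placeholder:

```python
cohorts = ["RB2B-only", "Leadpipe-only", "Overlap, agree", "Overlap, disagree"]
columns = ["uniques", "match_rate", "completeness", "bounce_rate",
           "reply_rate", "meetings", "pipeline_60d"]

# Empty pivot: every cell starts as None and gets filled in as the 90 days run.
pivot = {c: dict.fromkeys(columns) for c in cohorts}

pivot["Overlap, disagree"]["uniques"] = 214  # hypothetical spot-check count
print(pivot["Overlap, disagree"]["uniques"])
```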

The most-informative row is “overlap, disagree.” Spot-check 50 of those by pulling the LinkedIn profile manually. The agreement rate with each tool is your direct accuracy read on your own traffic.

What it means

Three things to take away from this design.

1. The match rate gap is the entire conversation. If one tool identifies fewer correct people, nothing you do with the outputs recovers the gap. You can run perfect outreach on an undersized list and still lose to a team running average outreach on a 2-3x larger correctly-identified list.

2. The plan you buy matters more than the brand you buy. A free or cheap plan that does not return email is not directly comparable to a plan that does. The Leadpipe Pro plan at $147/mo returns full contact data; that is the plan-to-plan comparison that mirrors a real motion.

3. The downstream numbers are the only numbers that matter. Match rate is a proxy. Replies and meetings are the thing. If you are evaluating visitor ID tools, set up an A/B where the tool does not get graded on its own metric. Grade it on your pipeline. Google Analytics is lying about your pipeline the same way; instrument the comparison honestly.

What we would do differently if we re-ran our own test

If I were running this comparison fresh today:

  1. Longer window. 90 days is enough to be directional. 180 days would let you attribute late-arriving deals.
  2. Control for traffic mix. Paid ad traffic shape changes week to week. Lock ad spend during the test.
  3. Publish daily-level overlap. Anyone running the same test would benefit from seeing overlap stability day by day, not just at the end.
  4. Randomize outreach sender. Same sender for both cohorts confounds the AE-skill variable. A randomized 2x2 removes sender effects.

The decision framework

If you are currently running RB2B, three questions.

  • Do your outbound motions depend on email delivery? If yes, the limited-contact-data workflow is a ceiling.
  • Does your ICP skew away from pure tech/sales LinkedIn density (healthcare, financial services, manufacturing, agencies)? If yes, LinkedIn-only matching is going to leave a lot on the table.
  • Do you want intent data beyond your own site, or suppression lists for existing customers? RB2B does not have either today.

Two or more yeses, and it is worth running the A/B. One or none, and RB2B’s free tier is a fine place to stay for now.

Leadpipe identifies 30-40%+ of your US B2B visitors with full contact data on the Pro plan at $147/mo. No credit card to start the 500-lead trial. Start identifying visitors →