
Where Do AI Sales Agents Get Intent Data?

A full tour of the sources AI sales agents pull intent data from, which are machine-ready, which are not, and how to tell them apart.

George Gogidze · 10 min read

An AI sales agent is only as smart as the signal it reads. If you ask where the signal comes from, you will get a fog of vendor names and an uncomfortable amount of “trust us.” The truth is more useful and more boring. There are four real sources. They have different shapes, different refresh rates, and different price points. And only some of them are actually machine-ready in 2026.

I am George, founder of Leadpipe. We build one of the four. I am going to give you the honest tour, including where we are not the answer, because the landscape is more important than the vendor pitch.


The four real sources

Every meaningful intent signal an AI agent can consume falls into one of these four buckets.

| Source | What it tells you | Typical shape | Price range |
| --- | --- | --- | --- |
| First-party web behavior | Who came to your site, what they read | Person-level, real-time | $147/mo to enterprise |
| Third-party cross-site web behavior | Who is researching your category elsewhere | Person- or account-level, daily to weekly | $500/mo to $100K+/yr |
| Third-party review and research platforms | Who is evaluating tools on category sites | Buyer intent flags, roughly weekly | ~$15K+/yr |
| Community and social signals | Who is engaging in your niche community | Person-level, real-time where available | ~$15K to $50K/yr |

CRM behavior (opens, clicks, past interactions) is often called “intent,” but it is really a fifth layer: known-contact behavior inside your own system. Important, but different from the four above because it only covers people you already know.


Source 1: first-party web behavior

This is your own site. Someone lands on your pricing page and leaves without filling out a form. About 97 out of 100 visitors do this. An AI agent that cannot see them is blind to the highest-intent channel you own.

What it answers: Who, specifically, just engaged with your site, which pages they viewed and for how long, whether it was a return visit, the source referrer, and an intent score.

Who supplies it:

  • Leadpipe. 30-40%+ match rate on US B2B traffic. Deterministic match via cookie and first-party signals. Own identity graph (built, not licensed). 8.7/10 in the independent accuracy test. Real-time webhooks, 23 REST endpoints, 27-tool MCP server, TypeScript SDK. $147/mo Pro plan up through enterprise.
  • RB2B. Free tier / $79/mo Starter / $149/mo Pro. LinkedIn-only matching, probabilistic. 5.2/10 in the independent test. Dashboard-first, Slack alerts, limited API for agent consumption.
  • Warmly. ~$900+/mo Data Agent. Bundles chat, video, ID. 4.0/10 in the independent test. Sales-floor UI, less API-first.
  • Leadfeeder (Dealfront). €99/mo. Company-level only, not person-level. Useful for a human ABM dashboard, not an agent.
  • Clearbit (now HubSpot Breeze). Company-level inside HubSpot. Pricing in flux.

Machine-ready test: does it fire a structured webhook within seconds of identification, with a stable opaque person ID? If yes, agent-ready. If not, human-only.

Leadpipe is agent-ready on this axis by design. Most of the rest were built for human dashboards.
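
The machine-ready test above can be written down as a check an agent runs on each incoming webhook body. This is a minimal sketch under stated assumptions: the `VisitorEvent` shape, its field names, and the 10-second budget are all hypothetical and chosen for illustration. They are not Leadpipe's or any other vendor's documented schema.

```typescript
// Hypothetical payload shape for illustration only; NOT a documented vendor schema.
interface VisitorEvent {
  personId: string;     // stable, opaque identifier (assumed field name)
  identifiedAt: string; // ISO 8601 time the visitor was identified
  receivedAt: string;   // ISO 8601 time the webhook reached your endpoint
  pages: { path: string; seconds: number }[];
}

// Structure check: is the parsed body typed the way the agent expects?
function isVisitorEvent(body: unknown): body is VisitorEvent {
  if (typeof body !== "object" || body === null) return false;
  const { personId, identifiedAt, receivedAt, pages } = body as Record<string, unknown>;
  return (
    typeof personId === "string" &&
    personId.length > 0 &&
    typeof identifiedAt === "string" &&
    typeof receivedAt === "string" &&
    Array.isArray(pages) &&
    pages.every(
      (p) =>
        typeof p === "object" &&
        p !== null &&
        typeof (p as Record<string, unknown>).path === "string" &&
        typeof (p as Record<string, unknown>).seconds === "number"
    )
  );
}

// Opaque-ID check: an email address used as the key is not opaque.
function hasOpaqueId(event: VisitorEvent): boolean {
  return !event.personId.includes("@");
}

// Latency check: the webhook arrived within the chosen budget.
function deliveredWithin(event: VisitorEvent, maxSeconds: number): boolean {
  const lagMs = Date.parse(event.receivedAt) - Date.parse(event.identifiedAt);
  return Number.isFinite(lagMs) && lagMs >= 0 && lagMs <= maxSeconds * 1000;
}

// All three parts of the test together. The 10-second default is an assumption.
function isAgentReady(body: unknown, maxSeconds = 10): boolean {
  return isVisitorEvent(body) && hasOpaqueId(body) && deliveredWithin(body, maxSeconds);
}
```

A payload that fails any one of the three checks is still usable by a human reading a dashboard. It is just not something you want an agent acting on unattended.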


Source 2: third-party cross-site web behavior

This is where someone is researching your category across other people’s websites. The question is whether the data resolves at the person level or the account level.

What it answers: Who is in-market for your category this week, whether or not they have hit your site yet.

Who supplies it:

  • Leadpipe Orbit. Person-level intent across a cross-site pixel network spanning 5M+ sites. Daily refresh. 20,000+ topics. Delivered via the same 23 REST endpoints and the MCP server. See the Orbit launch post for the design.
  • Bombora. Account-level surge signals. Weekly refresh. Broad industry topic coverage. Widely used in ABM. Not person-level, which limits agent use.
  • G2 Buyer Intent. Account-level, tied to G2 category pages. Narrow but high-signal.
  • 6sense. ABM platform, ~$55K+/yr. Account-level, rich. Built for enterprise ABM marketers. Agent-facing API exists but is not the primary interface.
  • Demandbase. ABM platform, ~$55K+/yr. Similar story to 6sense.

Machine-ready test: does it resolve to a named person with business email, or does it only surface “Acme is surging”? Person-level = agent-ready. Account-level = list-building.

Most incumbents in this space are account-level. Orbit was built person-level specifically to serve the agent use case. The deeper comparison is in “Orbit vs Bombora” and “Person-level intent data: how it works.”
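
The person-level versus account-level split maps naturally onto a discriminated union, where the record's level decides what the agent is allowed to do with it. The sketch below is illustrative only: the type names, field names, and action labels are hypothetical, not any provider's schema.

```typescript
// Hypothetical record shapes; field names are illustrative, not a vendor schema.
type PersonIntent = {
  level: "person";
  personId: string;
  businessEmail: string;
  topic: string;
};

type AccountIntent = {
  level: "account";
  accountDomain: string;
  topic: string;
};

type IntentRecord = PersonIntent | AccountIntent;

type Route =
  | { action: "agent_outreach"; to: string; topic: string }
  | { action: "add_to_target_list"; account: string; topic: string };

// Person-level records go to the agent; account-level records go to list-building.
function route(record: IntentRecord): Route {
  switch (record.level) {
    case "person":
      return { action: "agent_outreach", to: record.businessEmail, topic: record.topic };
    case "account":
      return { action: "add_to_target_list", account: record.accountDomain, topic: record.topic };
  }
}
```

The point of the union is that the agent never has to guess a contact: an account-level record has no `businessEmail` field to write to, so the type system keeps it on the list-building path.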


Source 3: third-party review and research platforms

G2 and TrustRadius fall here most cleanly. Anyone shopping software in a given category is likely to hit their comparison pages. That is a high-signal event.

What it answers: Which companies (and sometimes which people) are actively evaluating tools in your category.

Who supplies it:

  • G2. Category traffic, product comparisons, buyer intent flags. Account-level intent signal, delivered through their API and integrations.
  • TrustRadius. Similar shape, different audience skew (more enterprise).
  • Capterra, GetApp. Aggregated reviews, directory-style. Lower signal density but still valuable.

Machine-ready test: does the platform ship an API or webhook? G2 and TrustRadius both do for their paid tiers. Directory sites typically do not.

The challenge with review platforms for agent use: the signal is valuable but sparse. You are looking at a subset of buyers who happened to hit a review page. Pair with first-party and cross-site for full coverage.


Source 4: community and social signals

People doing real research increasingly do it on Reddit, Slack, Discord, LinkedIn, and GitHub. Tools that surface community intent can be a useful supplement, particularly for developer-focused and technical products.

What it answers: Who is asking questions, engaging with your category, or maintaining relevant projects in public spaces.

Who supplies it:

  • Common Room. GitHub, Slack, Discord, Twitter, LinkedIn signals. ~$15-50K/yr. Person-level in its niche.
  • UserGems. Job-change signals across LinkedIn. Different flavor, still a “who is in-market” proxy.
  • Clay workflows. Pulls signals from multiple sources. Not a source itself, but a stitching layer.

Machine-ready test: can you pull person-level records with contact info and event timestamps into an agent? Common Room: yes, for its supported platforms. UserGems: yes, for job changes. Clay: yes, as an orchestrator.

For a deep dive on the tradeoffs, see Common Room alternatives.


The decision guide: pick two or three

Almost no team needs all four sources. A sensible agent-ready stack picks two or three:

  • Product-led SaaS with web traffic. First-party (Leadpipe) + third-party cross-site (Orbit). Add G2 if your category has high review-site traffic.
  • Enterprise B2B with long cycles. First-party (Leadpipe) + third-party (Orbit or Bombora) + ABM platform (6sense or Demandbase) if you already run one. Add review-site signal on top.
  • Developer-focused product. First-party + Common Room. Community signal is high-value here.
  • Agencies and services. First-party is the base. Add job-change signal (UserGems) to catch decision-maker moves.

The common denominator is first-party. You always want to know who is on your own site. That is the highest-intent signal you have access to, and it is the one most teams still do not collect.


The machine-readiness dimension

Independent of source, ask whether the data is shaped for an agent to consume. Three tests:

  1. Delivery. Does it push (webhook) or do you have to pull (poll)?
  2. Shape. Structured JSON with typed fields, or a CSV dump or a PDF?
  3. Identity. Is there a stable, opaque identifier you can pass around your system, or do you have to re-key on email every time?
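
The three tests can be expressed as a small rubric you run against any feed you are evaluating. This is a sketch, not a standard: the `FeedProfile` type and its values are hypothetical, and it treats polling as a failed check, which is stricter than some teams will want.

```typescript
// Hypothetical descriptor for a data feed under evaluation.
type Delivery = "webhook" | "poll";
type Shape = "json" | "csv" | "pdf";
type Identity = "stable_opaque_id" | "email_rekey";

interface FeedProfile {
  name: string;
  delivery: Delivery;
  shape: Shape;
  identity: Identity;
}

interface Readiness {
  passed: string[];
  failed: string[];
  agentReady: boolean;
}

// Apply the three tests: push delivery, structured JSON, stable opaque ID.
function assess(feed: FeedProfile): Readiness {
  const checks: [string, boolean][] = [
    ["delivery", feed.delivery === "webhook"],
    ["shape", feed.shape === "json"],
    ["identity", feed.identity === "stable_opaque_id"],
  ];
  const passed = checks.filter(([, ok]) => ok).map(([name]) => name);
  const failed = checks.filter(([, ok]) => !ok).map(([name]) => name);
  return { passed, failed, agentReady: failed.length === 0 };
}
```

Returning the list of failed checks, rather than a single boolean, makes it easier to see whether a feed is one adapter away from usable or structurally wrong for agents.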

The table below is my honest read on the main vendors across the four sources, by these three tests.

| Source | Delivery | Shape | Identity |
| --- | --- | --- | --- |
| Leadpipe first-party | Real-time webhook | Structured JSON | Stable person_id |
| Leadpipe Orbit | API + webhook | Structured JSON | Stable person_id |
| Bombora | API, weekly batch | Structured | Account-level ID |
| G2 | API | Structured | Account-level ID |
| 6sense / Demandbase | API | Structured | Account-level ID |
| RB2B | Slack + basic API | Semi-structured | Less stable |
| Warmly | Dashboard-first | Varies | Less stable |
| Common Room | API | Structured | Person-level in supported platforms |

Leadpipe has both first-party and third-party covered with the same payload shape and the same stable ID. That is not about being the biggest source; it is about being the most agent-ready one. If you are picking a data provider for an AI SDR, the decision guide here is the long form.
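
To see why a shared stable ID matters, here is a rough sketch of joining a first-party feed and a cross-site feed on that ID. The `SiteVisit`, `TopicSignal`, and `PersonContext` types are hypothetical and only stand in for whatever your providers actually return.

```typescript
// Hypothetical record shapes; the only real requirement is a shared person ID.
interface SiteVisit {
  personId: string;
  path: string;
  seconds: number;
}

interface TopicSignal {
  personId: string;
  topic: string;
  sitesThisWeek: number;
}

interface PersonContext {
  personId: string;
  visits: SiteVisit[];
  topics: TopicSignal[];
}

// Group both feeds under one key so the agent sees a single context per person.
function joinByPerson(visits: SiteVisit[], topics: TopicSignal[]): Map<string, PersonContext> {
  const byPerson = new Map<string, PersonContext>();
  const entry = (id: string): PersonContext => {
    let ctx = byPerson.get(id);
    if (!ctx) {
      ctx = { personId: id, visits: [], topics: [] };
      byPerson.set(id, ctx);
    }
    return ctx;
  };
  for (const v of visits) entry(v.personId).visits.push(v);
  for (const t of topics) entry(t.personId).topics.push(t);
  return byPerson;
}
```

When the two feeds use different identifiers, this join turns into fuzzy matching on email or name, and that is where most of the errors creep in.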


The concrete example

An AI sales agent with only source 2 (third-party account-level surge) gets:

Acme Corp is surging on "CRM migration" topic this week.
Intent score: 92.

The agent writes a generic CRM-migration pitch to a guessed contact at Acme. Reply rate: 1 to 2%.

The same agent with source 1 + source 2 (first-party + person-level cross-site) gets:

Sarah Chen, VP Revenue at Acme, visited /pricing 3 min and
/vs-hubspot 90 sec this morning. Orbit shows she is researching
CRM migration and HubSpot alternatives across 4 sites this week.
Score 87. Return visit: yes.

The agent writes to Sarah specifically, references the pricing page and the comparison, and lands in the 10 to 20% reply range on the identified segment. Same agent. Same model. Same prompts. The source of intent is the delta.

We see this pattern consistently across the Leadpipe customer base when teams add first-party visitor identification on top of an account-level intent feed: the per-send economics change immediately.
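
The per-send arithmetic behind the two reply ranges above is simple enough to check by hand. The sketch below uses a hypothetical batch of 1,000 sends for each case. It compares replies per send only; it says nothing about how large the identified segment is relative to the full account list.

```typescript
interface RateRange {
  lowPct: number;
  highPct: number;
}

// Expected replies for a batch of sends, as a [low, high] pair.
function repliesPer(sends: number, rate: RateRange): [number, number] {
  return [(sends * rate.lowPct) / 100, (sends * rate.highPct) / 100];
}

// Ranges taken from the example: 1 to 2% account-only, 10 to 20% on the identified segment.
const accountOnly: RateRange = { lowPct: 1, highPct: 2 };
const identifiedSegment: RateRange = { lowPct: 10, highPct: 20 };

const accountOnlyReplies = repliesPer(1000, accountOnly);     // [10, 20]
const identifiedReplies = repliesPer(1000, identifiedSegment); // [100, 200]
```

On those numbers, each send to the identified segment is worth roughly ten sends to a guessed contact.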


How Leadpipe covers sources 1 and 2 together

The pitch, stated plainly: we built Leadpipe so that the first two sources collapse into one API surface with one identity graph under it.

  • 280M verified profiles. Own graph, not licensed.
  • 60B intent signals. Across first-party pixel traffic and the cross-site Orbit network.
  • 5M websites monitored. The Orbit network.
  • 24-hour refresh. Across the full graph.
  • 30-40%+ match rate on US B2B traffic. Deterministic.
  • Real-time webhooks. First Match, Every Update.
  • 23 REST endpoints. 5 visitor + 18 intent.
  • 27-tool MCP server. For Claude, Cursor, Codex, any MCP client. npx -y @leadpipe/mcp.
  • TypeScript SDK. npm install @leadpipe/client.
  • Suppression at API layer. Customers, churned logos, opt-outs.
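
For readers who have not met suppression before, the last item is the rule that customers, churned logos, and opt-outs never reach the agent. The sketch below shows the equivalent logic on the client side purely for illustration. The types and function names are hypothetical, and it is not a description of how Leadpipe implements suppression on its side.

```typescript
// Hypothetical types for illustration; not a vendor schema.
interface Lead {
  personId: string;
  email: string;
}

interface SuppressionList {
  customerDomains: Set<string>; // current customers (lowercase domains)
  churnedDomains: Set<string>;  // churned logos (lowercase domains)
  optedOutEmails: Set<string>;  // individual opt-outs (lowercase emails)
}

function domainOf(email: string): string {
  return email.slice(email.lastIndexOf("@") + 1).toLowerCase();
}

function isSuppressed(lead: Lead, list: SuppressionList): boolean {
  const email = lead.email.toLowerCase();
  const domain = domainOf(email);
  return (
    list.optedOutEmails.has(email) ||
    list.customerDomains.has(domain) ||
    list.churnedDomains.has(domain)
  );
}

// Drop suppressed leads before anything is handed to the agent.
function applySuppression(leads: Lead[], list: SuppressionList): Lead[] {
  return leads.filter((lead) => !isSuppressed(lead, list));
}
```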

You still add G2 if you care about review-site signal. You still consider Bombora or 6sense if you are running account-level ABM at enterprise scale. But the base layer of first-party and cross-site person-level intent is one installation and one API key.


The honest limits

Two limits are worth naming.

Leadpipe’s match rate is 30-40%+ on US B2B traffic. That means roughly 60-70% of traffic remains unresolved. That is the state of the art. No one matches all of it. Anyone claiming 70% or 90% on cold traffic is measuring differently than I would.

Third-party person-level intent depends on the breadth of the cross-site pixel network. Orbit’s 5M+ sites is wide, and it is not the whole web. Sites outside the network are dark.

Both of these are improving. Neither makes the product worse than the alternatives, which is the relevant comparison.


Every plan ships with the same identity graph, 23 REST endpoints, webhooks, and a 27-tool MCP server.