Your AI SDR drafts 10,000 emails a day. Maybe 200 reply. The ceiling is not the model. It is the data the model is acting on.
I am George, founder of Leadpipe. I spend most of my time talking to teams who bought an AI SDR, watched it send beautifully written nonsense for a quarter, then blamed the vendor. The vendor is rarely the problem. The input is.
The short answer to “what data does an AI SDR need to work” is this: a real fit definition, a real target list, a real intent signal, a real contact record, and a real behavior trail, plus a delivery path that gets all of it to the agent. If any one of those is missing, the agent is guessing. If the agent is guessing, reply rates stall below 2%. If reply rates stall below 2%, you fire the tool and blame GPT.
This post is the minimum data stack. Below it, no AI SDR works. Above it, the model choice barely matters.
The five data inputs an AI SDR needs
Every autonomous outbound agent, whether it ships under a brand like 11x, Artisan, AiSDR, or something you built yourself with the Leadpipe MCP server, consumes five categories of input. Miss one, and the downstream output breaks.
| Input | What it answers | Typical source | What breaks if missing |
|---|---|---|---|
| ICP definition | Who counts as a fit? | Your CRM, closed-won analysis | Agent writes to the wrong audience |
| Account targeting | Which companies to hit? | Firmographic database, ABM list | Spray-and-pray volume |
| Person-level intent | Which people are in-market right now? | Visitor ID, person-level intent, CRM behavior | Wrong timing, cold copy |
| Contact record | How do I reach them? | Business email, phone, LinkedIn | Bounces, spam flags |
| Behavior trail | What have they already done? | Website pages, opens, calls, form data | Generic pitch, no relevance |
Any AI SDR pitch deck that skips one of these rows is selling you a writer, not an SDR.
Input 1: ICP definition
This is the cheapest input to produce and the one most teams still get wrong. Your AI SDR needs a machine-readable definition of fit. Not a slide that says “mid-market SaaS.” A filter set the agent can apply at the record level.
A usable ICP for an agent looks like:
```
Industry:    SaaS, Fintech, B2B Marketplace
Employees:   50 to 500
Revenue:     5M to 100M
Geography:   US, Canada, UK
Seniority:   Director+
Departments: Marketing, Revenue, Growth
Exclude:     Current customers, churned accounts, tier-3 geos
```
The exclude list matters more than the include list. An agent with no suppression logic will write to your own customers and your own churned logos within a week. I have watched it happen. Three separate design partners, same story. The Leadpipe API supports suppression and exclusion lists directly, which RB2B and most probabilistic tools do not.
ICP lives with you, not with the AI SDR vendor. Own it. Version it. Make it auditable.
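A filter set like the one above can be expressed as a small, versioned object the agent applies to every record. What follows is a minimal sketch, not Leadpipe's API: the `ICP` class, the `is_fit` function, the record field names, the seniority ladder, and the example domains are all hypothetical and only illustrate the idea of running suppression before the include filters.

```python
from dataclasses import dataclass

# Hypothetical seniority ladder; the filter set only says "Director+".
SENIORITY_RANK = {"IC": 0, "Manager": 1, "Director": 2, "VP": 3, "C-level": 4}


@dataclass(frozen=True)
class ICP:
    version: str                 # bump on every change so sends stay auditable
    industries: frozenset
    employees: tuple             # (min, max), inclusive
    revenue_usd: tuple           # (min, max), inclusive
    countries: frozenset
    min_seniority: str
    departments: frozenset
    excluded_domains: frozenset  # current customers and churned accounts


def is_fit(record: dict, icp: ICP) -> bool:
    """Return True only if the record passes suppression and every include filter."""
    # Suppression runs first: an excluded domain never reaches the include checks.
    if record["domain"].lower() in icp.excluded_domains:
        return False
    lo_emp, hi_emp = icp.employees
    lo_rev, hi_rev = icp.revenue_usd
    return (
        record["industry"] in icp.industries
        and lo_emp <= record["employees"] <= hi_emp
        and lo_rev <= record["revenue_usd"] <= hi_rev
        and record["country"] in icp.countries
        and record["department"] in icp.departments
        and SENIORITY_RANK.get(record["seniority"], -1)
        >= SENIORITY_RANK[icp.min_seniority]
    )


# Values mirror the example filter set; the excluded domains are placeholders.
ICP_V1 = ICP(
    version="v1",
    industries=frozenset({"SaaS", "Fintech", "B2B Marketplace"}),
    employees=(50, 500),
    revenue_usd=(5_000_000, 100_000_000),
    countries=frozenset({"US", "Canada", "UK"}),
    min_seniority="Director",
    departments=frozenset({"Marketing", "Revenue", "Growth"}),
    excluded_domains=frozenset({"customer.example", "churned.example"}),
)
```

Because the geography filter is an allowlist, tier-3 geos are excluded by omission. Storing `version` alongside each send is one way to make the definition auditable later.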
Input 2: Account targeting
Once fit is defined, the agent needs a universe of accounts that match. This is the firmographic layer: company name, domain, size, revenue, industry, location, tech stack when relevant. ZoomInfo claims roughly 95% email accuracy, Apollo claims roughly 90 to 95%, and Cognism claims similar numbers. All three are vendor-reported figures. Pick your source, but do not let the agent build its own universe on the fly. That is how you end up mailing law firms about DevOps tooling.
Most teams already have this. The failure mode is stale data. Firmographic records decay fast, and departments and headcounts change quarterly. For AI-driven outbound, refresh matters more than it did when humans were in the loop, because a human SDR would glance at the LinkedIn profile before sending. An agent will not.
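Since the agent will not glance at LinkedIn before sending, a staleness gate can do that job mechanically. The sketch below assumes each account record carries a `refreshed_on` ISO date and uses a 90-day cutoff to match the quarterly churn described above. Both the field name and the threshold are illustrative assumptions, not part of any vendor schema.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # assumed cutoff, roughly one quarter


def split_by_freshness(accounts, today, max_age=MAX_AGE):
    """Partition accounts into (fresh, stale) by the age of their last refresh."""
    fresh, stale = [], []
    for account in accounts:
        age = today - date.fromisoformat(account["refreshed_on"])
        (fresh if age <= max_age else stale).append(account)
    return fresh, stale
```

Stale accounts would go back to the data source for a refresh instead of into the agent's queue.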
Input 3: Person-level intent (the layer everyone skips)
This is the one that breaks AI SDRs in practice. An account is not in-market. A person is. If you cannot tell your agent which specific people on your target accounts are researching your category right now, the agent is sending at the wrong time to the wrong person inside the right logo.
Person-level intent comes from two sources, and a good AI SDR stack uses both:
- First-party visitor identification. Who just visited your pricing page? Your integrations page? Your competitor comparison? Leadpipe resolves 30-40%+ of US B2B traffic at person level, deterministically. In an independent accuracy test it scored 8.7/10, against RB2B at 5.2/10 and Warmly at 4.0/10.
- Third-party person-level intent. Who is researching your category across 5M other websites? That is Orbit, which reads intent across the cross-site pixel network and refreshes daily on 20,000+ topics. See how person-level intent data works for the mechanics.
The practical difference looks like this. An AI SDR with only firmographic data writes: “Hi Sarah, I saw that Acme is a mid-market SaaS company.” An AI SDR with firmographics plus person-level intent writes: “Hi Sarah, I noticed you and two others at Acme were reading about CRM migration this week. We helped [peer company] cut their migration window by half.” One of these is a template. The other is a conversation.
Across the Leadpipe customer base, teams that add person-level intent to a previously firmographic-only AI SDR consistently see reply rates move from the low single digits into the 10 to 20% range on identified segments. Same agent, same prompts, different input.
Input 4: Contact record
The agent needs to reach the person. That means verified business email, phone if you are doing calls, LinkedIn URL if you are sequencing on social. Three points worth saying out loud:
- Every record needs a freshness stamp. An email verified in 2023 is not the same as an email verified yesterday. If the agent cannot see the freshness, it cannot prioritize.
- Personal email is not a substitute for business email for B2B outbound. It is a complement for consumer or founder-led motions, and Leadpipe returns both where available.
- Hashed emails (HEM) matter for paid retargeting, not cold outreach. Do not confuse the two.
Leadpipe returns 100+ data points per identified person, including business and personal email, phone, LinkedIn, job title, seniority, department, work history, and company firmographics. Same record, same payload. That is the contact record the agent needs.
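The three points above translate into a short selection rule for cold outreach: require a freshness stamp, prefer business email, fall back to personal email only when the motion allows it, and never use a hashed email. This is a sketch under assumed field names (`email_verified_on`, `business_email`, `personal_email`) and an assumed 30-day freshness window. None of these come from the Leadpipe payload reference.

```python
from datetime import date


def pick_cold_email(contact, today, max_age_days=30, allow_personal=False):
    """Return (address, kind) for cold outreach, or None if nothing is usable."""
    verified_on = contact.get("email_verified_on")
    if verified_on is None:
        return None  # no freshness stamp, so the agent cannot prioritize it
    if (today - date.fromisoformat(verified_on)).days > max_age_days:
        return None  # verified too long ago to trust
    if contact.get("business_email"):
        return contact["business_email"], "business"
    # Personal email is opt-in, for consumer or founder-led motions only.
    if allow_personal and contact.get("personal_email"):
        return contact["personal_email"], "personal"
    return None  # hashed emails (HEM) are deliberately never considered here
```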
Input 5: Behavior trail
Behavior is the input that makes AI outreach feel like outreach instead of automation. Pages viewed, time on each page, referrer, return visit history, any form submission, any previous CRM activity. The agent turns this into context. Without it, every email is the same email.
A useful behavior payload for an agent looks like this:
```json
{
  "person": {"email": "sarah@acme.com", "title": "VP Marketing"},
  "company": {"domain": "acme.com", "size": "200-500"},
  "pages": [
    {"url": "/pricing", "duration_s": 180, "ts": "..."},
    {"url": "/integrations/hubspot", "duration_s": 45, "ts": "..."},
    {"url": "/vs-competitor", "duration_s": 95, "ts": "..."}
  ],
  "return_visit": true,
  "intent_score": 82,
  "matched_topics": ["crm migration", "hubspot alternatives"]
}
```
Leadpipe ships exactly this shape via webhook. See the webhook payload reference for the full schema. The agent gets a structured object, not a CSV dump.
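To show how an agent turns that object into context, here is a minimal sketch that flattens the payload into a few lines you could prepend to a prompt. The field names are taken from the example payload above. The function name `summarize_behavior` and the 30-second dwell threshold are hypothetical choices, and the `ts` values are left out because the example elides them.

```python
def summarize_behavior(payload, min_duration_s=30):
    """Build a short plain-text context block from a behavior payload."""
    # Keep pages with meaningful dwell time, longest first.
    pages = sorted(
        (p for p in payload.get("pages", []) if p["duration_s"] >= min_duration_s),
        key=lambda p: p["duration_s"],
        reverse=True,
    )
    lines = [f"Contact: {payload['person']['title']} at {payload['company']['domain']}"]
    for page in pages:
        lines.append(f"- viewed {page['url']} for {page['duration_s']}s")
    if payload.get("return_visit"):
        lines.append("- returning visitor")
    topics = payload.get("matched_topics", [])
    if topics:
        lines.append("- researching: " + ", ".join(topics))
    return "\n".join(lines)


SAMPLE = {
    "person": {"email": "sarah@acme.com", "title": "VP Marketing"},
    "company": {"domain": "acme.com", "size": "200-500"},
    "pages": [
        {"url": "/pricing", "duration_s": 180},
        {"url": "/integrations/hubspot", "duration_s": 45},
        {"url": "/vs-competitor", "duration_s": 95},
    ],
    "return_visit": True,
    "intent_score": 82,
    "matched_topics": ["crm migration", "hubspot alternatives"],
}

print(summarize_behavior(SAMPLE))
```

Sorting by dwell time puts the pricing page first, which is usually the signal you want the agent to lead with.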
What each input looks like in a minimum viable AI SDR stack
The minimum viable stack is not three tools. It is one identity graph feeding one orchestration layer feeding one sender.
Here is the stack I recommend when a team asks me “what do I need before I turn this on?”:
| Layer | Job | Leadpipe role |
|---|---|---|
| Identity graph | Resolve anonymous traffic to people, enrich firmographics, score intent | Core product |
| Intent layer | Flag who is in-market across the web, not just on your site | Orbit |
| Delivery path | API, webhook, MCP, SDK so the agent can consume data without middleware | 23 REST endpoints, 27-tool MCP, webhooks |
| Orchestrator | Clay, custom code, or the agent itself | Any |
| Sender | Instantly, Smartlead, Outreach, Salesloft, or native AI SDR | Any |
Notice what is not on this list. There is no separate “firmographic vendor” row. Leadpipe returns firmographics with every match. There is no separate “intent vendor” row. Orbit handles person-level intent. There is no separate “email verification” row for identified traffic. If Leadpipe returned the email, it is verified against the identity graph, not guessed.
This is the shape the stack takes when you treat identity as infrastructure. I wrote more on this in Why Every AI Agent Needs an Identity API.
Leadpipe as the data layer underneath your AI SDR
The five inputs above are not abstract. Leadpipe was built to supply four of them, and it plugs into whatever you use for the fifth (ICP definition, which is your job anyway).
- Own identity graph. Built, not licensed. 280M verified profiles, 60B intent signals, 5M websites monitored, 24-hour refresh. Deterministic matching, cookie and first-party signals. Match rate of 30-40%+ on US B2B traffic.
- Person-level intent via Orbit. Cross-site pixel network, daily refresh, 20,000+ topics. Not company-level intent. Not monthly batch. Person-level, fresh.
- Delivery paths built for agents. 23 REST endpoints, real-time webhooks (First Match, Every Update), CSV export, an npm SDK (`npm install @leadpipe/client`), and a 27-tool MCP server you install with `npx -y @leadpipe/mcp`. The MCP server post covers the full surface.
- Suppression and exclusion. At the API level, not the dashboard level. Your agent can filter customers and churned logos before a single prompt runs.
The posts that go deeper are The Data Layer AI Sales Agents Are Missing and How to Feed Visitor Data Into an AI Agent. If you are picking a data source, How to Choose a Data Provider for Your AI SDR is the decision tree.
The concrete example
An AI SDR running on firmographic data only, with a list of 10,000 matched accounts and 40,000 contacts:
- Send volume: 1,000 emails per day
- Open rate: 20 to 30%
- Reply rate: 1 to 2%
- Booked meetings: single digits per week
Same agent, same model, same prompts, with Leadpipe person-level intent and visitor behavior wired into the context:
- Send volume: same 1,000 per day, but ranked by intent score
- Identified visitor segment reply rate: 10 to 20%
- Booked meetings: 3 to 5x the baseline
The model is the same. The prompts are the same. The delta is the data.
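“Same volume, ranked by intent score” is a small change in code. The sketch below assumes each contact carries an `intent_score` like the one in the behavior payload, and treats a missing score as zero. The function name `build_daily_queue` and the default cap are illustrative, not a Leadpipe or sender API.

```python
def build_daily_queue(contacts, daily_cap=1000):
    """Return up to daily_cap contacts, highest intent score first."""
    # Contacts with no score sort last but are not dropped.
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(contacts, key=lambda c: c.get("intent_score", 0), reverse=True)
    return ranked[:daily_cap]
```

The send budget stays fixed. What changes is who is at the top of it.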
What to do on Monday
- Write down your ICP as a filter set. Share it with the AI SDR vendor. If they cannot apply it at the API level, that is a tool problem.
- Put a visitor identification pixel on your site today. 2 to 5 minutes, JavaScript, self-serve. Start collecting person-level data on the traffic you already have.
- Wire the webhook to your agent or to Clay. How to add visitor identification to your Clay waterfall has the recipe.
- Add person-level intent via Orbit once the visitor feed is live. Daily refresh, 20,000+ topics.
- Turn on suppression. Customers and churned logos should never touch the agent.
If your AI SDR vendor will not let you do the above, they are selling a writer. Writers are cheap. Data is not.
Every plan ships with the same identity graph, 23 REST endpoints, webhooks, and a 27-tool MCP server.