
How Do I Give an AI Agent Intent Without Exposing PII?

You can feed an AI agent useful intent signal without handing it raw PII. Here's the field-by-field design pattern plus how Leadpipe scopes data for agents.

George Gogidze · 10 min read

Your security team wants to know why a Claude agent has read access to every contact in your CRM. Your GTM team wants the agent to prioritize pipeline based on intent. Both of them are right. The answer is not “give the agent everything” and it is not “give the agent nothing.” It is a scoped data design.

I am George, founder of Leadpipe. This is a topic that comes up every time a team wires up an AI SDR or an agent-driven GTM workflow. The question is always some variant of: how do I give this thing useful intent signal without handing it a pile of raw PII I cannot take back?

Short answer: most of what makes intent data useful to an agent is not PII. You can give the agent the signal, the context, and the routing logic without surfacing raw email, phone, or HEM. When you need the full contact record to send, you scope it at the moment of send, log the access, and keep the high-volume agent context free of direct identifiers.

Here is the playbook.


What actually counts as PII in this context

A quick definition before we go further. In a GTM intent-data flow, the fields that regulators and security teams treat as PII are roughly:

| Field | Typically PII? | Notes |
| --- | --- | --- |
| Full name | Yes | Direct identifier |
| Business email | Yes, but lower risk | Professional context |
| Personal email | Yes | Higher risk |
| Phone | Yes | Regulated (TCPA, GDPR) |
| LinkedIn URL | Gray area | Public profile, but identifier |
| IP address | Yes in EU/UK under GDPR | Always treated as PII in EU context |
| HEM (hashed email) | Pseudonymous, still PII in most reads | SHA256/MD5 |
| Job title, seniority | No | Professional attribute |
| Company name, domain, size | No | Firmographic |
| Pages viewed, visit duration | No | Behavioral, non-identifying in isolation |
| Intent score, matched topics | No | Aggregated signal |
| City, state, country | No, usually | Geo, not PII on its own |

The useful insight here: the behavioral and firmographic layers that make intent data actionable are mostly not PII. You can give an agent rich context without raw identifiers, and still produce a good routing or scoring decision.


The pattern: two-tier data for the agent

Design the agent’s data surface in two tiers.

Tier 1 (context): fields with no direct identifiers, which the agent reads freely to reason, prioritize, and draft.

  • Company domain, size, industry
  • Job title, seniority, department
  • Pages viewed, duration, referrer
  • Matched intent topics, intent score
  • Return visit flag, timestamp

Tier 2 (identifiers): direct identifiers the agent only reads at the last mile, via a scoped call, with audit logging.

  • Business email, personal email, phone
  • Full name
  • LinkedIn URL
  • HEM if used for paid retargeting

The agent drafts with Tier 1. It only touches Tier 2 when it is about to send or when a human explicitly requests a contact lookup. That boundary is the whole pattern.
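To make the boundary concrete, here is a minimal sketch of the split, assuming a flat visitor record and the top-level field names used in the example payloads in this post. The `split_record` helper and the `TIER1_FIELDS` set are hypothetical names of mine, not part of any Leadpipe SDK.

```python
from typing import Any

# Assumed allowlist of Tier 1 keys; adjust to your own schema.
TIER1_FIELDS = {"person_id", "company", "role", "behavior", "intent", "flags"}


def split_record(record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a visitor record into (tier1_context, tier2_identifiers).

    Anything not explicitly allowlisted for Tier 1 lands in Tier 2,
    so a newly added field is treated as sensitive by default.
    """
    tier1 = {k: v for k, v in record.items() if k in TIER1_FIELDS}
    tier2 = {k: v for k, v in record.items() if k not in TIER1_FIELDS}
    # Both halves carry the opaque handle so they can be rejoined at send time.
    tier2["person_id"] = record["person_id"]
    return tier1, tier2
```

I would use an allowlist rather than a denylist here: if someone adds a new identifier field upstream and forgets to classify it, it stays out of the agent's context instead of leaking into it.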


What an agent payload looks like in practice

Here is a concrete example. Same identified visitor, two payloads. Tier 1 first:

{
  "person_id": "lp_01H...QJXZ",
  "company": {
    "domain": "acme.com",
    "size": "200-500",
    "industry": "SaaS"
  },
  "role": {
    "title": "VP Revenue",
    "seniority": "VP",
    "department": "Sales"
  },
  "behavior": {
    "pages": [
      {"url": "/pricing", "duration_s": 180},
      {"url": "/vs-hubspot", "duration_s": 95}
    ],
    "return_visit": true,
    "last_seen_at": "2026-04-22T14:23:11Z"
  },
  "intent": {
    "score": 87,
    "topics": ["crm migration", "hubspot alternatives"]
  },
  "flags": {
    "is_customer": false,
    "is_churned": false,
    "is_suppressed": false
  }
}

The agent can make a routing call (high intent, ICP match, not suppressed, pricing page return visit: escalate) on this payload alone. Nothing here is a direct identifier. person_id is an opaque token.
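As an illustration of that routing call, here is a rough sketch of a rule-based router that reads only the Tier 1 payload. The threshold of 80, the outcome labels, and the `route` function itself are assumptions for the example; in practice the agent may reason over the same fields in a prompt rather than in code.

```python
from typing import Any


def route(payload: dict[str, Any], escalate_score: int = 80) -> str:
    """Return 'skip', 'escalate', or 'nurture' from Tier 1 fields only."""
    flags = payload.get("flags", {})
    # Suppressed contacts and existing customers never enter outreach.
    if flags.get("is_suppressed") or flags.get("is_customer"):
        return "skip"

    behavior = payload.get("behavior", {})
    score = payload.get("intent", {}).get("score", 0)
    viewed_pricing = any(p.get("url") == "/pricing" for p in behavior.get("pages", []))
    returning = bool(behavior.get("return_visit", False))

    # High intent plus a pricing-page return visit: hand to a human.
    if score >= escalate_score and viewed_pricing and returning:
        return "escalate"
    return "nurture"
```

Run against the payload above (score 87, `/pricing` viewed, return visit, not suppressed), this returns `"escalate"` without touching a single identifier.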

When the agent is ready to send, it calls a second tool to exchange the person_id for the actual contact record:

{
  "person_id": "lp_01H...QJXZ",
  "name": "Sarah Chen",
  "business_email": "sarah@acme.com",
  "phone": "+1-...",
  "linkedin": "linkedin.com/in/..."
}

That second call is scoped, audited, and rate-limited. If something goes wrong (prompt injection, buggy tool routing, agent hallucination), the blast radius is the subset of records the agent actually asked to send to, not your entire contact base.
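A minimal sketch of what "scoped, audited, and rate-limited" can look like, assuming an in-memory Tier 2 store and a sliding-window limit. The `Tier2Lookup` class, its parameters, and the audit record shape are all hypothetical; a real service would sit behind its own API key and write to an append-only log.

```python
import time
from collections import deque


class RateLimitExceeded(Exception):
    """Raised when the lookup tool is called more often than the window allows."""


class Tier2Lookup:
    """Hypothetical last-mile contact lookup: rate-limited and audit-logged."""

    def __init__(self, store, max_calls, window_s, clock=time.time):
        self._store = store        # person_id -> contact record (Tier 2)
        self._max_calls = max_calls
        self._window_s = window_s
        self._clock = clock        # injectable so the limit can be tested
        self._calls = deque()      # timestamps of recent permitted calls
        self.audit_log = []        # in production: an append-only sink

    def lookup(self, person_id, session_id, reason):
        now = self._clock()
        # Drop timestamps that have aged out of the sliding window.
        while self._calls and now - self._calls[0] >= self._window_s:
            self._calls.popleft()

        record = self._store.get(person_id)
        if len(self._calls) >= self._max_calls:
            outcome = "rate_limited"
        elif record is None:
            outcome = "not_found"
        else:
            outcome = "ok"

        # Log who asked, why, and which fields came back (names only, no values).
        self.audit_log.append({
            "ts": now,
            "session_id": session_id,
            "person_id": person_id,
            "reason": reason,
            "outcome": outcome,
            "fields": sorted(record) if outcome == "ok" else [],
        })

        if outcome == "rate_limited":
            raise RateLimitExceeded(f"more than {self._max_calls} lookups in {self._window_s}s")
        # Misses count toward the limit too, which slows down ID guessing.
        self._calls.append(now)
        return dict(record) if record is not None else None
```

Note that the audit entry records field names rather than values, so the log itself does not become a second copy of the PII.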


How Leadpipe supports this shape

A few product notes that matter here.

Webhooks deliver structured payloads with a stable person_id that you can pass around your system without reshipping raw PII. See the webhook payload reference for the full schema.

The 23 REST endpoints are separable by concern: enrichment, intent, visitor, pixel, account. You can give the agent a scoped API key that only hits intent and visitor, not enrichment.

The MCP server exposes tools, not raw data. The agent calls preview_audience with filters and gets a masked sample back. It calls get_audience_results when it actually needs full profiles. preview_audience returns masked emails and hashed identifiers by default.

Suppression and exclusion are at the API layer, which means the agent can check is_suppressed as a Tier 1 boolean without ever seeing the suppression list itself. That matters for compliance because you do not want the agent to have a list of “do not contact” names in its context window.
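If you also keep a do-not-contact list of your own, the same idea applies on your side. The sketch below assumes a pipeline step that runs before the agent sees anything; `SuppressionList` and `with_suppression_flag` are illustrative names, and the lowercase-and-trim normalization is an assumption about how you store addresses.

```python
from typing import Any, Iterable


def _normalize(email: str) -> str:
    return email.strip().lower()


class SuppressionList:
    """Holds do-not-contact addresses server-side; exposes only yes/no answers."""

    def __init__(self, emails: Iterable[str]) -> None:
        self._emails = {_normalize(e) for e in emails}

    def is_suppressed(self, email: str) -> bool:
        return _normalize(email) in self._emails


def with_suppression_flag(
    tier1: dict[str, Any], email: str, suppression: SuppressionList
) -> dict[str, Any]:
    """Return a copy of the Tier 1 payload with flags.is_suppressed set.

    The email is used only inside this function; it is never written
    into the payload the agent will read.
    """
    flags = {**tier1.get("flags", {}), "is_suppressed": suppression.is_suppressed(email)}
    return {**tier1, "flags": flags}
```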


The prompt-injection angle

There is a second reason this matters, independent of privacy.

An agent that has direct access to your full contact database is one prompt injection away from being tricked into dumping it. An agent that reads Tier 1 context, drafts, and only escalates to Tier 2 for a send is much harder to exploit, because the dangerous tool (raw contact lookup) is constrained to specific workflows.

This is not a complete defense. You still need:

  • Rate limits on the contact-record tool
  • Audit logs on every access
  • Human-in-the-loop approval for bulk sends
  • Scoped API keys per agent, not your admin key

But scoping data by tier cuts out a huge class of bad outcomes. It is cheap defense in depth.
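The human-in-the-loop item is the one teams most often leave as a policy rather than a check, so here is a small sketch of enforcing it in code. The threshold of 25 recipients and the `check_send` function are assumptions for illustration; pick a number that matches your real send patterns.

```python
from dataclasses import dataclass
from typing import Optional

BULK_THRESHOLD = 25  # assumed cutoff; anything above this counts as "bulk"


@dataclass(frozen=True)
class SendDecision:
    allowed: bool
    reason: str


def check_send(
    person_ids: list[str],
    approved_by: Optional[str] = None,
    bulk_threshold: int = BULK_THRESHOLD,
) -> SendDecision:
    """Gate a send request: small sends pass, bulk sends need a named approver."""
    recipients = set(person_ids)  # de-duplicate before counting
    if len(recipients) <= bulk_threshold:
        return SendDecision(True, "below bulk threshold")
    if approved_by:
        return SendDecision(True, f"bulk send approved by {approved_by}")
    return SendDecision(False, "bulk send requires human approval")
```

The gate should live in the sender service, not in the agent's prompt, so an injected instruction cannot talk its way around it.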


A reference architecture

Here is the shape I recommend to most teams:

┌────────────────┐     ┌──────────────────┐     ┌────────────────┐
│ Leadpipe pixel │────▶│ Webhook (First   │────▶│ Your queue or  │
│ on your site   │     │ Match / Update)  │     │ event bus      │
└────────────────┘     └──────────────────┘     └────────────────┘
                                                        │
                                                        ▼
                                              ┌──────────────────┐
                                              │ Tier 1 store     │
                                              │ (context, non-   │
                                              │ PII, person_id)  │
                                              └──────────────────┘
                                                        │
                                                        │ agent reads freely
                                                        ▼
                                              ┌──────────────────┐
                                              │ AI agent:        │
                                              │ route, score,    │
                                              │ draft            │
                                              └──────────────────┘
                                                        │
                                                        │ scoped lookup call
                                                        ▼
                                              ┌──────────────────┐
                                              │ Tier 2 store     │
                                              │ (PII, scoped,    │
                                              │ audited)         │
                                              └──────────────────┘
                                                        │
                                                        ▼
                                              ┌──────────────────┐
                                              │ Sender / CRM     │
                                              │ (send email,     │
                                              │ log contact)     │
                                              └──────────────────┘

This is not more complex than a normal integration. It is the same integration with the data surface split, which is about a day of work for most teams and adds little ongoing cost after that.


Concrete rules for agents that touch PII

A short checklist I give teams during rollout:

  1. Separate API keys per role. One key for Tier 1 reads (cheap, broad, read-only). Another key for Tier 2 lookups (scoped, rate-limited, audit-logged).
  2. Audit every Tier 2 call. Who asked, what was asked, what was returned, at what timestamp. Review weekly.
  3. Rate-limit the contact lookup tool. Real sends do not need thousands of lookups per minute. If the agent tries, something is wrong.
  4. Human approval on bulk. Any bulk export or bulk send should require a human click.
  5. Mask in logs. Do not print raw email or phone to application logs. Keep a redacted shape by default.
  6. Minimize retention. Tier 1 context can live longer. Tier 2 should be pulled just-in-time, not stored in the agent’s memory.
  7. Suppression as a boolean, not a list. The agent should see is_suppressed: true, not the list of suppressed addresses.
  8. Consent state carried in Tier 1. GDPR consent, CCPA do-not-sell, opt-outs: express as flags on the Tier 1 object.

None of this is magic. It is basic data minimization applied to an agent surface.
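For rule 5, one workable approach is to redact by field name before anything reaches a logger. This is a sketch under the assumption that your log events are dicts and that the sensitive keys match the Tier 2 field names used earlier; `SENSITIVE_KEYS`, `mask_email`, and `redact` are hypothetical helpers.

```python
from typing import Any, Optional

# Assumed Tier 2 key names; extend to match your own schema.
SENSITIVE_KEYS = {"name", "business_email", "personal_email", "phone", "linkedin", "hem"}


def mask_email(email: str) -> str:
    """Keep the first character and the domain: 'sarah@acme.com' -> 's***@acme.com'."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def redact(value: Any, key: Optional[str] = None) -> Any:
    """Return a copy of a log event with sensitive fields masked, at any depth."""
    if key in SENSITIVE_KEYS:
        # Mask the whole value, even if it is a nested structure.
        if isinstance(value, str) and key.endswith("email"):
            return mask_email(value)
        return "***"
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, key) for v in value]
    return value
```

Key-based redaction only catches identifiers that sit in known fields. Free-text fields such as a drafted email body need separate handling, and this sketch does not attempt it.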


GDPR and regulated-industry notes

A few region- and industry-specific notes that matter for the agent surface.

EU/UK under GDPR. Leadpipe’s default is company-level in those regions. Person-level requires affirmative consent. That means Tier 2 is either unavailable or consent-scoped for EU/UK visitors by default. Tier 1 company-level context is still useful for the agent to reason over. See GDPR-compliant visitor identification for the full picture.

Healthcare and financial services. You likely have DPA and data-handling requirements that go past what Leadpipe alone covers. The Tier 1 / Tier 2 split is still the right shape. You may additionally want to host the Tier 2 store in a controlled environment and keep even person_id scoped per agent session.
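One way to scope `person_id` per agent session is to hand the agent a session-specific token derived from it, so a token captured in one session is useless in another. The sketch below uses an HMAC with a random per-session key; the `SessionScope` class and the `st_` prefix are my own illustrative choices, not a Leadpipe feature.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class SessionScope:
    """Issues session-local tokens that stand in for the real person_id."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)   # random key, lives only for this session
        self._reverse: dict[str, str] = {}    # token -> person_id, held server-side

    def issue(self, person_id: str) -> str:
        digest = hmac.new(self._key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"st_{digest[:32]}"
        self._reverse[token] = person_id
        return token

    def resolve(self, token: str) -> Optional[str]:
        """Map a token back to person_id; unknown or foreign tokens return None."""
        return self._reverse.get(token)
```

The agent only ever sees the `st_...` token. The Tier 2 service calls `resolve` inside the controlled environment, and the mapping disappears when the session object does.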

Data broker registration. Leadpipe is registered in California, Texas, Vermont, and Oregon. That addresses the broker side. On your side, you are responsible for lawful basis of processing and for how the agent uses the data.

I am deliberately not claiming SOC 2 Type II certification. We are not there yet. Readiness is in progress, and I want to be specific rather than vague about that. A DPA and subprocessor list are available on request.


Common mistakes

  1. Putting the full CRM in the agent’s context window. Big blast radius, slow, expensive, prompt-injection risk.
  2. Giving the agent an admin API key. It needs a scoped key per tool.
  3. Using raw email as the primary key. Use an opaque person_id. Means you can log and audit without exposing identifiers.
  4. Storing contact records in the agent’s memory. Vector stores and agent memory were not designed to be HIPAA- or GDPR-compliant PII stores.
  5. Assuming hashing or masking equals anonymization. Hashing an email is pseudonymization, not anonymization. It is still PII in most legal reads.
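The last mistake is easy to demonstrate. The sketch below assumes a HEM is a SHA-256 digest of the trimmed, lowercased address, which is a common convention but an assumption here. The `hem` and `reidentify` functions are illustrative only.

```python
import hashlib
from typing import Iterable, Optional


def hem(email: str) -> str:
    """Hash a normalized email with SHA-256 (assumed HEM convention)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def reidentify(target_hem: str, candidate_emails: Iterable[str]) -> Optional[str]:
    """Recover the email behind a hash by hashing candidates and comparing.

    Anyone holding a list of plausible addresses can do this, which is
    why a hashed email is pseudonymous rather than anonymous.
    """
    for email in candidate_emails:
        if hem(email) == target_hem:
            return email
    return None
```

Because the hash is deterministic and unsalted, the same address always produces the same digest. That is what makes HEM useful for matching, and it is also why it still points at one person.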

Concrete example

A team I worked with this quarter had a Claude-based agent doing pipeline triage. Original design: agent read the full HubSpot contact table and Leadpipe visitor feed, drafted outreach, sent through their ESP.

After the redesign:

  • Agent reads Tier 1 only: intent score, topic, role, company, pages, flags.
  • Agent makes a routing decision: escalate to human, auto-nurture, or immediate send.
  • For immediate send, agent calls the scoped contact-lookup tool with the specific person_id.
  • Tier 2 lookup logs every call with the agent session ID.

No meaningful loss in output quality. Every decision the agent made before, it can still make. The draft emails are still relevant and personalized. What changed is that the vast majority of the agent’s operating context is non-PII, and the small fraction that is sensitive is scoped and logged.


Five-step path

  1. Install the Leadpipe pixel. Webhook payloads carry a stable person_id you can use as the opaque handle.
  2. Build a Tier 1 store keyed by person_id. Non-PII context, flags, intent, pages, firmographics.
  3. Build a scoped Tier 2 lookup service. Takes person_id, returns contact. Audit-logged, rate-limited.
  4. Point the agent at Tier 1. Give it tool access to Tier 2 only on send.
  5. Review the audit log weekly. Adjust rate limits and scopes.

Every plan ships with the same identity graph, 23 REST endpoints, webhooks, and a 27-tool MCP server.