What Is a HEM and How Does It Match a Person?

A hashed email (HEM) bridges anonymous browser sessions and real people. How SHA256, SHA1, and MD5 HEMs work in identity resolution.

George Gogidze · 9 min read

HEM is one of those pieces of identity-resolution jargon that everyone in the space uses and almost nobody explains. If you’ve read a data sheet and seen “SHA256, SHA1, MD5 HEMs supported,” and had no idea what that meant for your pipeline, this post is for you.

I am George, founder of Leadpipe. HEMs are the primary bridge between anonymous browser sessions and real people in our identity graph. This post explains what a HEM is, how it gets from a login event to a webhook payload, and why we ship three hash algorithms instead of one.


What a HEM actually is

HEM stands for Hashed Email. It is the output of running an email address through a one-way cryptographic hash function.

Input:   sarah.chen@acme.com
         |
         v  (SHA256)
         |
Output:  8a7f9c3e2b1d4a5e6f8c9d0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c

Two things about the output matter:

  1. The same email always produces the same hash. If two different systems hash the same address with the same algorithm, they get the same output.
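  2. You cannot reverse it. Given only the hash, there is no way to compute the original email. The most an attacker can do is guess candidate emails and hash them, which the privacy section below covers.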

Those two properties are why HEMs are the standard currency of identity resolution. They let two systems confirm they’re talking about the same person without either system sharing the raw email in the clear.
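
To make the first property concrete, here is a minimal Python sketch using the standard library's hashlib. The function name sha256_hem and the two "system" variables are hypothetical names for illustration, not part of any Leadpipe SDK.

```python
import hashlib


def sha256_hem(email: str) -> str:
    """Return the SHA256 digest of an email address as lowercase hex."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


# Two independent systems hashing the same address get the same value.
system_a = sha256_hem("sarah.chen@acme.com")
system_b = sha256_hem("sarah.chen@acme.com")

print(system_a == system_b)  # True
print(len(system_a))         # 64 hex characters (256 bits)
```

Neither system had to send the raw address to the other to discover the overlap; comparing the two digests is enough.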

If you want the glossary-level definition, see identity resolution. This post goes one level deeper.


Why three hash types

Three hash algorithms dominate the identity space: SHA256, SHA1, and MD5. We support all three in our API payload (see the webhook payload reference). The reason is not nostalgia. It’s that different platforms standardized on different algorithms, and a HEM is only useful if you can match it against the system you want to talk to.

Hash    Where it’s common                                         What to use it for
SHA256  Modern ad platforms (Google Ads Customer Match, Meta),    The default for new integrations
        most identity graphs, LiveRamp                            and ad platform audience uploads
SHA1    Some legacy ad platforms                                  Legacy audience uploads
MD5     Legacy data warehouses, older CDPs,                       Joining against legacy systems
        some email service providers

Shipping all three means you can take a Leadpipe payload and match it against any of these systems without rehashing on your end. That sounds mundane. It saves hours of pipeline plumbing per integration.
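
The reason rehashing is painful: you cannot convert one digest into another. Producing a different algorithm's HEM requires the raw email. A rough sketch of computing all three at once, assuming the input is already normalized (all_hems is a hypothetical helper name):

```python
import hashlib


def all_hems(normalized_email: str) -> dict[str, str]:
    """Compute the three common HEM variants from one normalized email."""
    data = normalized_email.encode("utf-8")
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
    }


hems = all_hems("sarah.chen@acme.com")
for name, digest in hems.items():
    # Hex lengths differ by algorithm: 64, 40, and 32 characters.
    print(name, len(digest))
```

The digest length is a quick sanity check for which algorithm a stray HEM column was produced with.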


The anatomy of a HEM payload

Here’s a simplified fragment of what a HEM looks like in a Leadpipe webhook payload (the hash values shown are placeholders, not the real digests of this address):

{
  "person": {
    "email": "sarah.chen@acme.com",
    "hems": {
      "sha256": "8a7f9c3e2b1d4a5e6f8c9d0a...",
      "sha1": "356a192b7913b04c54574d18c28d46e6395428ab",
      "md5": "d41d8cd98f00b204e9800998ecf8427e"
    }
  }
}

A few implementation details that trip people up:

  • Normalize before hashing. The email has to be lowercased and trimmed of whitespace before it goes through the hash. Sarah.Chen@Acme.com and sarah.chen@acme.com produce different hashes if you don’t normalize.
  • Hexadecimal output. The hash is usually emitted as lowercase hex. Some platforms expect uppercase; check the receiving system.
  • No salting. Identity-resolution HEMs are unsalted. Salting would make the hash useless for cross-system matching, which defeats the purpose. If you’ve read about password hashing, HEMs are a different problem.
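
The three details above fit in a few lines. This is a sketch of the baseline only (trim, lowercase, unsalted, lowercase hex); normalize_email and hem_sha256 are hypothetical names, and a destination platform may require extra normalization rules beyond these.

```python
import hashlib


def normalize_email(raw: str) -> str:
    """Baseline normalization: trim surrounding whitespace, then lowercase."""
    return raw.strip().lower()


def hem_sha256(raw: str) -> str:
    """Unsalted SHA256 of the normalized email, emitted as lowercase hex."""
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()


messy = "  Sarah.Chen@Acme.com "
clean = "sarah.chen@acme.com"

# With normalization, both spellings collapse to one HEM.
print(hem_sha256(messy) == hem_sha256(clean))  # True

# Without it, the raw strings hash to different values.
raw_digest = hashlib.sha256(messy.encode("utf-8")).hexdigest()
print(raw_digest == hem_sha256(clean))  # False
```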

How a HEM becomes a person

This is the piece most people skip. A HEM by itself is just a string. The value shows up when the HEM gets attached to a browser session and resolved through the identity graph.

The flow, end to end:

LOGIN EVENT       ->  HEM CREATED      ->  GRAPH WRITE        ->  GRAPH QUERY
(partner site)       (partner hashes       (HEM tied to          (visitor pixel
                      email on signup)      person record)        fires, HEM
                                                                  looked up,
                                                                  person returned)

Stage by stage:

  1. Login or consented signup event. A user logs into a partner site with their email. The partner site hashes the email with SHA256 (and typically SHA1 and MD5 too) and records the hash against the browser session.
  2. HEM created. The partner now has a row that says “this cookie, on this device, at this time, hashes to this HEM.”
  3. Graph write. That row flows into our identity graph as a piece of verified evidence. The HEM is attached to a person record. The person record accumulates other verified evidence over time: additional device IDs, additional cookies, additional firmographic signals.
  4. Graph query. When a customer’s visitor pixel fires, we look at the browser’s anonymous identifiers, find any associated HEMs in the graph, and resolve the cluster to a person. The person, their employer, their title, their phone, and 100+ other data points come out the other side.

The HEM is the spine of that whole flow. Without it, the graph would not have a stable anchor to join signals to people across time.

For the architectural context, see how identity graphs work and our API developer guide.


Deterministic, not probabilistic

A HEM-driven match is deterministic. The email either hashed to the stored HEM or it didn’t. There’s no “confidence score” for whether a hash matches. It matches or it doesn’t.

That is why HEM-anchored identity graphs consistently outperform probabilistic approaches. Probabilistic matching looks at IP ranges, device similarity, and timing patterns and says “this is probably the same person.” Deterministic matching looks at a HEM and says “this is the same person.”

Our independent accuracy test is the exhibit: 8.7/10 for deterministic matching backed by HEMs, versus 4.0 to 5.2 for tools that lean probabilistic. The deep comparison goes into the why.


How HEMs show up in your stack

Three practical places you’ll actually use HEMs.

Ad platform audience uploads

Google Ads, Meta, and LinkedIn accept hashed email lists for audience targeting. You take an export of identified visitors, pull the SHA256 HEM per record (Google Ads Customer Match, like Meta, expects SHA256), and upload. The ad platform matches those HEMs against its own user base and serves ads to the overlap.

This is how Orbit LinkedIn Ads audiences and Google Ads optimization work in practice. Without HEMs, you’d be uploading raw emails, which most platforms don’t accept.
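
A sketch of the export step, assuming records shaped like the payload fragment earlier in this post. The column name hashed_email and the function build_audience_csv are placeholders; every ad platform publishes its own upload template, so check the header it expects.

```python
import csv
import io


def build_audience_csv(records: list[dict], algorithm: str = "sha256") -> str:
    """Collect one HEM per record into a single-column CSV, skipping duplicates."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["hashed_email"])  # placeholder header name
    seen: set[str] = set()
    for record in records:
        hem = record.get("person", {}).get("hems", {}).get(algorithm)
        if hem and hem not in seen:
            seen.add(hem)
            writer.writerow([hem])
    return buffer.getvalue()


# Dummy records: two share a HEM, one has no HEM at all.
records = [
    {"person": {"hems": {"sha256": "a" * 64}}},
    {"person": {"hems": {"sha256": "a" * 64}}},
    {"person": {"hems": {"sha256": "b" * 64}}},
    {"person": {}},
]
audience_csv = build_audience_csv(records)
print(audience_csv)
```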

CDP and data warehouse joins

A modern data stack has identity data in multiple places: a CRM, a product analytics tool, a CDP, a data warehouse. Joining those systems on raw email is brittle because different systems normalize differently and raw email is sensitive to share. Joining on HEM gives you a stable, privacy-preserving key that every system can emit.
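
Here is a small illustration of that join, with made-up rows standing in for a CRM export and an analytics table. The row shapes and field names are assumptions for the example.

```python
import hashlib


def hem(email: str) -> str:
    """SHA256 HEM with baseline normalization (trim + lowercase)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# The CRM stores raw email with inconsistent casing.
crm_rows = [{"email": "Sarah.Chen@Acme.com", "account": "Acme"}]

# The analytics tool only ever stored the HEM.
analytics_rows = [{"hem": hem("sarah.chen@acme.com"), "sessions": 7}]

# Index the CRM by HEM, then join without exposing raw email downstream.
crm_by_hem = {hem(row["email"]): row for row in crm_rows}
joined = [
    {"account": crm_by_hem[row["hem"]]["account"], "sessions": row["sessions"]}
    for row in analytics_rows
    if row["hem"] in crm_by_hem
]
print(joined)  # [{'account': 'Acme', 'sessions': 7}]
```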

Third-party identity syncs

Partner integrations (LiveRamp, ID5, and similar) speak HEM natively. If you’re feeding Leadpipe identity data into a larger identity infrastructure, you’ll pass HEMs, not raw emails.


The privacy picture

A HEM is not a perfect privacy shield. Anyone with a list of candidate emails can hash each one and compare against a known HEM to reverse-identify it. This is a dictionary attack (often loosely called a rainbow-table attack), and for email addresses it is computationally trivial.

What a HEM does give you:

  • It is not raw email. You can store and transmit it in contexts where raw email would be inappropriate.
  • It is a stable join key. Systems can match on HEM without ever sharing raw email between them.
  • It supports consent-first flows. A HEM written under consent flags that consent in the identity graph; a HEM seen without consent doesn’t get promoted to a primary match.

On the compliance side: Leadpipe is CCPA compliant, registered as a data broker in CA, TX, VT, and OR, and defaults to company-level identification for EU and UK visitors under GDPR. HEMs play inside that framework; they don’t circumvent it.


Common mistakes when working with HEMs

A few patterns I’ve seen teams get wrong, each of which costs match rate or breaks an integration.

Not normalizing before hashing

Sarah.Chen@Acme.com and sarah.chen@acme.com are the same email. They produce different hashes. If your upstream system hashes raw emails without normalizing, any address that differs in case or whitespace hashes to a different value from the one Leadpipe has, and that record cannot match.

The fix is a normalization step before the hash: lowercase, trim whitespace, and apply any provider-specific rules the destination requires (Google, for example, strips periods from the local part of gmail.com and googlemail.com addresses). Most platforms publish their exact normalization requirements; check them before generating hashes.

Confusing encoding formats

Hashes are bytes. They get emitted as hex (lowercase or uppercase) or base64 depending on the system. A SHA256 hash in lowercase hex will not match the same hash in uppercase hex unless the comparing system is case-insensitive. Check what the destination expects.
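
A quick sketch of the same SHA256 digest in three encodings, plus a hypothetical helper (to_lower_hex) that canonicalizes hex or base64 input to lowercase hex before comparing. It assumes SHA256 only; other algorithms have different lengths.

```python
import base64
import hashlib
import string

digest = hashlib.sha256(b"sarah.chen@acme.com").digest()  # 32 raw bytes

lower_hex = digest.hex()
upper_hex = lower_hex.upper()
b64 = base64.b64encode(digest).decode("ascii")

# Same bytes, but the strings do not compare equal.
print(lower_hex == upper_hex)  # False


def to_lower_hex(value: str) -> str:
    """Canonicalize a SHA256 digest given as hex (any case) or base64."""
    if len(value) == 64 and all(c in string.hexdigits for c in value):
        return value.lower()
    return base64.b64decode(value, validate=True).hex()


print(to_lower_hex(upper_hex) == to_lower_hex(b64) == lower_hex)  # True
```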

Trying to use salted hashes for identity resolution

A salt is a random value added to the input before hashing. Salting is the right thing for password storage because it defeats rainbow tables. Salting is the wrong thing for identity resolution because two systems with different salts can never match the same email, which is the entire point of using a HEM.

If you’re seeing HEMs that include a salt, you’re looking at password-storage leftovers, not identity resolution. They will not join against any external system.
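
To see why salted values cannot join, compare two systems that each prepend their own salt. The salts are fixed strings here so the example is reproducible; real salts are random per system or per record, which only makes the mismatch more certain.

```python
import hashlib

EMAIL = "sarah.chen@acme.com"


def unsalted_hem(email: str) -> str:
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def salted_hash(email: str, salt: bytes) -> str:
    """Password-style hashing: the salt is prepended to the input."""
    return hashlib.sha256(salt + email.encode("utf-8")).hexdigest()


# Hypothetical salts held privately by two different systems.
system_a = salted_hash(EMAIL, b"system-a-salt")
system_b = salted_hash(EMAIL, b"system-b-salt")

print(system_a == system_b)                          # False: no join possible
print(unsalted_hem(EMAIL) == unsalted_hem(EMAIL))    # True: HEMs line up
```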

Mixing hash algorithms unintentionally

A system that emits SHA256 cannot match against a system that only has SHA1 HEMs. This is why we emit all three. When you build integrations, be explicit about which hash algorithm the destination requires, and pull that specific field from our payload.
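
One way to stay explicit is to keep the destination-to-algorithm mapping in one place and validate the digest length on the way out. The destination names, the mapping, and hem_for_destination are all hypothetical; the payload shape follows the fragment shown earlier in this post.

```python
import hashlib

EXPECTED_HEX_LENGTH = {"sha256": 64, "sha1": 40, "md5": 32}

# Hypothetical mapping; fill in from each destination's documentation.
DESTINATION_ALGORITHM = {
    "ad_platform_upload": "sha256",
    "legacy_warehouse": "md5",
}


def hem_for_destination(payload: dict, destination: str) -> str:
    """Pull the HEM field that a given destination requires."""
    algorithm = DESTINATION_ALGORITHM[destination]
    hem = payload["person"]["hems"][algorithm]
    if len(hem) != EXPECTED_HEX_LENGTH[algorithm]:
        raise ValueError(f"{algorithm} HEM has unexpected length {len(hem)}")
    return hem


email = b"sarah.chen@acme.com"
payload = {
    "person": {
        "hems": {
            "sha256": hashlib.sha256(email).hexdigest(),
            "sha1": hashlib.sha1(email).hexdigest(),
            "md5": hashlib.md5(email).hexdigest(),
        }
    }
}

print(len(hem_for_destination(payload, "ad_platform_upload")))  # 64
print(len(hem_for_destination(payload, "legacy_warehouse")))    # 32
```

The length check catches the most common mix-up, a SHA1 or MD5 value landing in a column that expects SHA256, before it silently drops match rate.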


What this means for customers

If you’re building anything on top of Leadpipe’s API, webhooks, or MCP server, HEMs are already doing the heavy lifting. You rarely have to think about them. But understanding what they are explains three things:

  1. Why match rates diverge so much across vendors. Vendors with HEM-anchored deterministic graphs clear 30-40%+ on US B2B traffic. Vendors without them don’t.
  2. Why our payload includes SHA256, SHA1, and MD5. So you can plug the same record into any downstream system without rehashing.
  3. Why we can talk about 8.7/10 accuracy out loud. Because the match is verified, not inferred.

Every plan ships with the same identity graph, 23 REST endpoints, webhooks, and a 27-tool MCP server.