How Identity Resolution works

Identity Resolution is only available on Business tier plans. You can use it with or without Customer Studio.


Audience	Data engineers, analytics engineers, and technical marketers
Prerequisites	An Identity Resolution project

Understand how Identity Resolution groups records, applies matching rules, and maintains identity stability over time.

Overview

Identity Resolution groups input records that refer to the same real-world entity into resolved identities. It does this by building and maintaining an identity graph based on the identifiers and rules you configure.

Each resolved identity is represented by a synthetic ht_id.

During incremental runs, ht_id values remain stable unless new data causes two existing identities to merge.
When identities merge, one ht_id survives and the others collapse into it.
Full re-runs rebuild the graph from scratch and do not guarantee ht_id stability.

This page explains how Identity Resolution behaves conceptually, independent of any specific UI or setup steps.

Identity graphs

An identity graph connects records through shared identifiers.

Input records act as nodes
Identifiers create edges between records
Connected groups of records form resolved identities

If two records share a matching identifier (for example, the same email), they are connected in the graph. If no valid connection exists, they remain part of separate identities.

Your configured identifiers and rules entirely determine how records connect and which connections are allowed.
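
The graph model above can be illustrated with a minimal sketch. It assumes records are plain dictionaries and groups them with a union-find structure; the `resolve` function, the `id` field, and the column names are hypothetical and only illustrate the concept, not the product's internal implementation.

```python
from collections import defaultdict

def resolve(records, identifier_columns):
    """Group record ids into connected components via shared identifier values."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (column, value) -> first record id carrying that value
    for r in records:
        for col in identifier_columns:
            value = r.get(col)
            if value is None:
                continue  # a missing identifier creates no edge
            key = (col, value)
            if key in first_seen:
                union(r["id"], first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(sorted(g) for g in groups.values())

records = [
    {"id": 1, "email": "ana@example.com", "phone": None},
    {"id": 2, "email": "ana@example.com", "phone": "555-0100"},
    {"id": 3, "email": None, "phone": "555-0100"},
    {"id": 4, "email": "bo@example.com", "phone": None},
]
identities = resolve(records, ["email", "phone"])
```

Records 1 and 2 share an email and records 2 and 3 share a phone number, so all three end up in one identity. Record 4 shares nothing and stays separate.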

Deterministic matching

By default, Identity Resolution uses deterministic matching, which relies on exact identifier equality.

With deterministic matching:

Identifier values must match exactly to connect records
Matching behavior is predictable and explainable
Running Identity Resolution again on the same data with the same rules produces the same groupings; full re-runs and rule changes are the exceptions

Deterministic matching provides a stable foundation for analytics, activation, and operational workflows where correctness and auditability matter.
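
As a small illustration of exact equality, the sketch below assumes that no normalization happens during matching and that missing values never match. Both are assumptions made for the example, and `is_match` and `normalize_email` are hypothetical helpers, not product functions.

```python
def is_match(a, b):
    # Exact equality only; a missing value is assumed never to match.
    return a is not None and a == b

def normalize_email(value):
    # Hypothetical upstream cleanup step applied before resolution runs.
    return value.strip().lower() if value is not None else None

same = is_match("ana@example.com", "ana@example.com")
different_case = is_match("Ana@Example.com", "ana@example.com")
after_cleanup = is_match(normalize_email(" Ana@Example.com"), "ana@example.com")
```

Under these assumptions, values that differ only in case or whitespace do not connect records unless they are cleaned up before matching.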

Identifiers and merge priority

Identifiers define how records are allowed to connect in the identity graph.

For each identifier, you configure:

Which column contains the identifier value
The priority order relative to other identifiers
Optional limits on how many distinct values an identity can contain

When a record contains multiple identifiers, Identity Resolution evaluates them in priority order.

Higher-priority identifiers are applied first and can prevent lower-priority identifiers from creating unintended merges. This ordering controls how aggressively records are linked as new data arrives.
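
One way to picture this configuration is as a small list of rules. The sketch below is hypothetical: the `IdentifierRule` class, its field names, and the convention that `1` is the highest priority are assumptions for illustration, not the product's actual configuration format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IdentifierRule:
    column: str                   # column holding the identifier value
    priority: int                 # assumption: 1 is the highest priority
    limit: Optional[int] = None   # max distinct values per identity, if any

RULES = [
    IdentifierRule("email", priority=2, limit=5),
    IdentifierRule("user_id", priority=1, limit=1),
    IdentifierRule("device_id", priority=3),
]

def evaluation_order(record, rules):
    """Return the identifier columns present on a record, highest priority first."""
    present = [r for r in rules if record.get(r.column) is not None]
    return [r.column for r in sorted(present, key=lambda r: r.priority)]

order = evaluation_order(
    {"user_id": None, "email": "ana@example.com", "device_id": "d-1"}, RULES
)
```

Here the record has no `user_id`, so `email` is evaluated before `device_id`.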

Identifier limits and over-merge protection

Identifier limits prevent identities from growing too large due to shared, reused, or low-quality identifiers.

A limit defines a rule such as:

“An identity may contain at most N distinct values of this identifier type.”

When a merge would violate a limit, Identity Resolution uses both priority and limits to determine the outcome:

Higher-priority identifier exceeds its limit
- The merge is blocked, and the record remains part of a separate identity.
Lower-priority identifier exceeds its limit
- The identity retains only the first N values.
- Additional values are ignored for matching purposes.

This behavior prevents unrelated records from collapsing into a single identity while still allowing high-confidence identifiers to drive resolution.
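
The two outcomes above can be sketched as a single decision function. This is a simplified model under stated assumptions: the caller decides whether an identifier counts as higher priority (the page does not define the cutoff), "first N values" is taken to mean arrival order, and `apply_limit` is a hypothetical name.

```python
def apply_limit(existing_values, incoming_value, limit, high_priority):
    """Return (merge_allowed, values_used_for_matching).

    existing_values: distinct values already on the identity, in arrival order.
    """
    if incoming_value in existing_values:
        return True, list(existing_values)
    if len(existing_values) < limit:
        return True, list(existing_values) + [incoming_value]
    if high_priority:
        # Limit exceeded on a higher-priority identifier: block the merge.
        return False, list(existing_values)
    # Limit exceeded on a lower-priority identifier: keep the first N values
    # and ignore the extra value for matching.
    return True, list(existing_values)

blocked = apply_limit(["u1"], "u2", limit=1, high_priority=True)
capped = apply_limit(["a@example.com", "b@example.com"], "c@example.com",
                     limit=2, high_priority=False)
within = apply_limit(["a@example.com"], "b@example.com",
                     limit=2, high_priority=False)
```

In this model a second `user_id` blocks the merge outright, while a third email is simply left out of matching.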

Incremental runs and identity stability

Identity Resolution processes data incrementally as new or updated records arrive. Each input model's timestamp column determines which rows are net new and need to be processed.
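
Selecting net new rows by timestamp might look like the sketch below. It assumes a stored "last processed" watermark per input model; the `net_new_rows` function and the `updated_at` column are hypothetical, and the real selection logic may differ.

```python
from datetime import datetime

def net_new_rows(rows, timestamp_column, last_processed):
    """Keep only rows whose timestamp is later than the previous run's watermark."""
    return [r for r in rows if r[timestamp_column] > last_processed]

rows = [
    {"id": 1, "updated_at": datetime(2024, 5, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 5, 2, 9, 0)},
    {"id": 3, "updated_at": datetime(2024, 5, 3, 9, 0)},
]
to_process = net_new_rows(rows, "updated_at", datetime(2024, 5, 1, 12, 0))
new_ids = [r["id"] for r in to_process]
```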

During incremental runs:

Existing identities typically retain their ht_id
New records attach to existing identities when matches are found
New identities are created when no valid matches exist
Previously separate identities may merge if new data connects them

When a merge happens, the lower-numbered ht_id absorbs the higher-numbered one. For example, if new data links ht10 and ht14, the two identities collapse into ht10 and ht14 is dropped. The dropped ht_id no longer appears in the next run's output, and all of its records appear on the surviving identity. This is expected behavior: it reflects that Identity Resolution has learned the two profiles refer to the same entity.
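
The merge rule can be sketched as follows, assuming ht_id values compare numerically (ht10 and ht14 are written as 10 and 14) and that assignments are held as a record-to-ht_id mapping. `merge_identities` is a hypothetical helper for illustration.

```python
def merge_identities(assignments, id_a, id_b):
    """Collapse two identities; the lower-numbered ht_id survives.

    assignments: dict mapping record id -> ht_id.
    Returns (new_assignments, survivor, dropped).
    """
    survivor, dropped = min(id_a, id_b), max(id_a, id_b)
    merged = {
        record: (survivor if ht_id == dropped else ht_id)
        for record, ht_id in assignments.items()
    }
    return merged, survivor, dropped

before = {"r1": 10, "r2": 14, "r3": 14, "r4": 22}
after, survivor, dropped = merge_identities(before, 14, 10)
```

After the merge, the records that belonged to 14 sit on 10, and 14 no longer appears in the output.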

Full re-runs

A full re-run recomputes the graph from an empty state. This happens when:

You make an identifier limit rule stricter, for example by lowering the threshold on an existing limit or by adding a new limit rule
A large amount of bad data requires resetting the graph (smaller corrections can usually be handled with the profile reprocessing API instead, which avoids a full re-run)

After a full re-run:

ht_id values are not guaranteed to match previous runs
A given ht_id value from yesterday's graph may be assigned to a completely different real-world entity today

The practical implication

For day-to-day incremental runs, ht_id values are stable. The one common change occurs when new or updated data shows that two previously distinct identities are actually the same: one ht_id survives, the other is dropped, and its records show up on the surviving identity. This is expected and intentional.
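
A downstream consumer could detect such merges by comparing two consecutive incremental outputs. The sketch below assumes each output can be reduced to a record-to-ht_id mapping; `dropped_ids` is a hypothetical helper, and the comparison is only meaningful between incremental runs.

```python
def dropped_ids(previous, current):
    """Map each ht_id that disappeared to the ht_id that now holds its records.

    previous, current: dicts mapping record id -> ht_id for two consecutive runs.
    """
    live = set(current.values())
    remap = {}
    for record_id, old_id in previous.items():
        new_id = current.get(record_id)
        if old_id not in live and new_id is not None:
            remap[old_id] = new_id
    return remap

previous = {"r1": 10, "r2": 14, "r3": 14}
current = {"r1": 10, "r2": 10, "r3": 10, "r4": 31}
remap = dropped_ids(previous, current)
```

The resulting mapping tells a downstream system which stored keys to repoint to a surviving identity.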

A full re-run is different: it reassigns ht_id values from scratch. If a downstream system stores ht_id as a primary key, you'd need to re-sync every record after a full re-run to keep that system consistent with the warehouse. For activation guidance, see Sync and activation guidance.

From resolved identities to profiles

Identity Resolution produces resolved identities represented by ht_id. How those identities are exposed downstream depends on how you consume the outputs.

Common patterns include:

Golden Record
- A flattened, one-row-per-identity table
- Uses survivorship rules (recency, frequency, source priority, arrays) to select canonical values
- Often used for analytics, activation, and Customer 360 use cases
Customer Studio
- Uses Golden Record as the parent model
- Allows non-technical users to define traits, audiences, and journeys

A resolved identity exists independently of Golden Record. Golden Record and Customer Studio define how identities are represented and consumed, not how they are resolved.
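
To make the survivorship idea concrete, the sketch below applies a recency rule and a frequency rule to the records of one identity. The helper names, column names, and sample rows are hypothetical, and the real Golden Record rules may behave differently (for example, when breaking ties).

```python
from collections import Counter

def most_recent(rows, column, timestamp_column):
    """Recency rule: take the value from the latest row that has one."""
    candidates = [r for r in rows if r.get(column) is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[timestamp_column])[column]

def most_frequent(rows, column):
    """Frequency rule: take the value that appears on the most rows."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return counts.most_common(1)[0][0] if counts else None

# All rows belong to the same resolved identity (ht_id 10).
rows = [
    {"ht_id": 10, "email": "ana@old.example", "city": "Lisbon", "updated_at": "2024-01-05"},
    {"ht_id": 10, "email": "ana@example.com", "city": "Porto", "updated_at": "2024-03-01"},
    {"ht_id": 10, "email": None, "city": "Lisbon", "updated_at": "2024-04-10"},
]
golden = {
    "ht_id": 10,
    "email": most_recent(rows, "email", "updated_at"),
    "city": most_frequent(rows, "city"),
}
```

The latest row has no email, so the recency rule falls back to the most recent row that does. The identity itself (ht_id 10) is unchanged by whichever rules are chosen.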