| Audience | Data engineers, analytics engineers, and technical marketers |
| Prerequisites | An Identity Resolution project |
Understand how Identity Resolution groups records, applies matching rules, and maintains identity stability over time.
Overview
Identity Resolution groups input records that refer to the same real-world entity into resolved identities. It does this by building and maintaining an identity graph based on the identifiers and rules you configure.
Each resolved identity is represented by a synthetic ht_id.
- During incremental runs, ht_id values remain stable unless new data causes two existing identities to merge.
- When identities merge, one ht_id survives and the others collapse into it.
- Full re-runs rebuild the graph from scratch and do not guarantee ht_id stability.
This page explains how Identity Resolution behaves conceptually, independent of any specific UI or setup steps.
Identity graphs
An identity graph connects records through shared identifiers.
- Input records act as nodes
- Identifiers create edges between records
- Connected groups of records form resolved identities
If two records share a matching identifier (for example, the same email), they are connected in the graph. If no valid connection exists, they remain part of separate identities.
How records connect—and which connections are allowed—is entirely determined by your configured identifiers and rules.
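The graph model above can be illustrated with a small union-find sketch. This is not the product's implementation; the function name `resolve`, the record layout, and the column names are assumptions made for illustration only.

```python
from collections import defaultdict

def resolve(records, identifier_columns):
    """Group record indexes into identities via shared identifier values."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (column, value) -> index of the first record with that value
    for idx, record in enumerate(records):
        for column in identifier_columns:
            value = record.get(column)
            if value is None:
                continue  # a missing identifier creates no edge
            key = (column, value)
            if key in first_seen:
                union(idx, first_seen[key])  # shared identifier: connect the records
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())

# Hypothetical input records
records = [
    {"email": "ana@example.com", "phone": None},
    {"email": "ana@example.com", "phone": "555-0100"},
    {"email": None, "phone": "555-0100"},
    {"email": "bo@example.com", "phone": None},
]
identities = resolve(records, ["email", "phone"])
```

In this sketch the first three records end up in one identity because the email connects the first two and the phone connects the last two, while the fourth record stays in its own identity.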

Deterministic matching
By default, Identity Resolution uses deterministic matching, which relies on exact identifier equality.
With deterministic matching:
- Identifier values must match exactly to connect records
- Matching behavior is predictable and explainable
- Incremental runs over the same data and rules produce the same groupings; a full re-run or a rule change can regroup records
Deterministic matching provides a stable foundation for analytics, activation, and operational workflows where correctness and auditability matter.
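As a rough illustration of exact-equality matching, the sketch below groups records on a single identifier and shows that the result does not depend on input order. The helper `exact_match_groups` and the sample data are hypothetical. The sketch applies no normalization, so values that differ only by case are treated as different; whether the product normalizes values before comparing them is not covered on this page.

```python
def exact_match_groups(records, column):
    """Group record ids whose value in `column` is exactly equal."""
    groups = {}
    for record in records:
        groups.setdefault(record[column], set()).add(record["id"])
    # Return a canonical form so two runs can be compared directly.
    return sorted(sorted(ids) for ids in groups.values())

# Hypothetical input records
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "ana@example.com"},
    {"id": 3, "email": "Ana@example.com"},  # differs by case: not an exact match here
]
first_run = exact_match_groups(records, "email")
second_run = exact_match_groups(list(reversed(records)), "email")
```

Both runs yield the same groupings, which is the property that makes deterministic results easy to audit.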
Identifiers and merge priority
Identifiers define how records are allowed to connect in the identity graph.
For each identifier, you configure:
- Which column contains the identifier value
- The priority order relative to other identifiers
- Optional limits on how many distinct values an identity can contain
When a record contains multiple identifiers, Identity Resolution evaluates them in priority order.
Higher-priority identifiers are applied first and can prevent lower-priority identifiers from creating unintended merges. This ordering controls how aggressively records are linked as new data arrives.
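One way to picture the per-identifier settings is as a small configuration structure sorted by priority. The class `IdentifierConfig`, its field names, and the convention that 1 is the highest priority are all assumptions for this sketch, not the product's configuration schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IdentifierConfig:
    name: str                    # identifier type, e.g. "email"
    column: str                  # column that holds the identifier value
    priority: int                # assumption: 1 is the highest priority
    limit: Optional[int] = None  # max distinct values per identity, None = no limit

def evaluation_order(configs):
    """Return identifier configs from highest to lowest priority."""
    return sorted(configs, key=lambda c: c.priority)

# Hypothetical configuration
configs = [
    IdentifierConfig("device_id", "device_id", priority=3),
    IdentifierConfig("user_id", "user_id", priority=1, limit=1),
    IdentifierConfig("email", "email_address", priority=2, limit=5),
]
ordered_names = [c.name for c in evaluation_order(configs)]
```

Here `user_id` would be evaluated first, then `email`, then `device_id`.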

Identifier limits and over-merge protection
Identifier limits prevent identities from growing too large due to shared, reused, or low-quality identifiers.
A limit defines a rule such as:
“An identity may contain at most N distinct values of this identifier type.”
When a merge would violate a limit, Identity Resolution uses both priority and limits to determine the outcome:
- Higher-priority identifier exceeds its limit
  - The merge is blocked, and the record remains part of a separate identity.
- Lower-priority identifier exceeds its limit
  - The identity retains only the first N values.
  - Additional values are ignored for matching purposes.
This behavior prevents unrelated records from collapsing into a single identity while still allowing high-confidence identifiers to drive resolution.
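The two outcomes can be sketched as a single merge function. This is an interpretation, not the product's algorithm: it assumes "higher" and "lower" priority are judged relative to the identifier that triggered the match, that 1 is the highest priority, and that an identity stores its values in first-seen order. The function name and data layout are hypothetical.

```python
def merge_with_limits(left, right, matched_on, priority, limits):
    """Try to merge two identities that share a value of `matched_on`.

    Returns the merged identity, or None when the merge is blocked.
    """
    merged = {}
    for id_type in sorted(set(left) | set(right), key=priority.get):
        # Keep distinct values in first-seen order, left identity first.
        values = list(dict.fromkeys(left.get(id_type, []) + right.get(id_type, [])))
        limit = limits.get(id_type)
        if limit is not None and len(values) > limit:
            if priority[id_type] < priority[matched_on]:
                # A higher-priority identifier would exceed its limit: block the merge.
                return None
            # A lower-priority identifier exceeds its limit: keep the first N values.
            values = values[:limit]
        merged[id_type] = values
    return merged

priority = {"user_id": 1, "email": 2}  # 1 = highest (assumed convention)
limits = {"user_id": 1, "email": 2}

a = {"user_id": ["u1"], "email": ["a@example.com", "shared@example.com"]}
b = {"user_id": ["u2"], "email": ["shared@example.com"]}
c = {"user_id": ["u1"], "email": ["c@example.com"]}

blocked = merge_with_limits(a, b, "email", priority, limits)      # two user_ids: blocked
truncated = merge_with_limits(a, c, "user_id", priority, limits)  # third email dropped
```

In the first call, a shared email would pull two different `user_id` values into one identity, so the merge is blocked. In the second, the shared `user_id` drives the merge and the email list is cut back to its first two values.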

Incremental runs and identity stability
Identity Resolution processes data incrementally as new or updated records arrive.
During incremental runs:
- Existing identities typically retain their ht_id
- New records attach to existing identities when matches are found
- New identities are created when no valid matches exist
- Previously separate identities may merge if new data connects them
A full re-run recomputes the graph from an empty state:
- Identities may merge or split differently based on full history
- ht_id values are not guaranteed to match previous runs
Incremental processing balances day-to-day stability with the ability to adapt as your data evolves.
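The incremental behavior described above can be approximated with a toy resolver that keeps state between records. Several details are assumptions: ht_id is modeled as an integer, the lowest ht_id is chosen as the survivor of a merge (this page does not say which one survives), and the class and attribute names are hypothetical.

```python
import itertools

class IncrementalResolver:
    """Toy incremental resolver that maps identifier values to ht_ids."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.value_to_ht_id = {}  # (column, value) -> ht_id

    def add(self, record):
        keys = [(col, val) for col, val in record.items() if val is not None]
        matched = sorted({self.value_to_ht_id[k] for k in keys if k in self.value_to_ht_id})
        if not matched:
            ht_id = next(self._next_id)  # no valid match: create a new identity
        else:
            ht_id = matched[0]           # assumed survivor: the lowest ht_id
            for loser in matched[1:]:    # other matched identities collapse into it
                for key, owner in list(self.value_to_ht_id.items()):
                    if owner == loser:
                        self.value_to_ht_id[key] = ht_id
        for key in keys:
            self.value_to_ht_id[key] = ht_id
        return ht_id

resolver = IncrementalResolver()
first = resolver.add({"email": "ana@example.com", "phone": None})         # new identity
second = resolver.add({"email": None, "phone": "555-0100"})               # new identity
attached = resolver.add({"email": "ana@example.com", "phone": None})      # attaches to first
merged = resolver.add({"email": "ana@example.com", "phone": "555-0100"})  # connects both
```

The last record links the two earlier identities, so the second identity collapses into the first and its phone value now resolves to the surviving ht_id. A full re-run would correspond to discarding the resolver state and replaying every record, which is why ht_id values can differ afterwards.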
From resolved identities to profiles
Identity Resolution produces resolved identities represented by ht_id. How those identities are exposed downstream depends on how you consume the outputs.
Common patterns include:
- Golden Record
  - A flattened, one-row-per-identity table
  - Uses survivorship rules (recency, frequency, source priority, arrays) to select canonical values
  - Often used for analytics, activation, and Customer 360 use cases
- Customer Studio
  - Uses Golden Record as the parent model (if enabled)
  - Allows non-technical users to define traits, audiences, and journeys
A resolved identity exists even if Golden Record is not enabled. Golden Record and Customer Studio define how identities are represented and consumed, not how they are resolved.
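To make the Golden Record idea concrete, the sketch below flattens resolved rows into one row per ht_id using two of the survivorship styles named above: recency for email and an array of distinct values for phone. The function name, column names (`updated_at`, `email`, `phone`), and sample rows are assumptions for illustration and do not reflect the actual output schema.

```python
from collections import defaultdict

def golden_records(rows):
    """Flatten resolved rows into one row per ht_id."""
    by_identity = defaultdict(list)
    for row in rows:
        by_identity[row["ht_id"]].append(row)

    result = []
    for ht_id in sorted(by_identity):
        # ISO-8601 date strings sort chronologically, oldest first.
        members = sorted(by_identity[ht_id], key=lambda r: r["updated_at"])
        emails = [r["email"] for r in members if r["email"] is not None]
        phones = [r["phone"] for r in members if r["phone"] is not None]
        result.append({
            "ht_id": ht_id,
            "email": emails[-1] if emails else None,  # recency: latest non-null value
            "phones": sorted(set(phones)),            # array: all distinct values
        })
    return result

# Hypothetical resolved rows
rows = [
    {"ht_id": 1, "email": "old@example.com", "phone": "555-0100", "updated_at": "2024-01-01"},
    {"ht_id": 1, "email": "new@example.com", "phone": None, "updated_at": "2024-06-01"},
    {"ht_id": 2, "email": None, "phone": "555-0199", "updated_at": "2024-03-01"},
]
golden = golden_records(rows)
```

The grouping by ht_id is taken as given here, which mirrors the point above: survivorship decides how an identity is presented, not which records belong to it.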
