
Probabilistic Identity Resolution

Probabilistic matching is currently in beta and must be enabled by Hightouch.

Identity resolution is only available on Business tier plans. You can use it with or without Customer Studio.
Audience | How you’ll use this article
Data teams | Enable probabilistic matching, choose confidence levels, and review resolved outputs for an identity graph.
Platform admins | Control which confidence levels are available for audience creation and downstream activation, and ensure required infrastructure is in place.

Overview

Probabilistic matching extends deterministic identity resolution by linking records based on similarity, not just exact matches.

Deterministic matching connects records when identifiers (such as email or user_id) are exactly the same.

Probabilistic matching adds:

  • Data normalization
  • Fuzzy comparison using similarity scoring
  • Confidence tiers you can use to balance accuracy and reach

Probabilistic matching layers on top of deterministic rules. It does not replace them.
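
To illustrate the layering, here is a minimal Python sketch, not Hightouch's implementation. The function names, the record shape, and the 0.85 threshold are all hypothetical, and `difflib.SequenceMatcher` is used only as a stand-in similarity measure. The exact-match check runs first, and the fuzzy check can only add matches on top of it.

```python
from difflib import SequenceMatcher


def deterministic_match(a, b, keys=("email", "user_id")):
    """True when any identifier is present on both records and exactly equal."""
    return any(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between the lowercased name fields."""
    return SequenceMatcher(
        None, a.get("name", "").lower(), b.get("name", "").lower()
    ).ratio()


def is_match(a, b, threshold=0.85):
    """Deterministic rules are checked first; the fuzzy check only adds matches."""
    # The 0.85 threshold is an arbitrary illustrative value.
    return deterministic_match(a, b) or name_similarity(a, b) >= threshold


john = {"name": "John Doe", "email": "johndoe@gmail.com"}
jhon = {"name": "Jhon Doe", "email": "john.doe@company.com"}
print(deterministic_match(john, jhon), is_match(john, jhon))
```

In this sketch the two records share no exact identifier, so the deterministic check alone would leave them as separate profiles, while the similarity check links them.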


How probabilistic matching works

Probabilistic matching answers the question: “Given everything known about these two records, how likely is it that they represent the same person?”

At a high level, it works in three steps:

  1. Normalize data

    Values are cleaned and standardized so similar data can be compared reliably.

  2. Score similarity

    For each pair of candidate records, mapped identifiers are compared and given a similarity score per field.

  3. Assign a confidence level

    Field-level similarities are combined into a single confidence score for the record pair. That score is then grouped into confidence levels (Exact, Strict, Loose), which control which matches are included in outputs and available for downstream use.
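
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual scoring model: the field weights, the thresholds, the weighted-average formula, and the use of `difflib.SequenceMatcher` are all hypothetical, and "all compared fields identical" is used as a stand-in for the graph's deterministic rules.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and tier thresholds, for illustration only.
WEIGHTS = {"name": 0.5, "phone": 0.3, "postal_code": 0.2}
STRICT_THRESHOLD = 0.90
LOOSE_THRESHOLD = 0.75


def normalize(record):
    """Step 1: lowercase, trim, and collapse whitespace in each mapped field."""
    return {
        field: " ".join(str(value).lower().split())
        for field, value in record.items()
        if field in WEIGHTS and value is not None
    }


def field_scores(a, b):
    """Step 2: similarity in [0, 1] for each field present on both records."""
    return {
        field: SequenceMatcher(None, a[field], b[field]).ratio()
        for field in WEIGHTS
        if a.get(field) and b.get(field)
    }


def confidence(scores):
    """Step 3a: weighted average of the field-level similarities."""
    total_weight = sum(WEIGHTS[field] for field in scores)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[f] * s for f, s in scores.items()) / total_weight


def confidence_level(scores):
    """Step 3b: bucket the pair into a tier, or None when it is not a match."""
    if scores and all(s == 1.0 for s in scores.values()):
        return "Exact"  # stand-in for the graph's deterministic rules
    score = confidence(scores)
    if score >= STRICT_THRESHOLD:
        return "Strict"
    if score >= LOOSE_THRESHOLD:
        return "Loose"
    return None


a = normalize({"name": "John Doe", "phone": "4155551234", "postal_code": "94107"})
b = normalize({"name": "Jhon  Doe ", "phone": "4155551234", "postal_code": "94107"})
scores = field_scores(a, b)
print(confidence(scores), confidence_level(scores))
```

With these made-up weights, a transposed-letter typo in the name is outweighed by an identical phone number and postal code, so the pair lands in the Strict tier instead of being dropped.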


When to use probabilistic matching

Use probabilistic matching when deterministic (exact) matching leaves gaps or duplicate profiles.

Scenario | Example | Why it helps
Typos and misspellings | John Doe vs. Jhon Doe | Catches close-but-not-exact values caused by typos or inconsistent data entry.
Multiple accounts for the same person | johndoe@gmail.com vs. john.doe@company.com | Links accounts when emails differ but other supported identifiers point to the same person.
Format variations | (415) 555-1234 vs. 4155551234 | Standardizes supported identifiers so small differences don’t break matches.
Sparse or user-entered records | Lead forms, event RSVPs, loyalty sign-ups | Uses partial or inconsistent PII to infer likely matches when exact identifiers are missing.
Offline-to-online stitching | In-store purchases → online logins | Connects offline data (name + postal code) with online profiles when there’s no shared ID.
Multiple identifiers per person | Name + phone + postal code | Increases confidence by combining multiple identifiers into a stronger signal.

When identifiers are already clean and consistent and the primary key is stable (such as user_id), deterministic matching alone may be sufficient. Probabilistic matching is most valuable when reconciling messy, human-entered, or offline data.
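
As a concrete case of the format-variation scenario, the sketch below shows one way phone numbers could be standardized before comparison. It is a hypothetical helper, not the normalization Hightouch applies. In particular, treating 10-digit input as a North American number and prefixing country code 1 is an assumption made for this example.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Reduce a phone number to digits only.

    Assumption: 10-digit input is a national number that is missing its
    country code, so the default country code is prefixed.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, "+"
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


# Both spellings from the table reduce to the same comparable value.
print(normalize_phone("(415) 555-1234") == normalize_phone("4155551234"))
```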


Enable probabilistic matching

Probabilistic matching can be enabled:

  • During identity graph creation (in the Configure graph settings / Finalize step), or
  • From the Configuration tab of an existing graph

You must have external storage configured in your workspace to enable probabilistic matching. See setup instructions for: AWS, GCS, or Azure.


Option 1: Enable during graph creation

  1. Go to Identity Resolution.
  2. Click Add identity graph.
  3. Follow the steps in Create an identity graph through:
    • Select source
    • Select models
    • Configure models
    • Golden Record (optional)
    • Configure identifier rules
  4. In the Finalize step:
    • Under Enable probabilistic matching ✨, toggle probabilistic matching on.
    • In Confidence levels, select the tiers to enable (Exact, Strict, Loose).
  5. Click Finish, then run the graph.

[Image: Enable probabilistic matching during graph creation]


Option 2: Enable on an existing graph

  1. Go to Identity Resolution.
  2. Open your existing identity graph.
  3. Click the Configuration tab.
  4. Under Enable probabilistic matching ✨, toggle probabilistic matching on.
  5. In Confidence levels, select one or more tiers (Exact, Strict, Loose).
  6. Click Run to reprocess the graph.

[Image: Enable probabilistic matching in the Configuration tab]


Confidence levels

When probabilistic matching is enabled, you can select one or more confidence levels.

Available tiers:

  • Exact: Includes matches that meet deterministic (exact-match) rules.
  • Strict: Includes higher-confidence similarity matches.
  • Loose: Includes lower-confidence similarity matches.

Each selected confidence level:

  • Applies a different similarity threshold when linking records
  • Filters the resolved outputs for that confidence band
  • Appears as a distinct parent model option for audience creation (for example, Exact – Golden Record, Strict – Golden Record, Loose – Golden Record)

Confidence level options shown in the parent model selector are filtered views of the same underlying Golden Record model, not separate models you need to create or manage.
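
The "filtered view" idea can be pictured with a small Python sketch. The `Link` type and `links_for_view` function are hypothetical, not the product's data model. The sketch also assumes that a looser view includes every link that qualifies at a stricter tier, which the product may or may not do.

```python
from dataclasses import dataclass

# Tiers ordered from strictest to loosest.
TIER_ORDER = ["Exact", "Strict", "Loose"]


@dataclass(frozen=True)
class Link:
    left_id: str
    right_id: str
    tier: str  # strictest tier at which this pair of records qualifies


def links_for_view(links, view_tier):
    """Filter one shared set of links down to what a given view would expose.

    Assumption: views are cumulative, so the Loose view also contains
    Exact and Strict links.
    """
    allowed = set(TIER_ORDER[: TIER_ORDER.index(view_tier) + 1])
    return [link for link in links if link.tier in allowed]


links = [
    Link("crm:1", "web:9", "Exact"),
    Link("crm:1", "pos:4", "Strict"),
    Link("crm:1", "event:7", "Loose"),
]
for tier in TIER_ORDER:
    print(tier, len(links_for_view(links, tier)))
```

All three views read from the same list of links; only the filter changes, which is why there are no separate models to build or maintain.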

Start with Exact and Strict. Add Loose only if broader reach is required and a higher risk of false positives is acceptable.

Confidence levels are visible and manageable in multiple places:

  • Identity graph Configuration tab (when enabling probabilistic matching)
  • Customer Studio → Schema → Golden Record model → Confidence levels tab
  • Audience builder → Select a parent model, where each confidence level appears as a parent model option

[Image: Confidence levels in Customer Studio schema]

[Image: Parent models for each confidence level]

Example use cases

Tier | Example use case
Exact | Transactional messaging, receipts, loyalty programs, and compliance-sensitive workflows
Strict | Lifecycle and personalization campaigns
Loose | Paid media, upper-funnel analytics, experimentation, and reach expansion

Run and validate

After enabling probabilistic matching and saving the configuration:

  1. Open the identity graph and click Run.

  2. In the Summary tab, use the Exact / Strict / Loose tabs to review how results change across tiers.

  3. Use the Profiles view to inspect individual identities:

    • Confirm that higher-confidence tiers (Exact, Strict) look correct
    • Pay special attention to Loose profiles for signs of over-merging
  4. If needed, adjust:

    • Which confidence levels are enabled
    • Which identifiers participate in probabilistic matching
    • Any workspace-level settings that affect probabilistic matching

    Then rerun the graph and review changes.

[Image: Summary by confidence level]

See Review and validate matches for detailed QA workflows.

If you see over-merging (for example, profiles that contain many unrelated emails, phones, or postal codes), consider removing noisy identifiers from probabilistic matching or limiting downstream use to higher-confidence tiers.
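
If resolved profiles are exported for offline QA, a check for over-merging could look like the Python sketch below. The input shape (profile ID mapped to lists of identifier values), the function name, and the default limit of five distinct values are all assumptions made for illustration. A sensible limit depends on your data.

```python
def flag_over_merged(profiles, max_distinct=5):
    """Return {profile_id: {identifier: distinct_count}} for suspicious profiles.

    A profile is flagged when any identifier type (for example email or
    phone) has more distinct values than max_distinct.
    """
    flagged = {}
    for profile_id, identifiers in profiles.items():
        excess = {
            name: len(set(values))
            for name, values in identifiers.items()
            if len(set(values)) > max_distinct
        }
        if excess:
            flagged[profile_id] = excess
    return flagged


profiles = {
    "p1": {"email": ["a@example.com", "a@example.com"], "phone": ["14155551234"]},
    "p2": {"email": [f"user{i}@example.com" for i in range(8)], "phone": []},
}
print(flag_over_merged(profiles))
```

Profiles flagged this way are candidates for manual review in the Profiles view, not proof of a bad merge.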



Last updated: Feb 26, 2026
