| Audience | How you’ll use this article |
|---|---|
| Data teams | Validate data structure and quality before creating an identity graph. |
| Platform admins | Identify and resolve data issues that could affect identity resolution accuracy. |
Before creating an identity graph in Hightouch, it’s important to confirm that your data is structurally ready and well-suited for identity resolution.
This article focuses on data readiness, not UI configuration. You won’t configure models, identifiers, or rules here—but you’ll learn what to check before starting setup so identity resolution behaves predictably.
What Identity Resolution expects from your data
Identity Resolution groups records from one or more models into resolved identities. To do this reliably, your data should meet a few core expectations:
- Tables are defined as models in Hightouch
- Each model has a stable primary key
- Each model includes a timestamp for incremental processing
- Identifiers are present, meaningful, and reasonably consistent
If these expectations aren’t met, identity graph setup may fail—or produce confusing or unstable results.
Model requirements
Identity Resolution only works with Hightouch models, not raw warehouse tables.
Before you begin, confirm that:
- All tables you want to use exist as models
- Model SQL is finalized and deterministic
- Primary keys are truly unique per row
Identity Resolution relies on primary keys to track records across runs. Non-unique keys can cause rows to appear unresolved or behave unpredictably.
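One way to check key uniqueness before setup is a grouped count over the model's output. The following is a minimal sketch, not a Hightouch API: it uses an in-memory SQLite database as a stand-in for your warehouse, and the `users` table, `user_id` column, and `find_duplicate_keys` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def find_duplicate_keys(conn, table, key_column):
    """Return (key, row_count) pairs for key values that appear more than once."""
    # Table and column names are interpolated directly, so pass trusted names only.
    query = f"""
        SELECT {key_column}, COUNT(*) AS row_count
        FROM {table}
        GROUP BY {key_column}
        HAVING COUNT(*) > 1
        ORDER BY row_count DESC
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data standing in for a model's output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("u1", "a@example.com"), ("u2", "b@example.com"), ("u2", "c@example.com")],
)

duplicates = find_duplicate_keys(conn, "users", "user_id")
print(duplicates)  # [('u2', 2)]
```

An empty result suggests the key is unique per row. Any returned rows are worth resolving in the model SQL before you create a graph.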
Timestamp readiness
Every model used in identity resolution must include a timestamp column. Timestamps allow Hightouch to process new or updated records incrementally.
Use the timestamp that best represents when the record changed:
- Event models → event time (for example, `event_time`)
- Entity models → last updated time (for example, `updated_at` or `loaded_at`)
Avoid timestamps that are:
- Null
- Always `CURRENT_TIMESTAMP`
- Unrelated to record changes
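The first two problems above can be spotted with a quick profile of the timestamp column. This is a rough sketch under stated assumptions: SQLite stands in for your warehouse, and the `orders` table, `updated_at` column, and `profile_timestamp` helper are hypothetical. The "looks constant" flag is a heuristic for timestamps generated at query time, not a definitive test.

```python
import sqlite3


def profile_timestamp(conn, table, ts_column):
    """Summarize a timestamp column: row count, nulls, distinct values, and range."""
    # Table and column names are interpolated directly, so pass trusted names only.
    query = f"""
        SELECT
            COUNT(*),
            COUNT(*) - COUNT({ts_column}),
            COUNT(DISTINCT {ts_column}),
            MIN({ts_column}),
            MAX({ts_column})
        FROM {table}
    """
    total, nulls, distinct, earliest, latest = conn.execute(query).fetchone()
    return {
        "total_rows": total,
        "null_rows": nulls,
        "distinct_values": distinct,
        "earliest": earliest,
        "latest": latest,
        # One distinct value across many rows suggests a load-time constant.
        "looks_constant": total > 1 and distinct <= 1,
    }


# Hypothetical sample data: one row is missing its timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("o1", "2024-01-01 00:00:00"),
        ("o2", "2024-01-02 00:00:00"),
        ("o3", None),
    ],
)

profile = profile_timestamp(conn, "orders", "updated_at")
print(profile["null_rows"], profile["looks_constant"])  # 1 False
```

Whether a timestamp actually tracks record changes cannot be inferred from a profile like this. That still requires knowing how the column is populated upstream.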
Identifier readiness (conceptual)
Identifiers are the signals Identity Resolution uses to decide whether records belong to the same real-world entity.
Before setup, it’s helpful to understand:
- Which identifiers exist across your datasets
- Which identifiers are stable vs ephemeral
- Which identifiers represent:
  - A person (email, user ID)
  - A device or session (anonymous ID)
  - An account or organization
You’ll select and map identifier columns during the Add identity graph workflow, but having this context ahead of time helps you make better decisions.
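To get a sense of which identifiers exist in a dataset, it can help to measure how often each candidate column is populated. The sketch below is illustrative only: SQLite stands in for your warehouse, and the `events` table, its columns, and the `identifier_fill_rates` helper are hypothetical names.

```python
import sqlite3


def identifier_fill_rates(conn, table, columns):
    """Return the fraction of rows with a non-null, non-blank value per column."""
    # Table and column names are interpolated directly, so pass trusted names only.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    rates = {}
    for column in columns:
        filled = conn.execute(
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE {column} IS NOT NULL AND TRIM({column}) <> ''"
        ).fetchone()[0]
        rates[column] = filled / total if total else 0.0
    return rates


# Hypothetical event data: every row has an anonymous ID, few have a user ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (anonymous_id TEXT, email TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("anon-1", "a@example.com", "u1"),
        ("anon-2", "b@example.com", None),
        ("anon-3", "", None),
        ("anon-4", None, None),
    ],
)

rates = identifier_fill_rates(conn, "events", ["anonymous_id", "email", "user_id"])
print(rates)  # {'anonymous_id': 1.0, 'email': 0.5, 'user_id': 0.25}
```

Running the same profile against each model gives a rough picture of which identifiers overlap across datasets and which appear in only one.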
Common data quality issues to watch for
Identity Resolution works best when identifiers reflect real-world entities. Before creating a graph, check for:
- Shared identifiers (for example, `support@company.com`)
- Recycled phone numbers or IDs
- Identifiers reused across unrelated people
- Event records without any identifiers
- Inconsistent formatting (emails, phone numbers)
These issues don’t prevent setup—but they can lead to over-merging or unexpected identity splits later.
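Shared identifiers and inconsistent formatting can be checked together: normalize the value first, then count how many distinct entities it is attached to. The following is a minimal sketch, assuming rows of `(entity_id, email)` pairs. The `normalize_email` and `shared_identifiers` helpers and the `max_entities` threshold are hypothetical, and the right threshold depends on your data.

```python
from collections import defaultdict


def normalize_email(value):
    """Trim whitespace and lowercase; return None for missing or blank values."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def shared_identifiers(rows, max_entities=2):
    """Map each normalized email to its entity IDs when it exceeds max_entities."""
    entities_by_email = defaultdict(set)
    for entity_id, email in rows:
        normalized = normalize_email(email)
        if normalized is not None:
            entities_by_email[normalized].add(entity_id)
    return {
        email: sorted(entity_ids)
        for email, entity_ids in entities_by_email.items()
        if len(entity_ids) > max_entities
    }


# Hypothetical rows: three users share one support inbox with mixed formatting.
rows = [
    ("u1", "Support@Company.com"),
    ("u2", "support@company.com "),
    ("u3", "support@company.com"),
    ("u4", "jane@example.com"),
    ("u5", None),
]

flagged = shared_identifiers(rows, max_entities=2)
print(flagged)  # {'support@company.com': ['u1', 'u2', 'u3']}
```

Without the normalization step, the three spellings of the support address would be counted separately and the shared inbox would go unnoticed.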
Think in terms of entities
Before setup, decide what type of entity you’re resolving:
- Person-level (most common)
- Account-level
- Household or other custom entity
This decision affects:
- Which model you treat as your “core” entity
- Which identifiers you trust most
- How you interpret identity counts and merges
What’s next?
Once your data meets these expectations, you’re ready to configure identity resolution in Hightouch.
→ Continue to Create an identity graph