Set up Probabilistic IDR (Amazon S3)

Overview

Probabilistic matching does some of its work outside your data warehouse, so it needs a storage location in your cloud account to hold data during the process. Hightouch uses this S3 bucket as a workspace during each graph run: it reads your source records, runs the matching process, and writes the results back to your data warehouse. All data stays in your cloud account and under your control.

Some data persists in the bucket between runs so Hightouch doesn't have to start from scratch each time.


Step 1: Create an S3 bucket

Create an Amazon S3 bucket in the same cloud provider and region as your Hightouch workspace.


Step 2: Set up a cross-account IAM role

Create a cross-account IAM role using the AWS account ID and external ID that the Hightouch app provides when you go to Settings → Cloud Providers and add a new credential.


Step 3: Share role and bucket details with Hightouch

In addition to entering the role ARN and bucket name in the Hightouch UI, share both values with your Hightouch team.


Step 4: Attach the required IAM policy to the role

Attach the following policy to the role, replacing <bucket_name> with your actual bucket name:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<bucket_name>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket_name>",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["*"]
        }
      }
    }
  ]
}

The <bucket_name> placeholder appears in both Resource values; replace it in each.
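
If you would rather generate the policy than edit it by hand, the following is a minimal Python sketch that fills in the bucket name and prints the same JSON. The function name build_bucket_policy and the example bucket name my-idr-bucket are illustrative assumptions, not part of Hightouch or AWS tooling.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Return the Step 4 permissions policy with <bucket_name> filled in."""
    # Guard against pasting the placeholder through unchanged.
    if not bucket_name or "<" in bucket_name or ">" in bucket_name:
        raise ValueError("bucket_name must be a real bucket name, not a placeholder")
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permissions apply to everything inside the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": ["*"]}},
            },
        ],
    }


if __name__ == "__main__":
    # "my-idr-bucket" is a made-up example name; use your own bucket name.
    print(json.dumps(build_bucket_policy("my-idr-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console as the role's permissions policy.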


Step 5: Set the trust policy for the role

Update the role’s trust policy to the following, replacing all placeholders:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "<existing ARN from step 2>",
          "<warehouseRoleArn>"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "<existing ExternalId from step 2>",
            "<warehouseExternalId>"
          ]
        }
      }
    }
  ]
}

Your Hightouch team can provide the warehouseRoleArn and warehouseExternalId values.
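
As a rough illustration of how the four placeholders fit together, the Python sketch below assembles the trust policy from the two values you received in step 2 and the two values your Hightouch team provides. The function name, parameter names, and every example ARN and external ID are hypothetical; substitute the real values.

```python
import json


def build_trust_policy(
    hightouch_principal_arn: str,
    hightouch_external_id: str,
    warehouse_role_arn: str,
    warehouse_external_id: str,
) -> dict:
    """Return the Step 5 trust policy with all four placeholders filled in."""
    values = [
        hightouch_principal_arn,
        hightouch_external_id,
        warehouse_role_arn,
        warehouse_external_id,
    ]
    for value in values:
        # Reject empty strings and anything that still looks like "<placeholder>".
        if not value or value.startswith("<"):
            raise ValueError(f"unreplaced placeholder or empty value: {value!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": [hightouch_principal_arn, warehouse_role_arn]},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "sts:ExternalId": [hightouch_external_id, warehouse_external_id]
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All four arguments below are made-up example values.
    policy = build_trust_policy(
        "arn:aws:iam::111111111111:root",
        "example-external-id-from-step-2",
        "arn:aws:iam::222222222222:role/example-warehouse-role",
        "example-warehouse-external-id",
    )
    print(json.dumps(policy, indent=2))
```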


Step 6: Configure the S3 bucket in Hightouch

Configure the S3 bucket in the Hightouch UI using your bucket and role, following the Amazon S3 external storage documentation.

Important: Probabilistic IDR has different storage requirements than standard external storage:

  • Do not set object lifecycle rules. Probabilistic IDR requires data to persist between runs.
  • Do not scope paths to a specific workspace directory.
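
One way to check the first requirement on an existing bucket is to inspect its lifecycle configuration. The sketch below assumes you have already loaded the JSON output of `aws s3api get-bucket-lifecycle-configuration` into a Python dict; the helper name lifecycle_rule_summary and the sample rule are hypothetical. A bucket with no lifecycle configuration makes that command return an error rather than an empty document, so an empty dict stands in for that case here.

```python
def lifecycle_rule_summary(lifecycle_config: dict) -> list:
    """Return (rule ID, status) pairs for every rule in a lifecycle configuration."""
    return [
        (rule.get("ID", "(no ID)"), rule.get("Status", "(unknown)"))
        for rule in lifecycle_config.get("Rules", [])
    ]


if __name__ == "__main__":
    # Made-up sample shaped like the AWS CLI output.
    sample = {
        "Rules": [
            {
                "ID": "expire-old-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Expiration": {"Days": 30},
            }
        ]
    }
    for rule_id, status in lifecycle_rule_summary(sample):
        print(f"lifecycle rule {rule_id!r} is {status}; remove it for Probabilistic IDR")
```

An empty result means no lifecycle rules are set on the bucket.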


Last updated: Mar 20, 2026
