Overview
Probabilistic matching does some of its work outside your data warehouse, so it needs an S3 bucket in your cloud account to store data during the process. Hightouch uses the bucket as a workspace during each graph run: it reads your source records, runs the matching process, and writes results back to your data warehouse. All data stays in your cloud account and under your control.
Some data persists in the bucket between runs so Hightouch doesn't have to start from scratch each time.
Step 1: Create an S3 bucket
Create an Amazon S3 bucket in the same cloud provider and region as your Hightouch workspace.
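If you script bucket creation with boto3 (the AWS SDK for Python), a minimal sketch like the following builds the `create_bucket` arguments. It assumes you already know your Hightouch workspace's AWS region. The helper name `create_bucket_kwargs` and the example bucket name are hypothetical, and the name check covers only a simplified subset of the S3 bucket naming rules.

```python
import re

# Simplified S3 bucket name check: 3-63 characters of lowercase letters,
# digits, dots, and hyphens, starting and ending with a letter or digit.
# It does not enforce every rule (e.g. no names formatted like IP addresses).
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build keyword arguments for boto3's S3 create_bucket call (hypothetical helper)."""
    if not _BUCKET_NAME.match(bucket):
        raise ValueError(f"invalid S3 bucket name: {bucket!r}")
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and S3 rejects an explicit
    # LocationConstraint of "us-east-1"; every other region needs one.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With AWS credentials configured, you could then call `boto3.client("s3", region_name=region).create_bucket(**create_bucket_kwargs("my-idr-bucket", region))`. Keeping the argument building separate from the API call makes the region handling easy to check without touching AWS.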
Step 2: Set up a cross-account IAM role
Create a cross-account IAM role using the AWS account ID and external ID that the Hightouch app provides when you go to Settings → Cloud Providers and add a new credential.
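As a sketch of what a typical cross-account trust policy for this role looks like, the following builds it from the two values the Hightouch app shows you. It assumes the app provides a 12-digit AWS account ID, trusted through that account's `:root` principal; if the app gives you a full role ARN instead, use that ARN as the principal. The function name and example values are hypothetical.

```python
import json


def initial_trust_policy(hightouch_account_id: str, external_id: str) -> dict:
    """Trust policy allowing the Hightouch-provided AWS account to assume
    the role only when it supplies the matching external ID (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # ":root" names the whole AWS account as the trusted principal.
                "Principal": {"AWS": f"arn:aws:iam::{hightouch_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must pass this external ID when assuming the role,
                # which guards against the confused-deputy problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(initial_trust_policy("123456789012", "example-external-id"), indent=2))
```

If you create the role with boto3, you could pass `json.dumps(policy)` as the `AssumeRolePolicyDocument` argument of the IAM client's `create_role` call.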
Step 3: Share role and bucket details with Hightouch
In addition to entering the role ARN and bucket name in the Hightouch UI, share both values with your Hightouch team.
Step 4: Attach the required IAM policy to the role
Attach the following policy to the role, replacing <bucket_name> with your actual bucket name:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<bucket_name>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket_name>",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["*"]
        }
      }
    }
  ]
}
```
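The <bucket_name> placeholder appears twice: once in the object-level resource (arn:aws:s3:::<bucket_name>/*) and once in the bucket-level resource (arn:aws:s3:::<bucket_name>). Replace both.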
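To avoid hand-editing the placeholder, a small helper can render the Step 4 policy for a given bucket. This sketch reproduces the policy above unchanged apart from the bucket name; `idr_bucket_policy` and `my-idr-bucket` are hypothetical names.

```python
import json


def idr_bucket_policy(bucket: str) -> dict:
    """Render the Step 4 permissions policy for one bucket (hypothetical helper)."""
    object_arn = f"arn:aws:s3:::{bucket}/*"  # objects inside the bucket
    bucket_arn = f"arn:aws:s3:::{bucket}"    # the bucket itself
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": object_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": ["*"]}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(idr_bucket_policy("my-idr-bucket"), indent=2))
```

One way to attach the result is as an inline policy through boto3's IAM `put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(policy))`; pasting the rendered JSON into the AWS console works equally well.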
Step 5: Set the trust policy for the role
Update the role’s trust policy to the following, replacing all placeholders:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "<existing ARN from step 2>",
          "<warehouseRoleArn>"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "<existing ExternalId from step 2>",
            "<warehouseExternalId>"
          ]
        }
      }
    }
  ]
}
```
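The <existing ARN from step 2> and <existing ExternalId from step 2> values are the principal ARN and external ID already in the trust policy you created in Step 2. Your Hightouch team provides the <warehouseRoleArn> and <warehouseExternalId> values.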
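If you manage the role programmatically, the Step 5 change amounts to appending one principal and one external ID to the existing trust policy. The sketch below assumes the Step 2 policy has a single statement with one `Principal.AWS` entry and one `sts:ExternalId` condition; `add_warehouse_principal` and the example values are hypothetical.

```python
import copy


def add_warehouse_principal(trust_policy: dict, warehouse_role_arn: str,
                            warehouse_external_id: str) -> dict:
    """Return a copy of a single-statement trust policy that also trusts the
    warehouse role, in the list form used by Step 5 (hypothetical helper)."""
    policy = copy.deepcopy(trust_policy)  # leave the caller's dict untouched
    stmt = policy["Statement"][0]

    def as_list(value):
        # IAM accepts either a single string or a list for these fields.
        return value if isinstance(value, list) else [value]

    principals = as_list(stmt["Principal"]["AWS"])
    ext_ids = as_list(stmt["Condition"]["StringEquals"]["sts:ExternalId"])
    # Append only if missing, so running the update twice is harmless.
    if warehouse_role_arn not in principals:
        principals.append(warehouse_role_arn)
    if warehouse_external_id not in ext_ids:
        ext_ids.append(warehouse_external_id)
    stmt["Principal"]["AWS"] = principals
    stmt["Condition"]["StringEquals"]["sts:ExternalId"] = ext_ids
    return policy
```

IAM treats multiple values for a single condition key as a logical OR, so a caller presenting either external ID satisfies the condition. You could apply the result with boto3's IAM `update_assume_role_policy(RoleName=..., PolicyDocument=json.dumps(policy))`.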
Step 6: Configure the S3 bucket in Hightouch
Configure the S3 bucket in the Hightouch UI using your bucket and role, following the Amazon S3 external storage documentation.
Important: Probabilistic identity resolution (IDR) has different storage requirements than standard external storage:
- Do not set object lifecycle rules. Probabilistic IDR requires data to persist between runs.
- Do not scope paths to a specific workspace directory.
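Because lifecycle rules could remove data that Probabilistic IDR expects to persist between runs, you might check the bucket before configuring it in Hightouch. A minimal sketch, assuming a boto3 S3 client: `get_bucket_lifecycle_configuration` returns a `Rules` list, and S3 reports a bucket with no lifecycle configuration as a `ClientError` with the code `NoSuchLifecycleConfiguration`. The helper names are hypothetical.

```python
def lifecycle_rules(s3_client, bucket: str) -> list:
    """Fetch the bucket's lifecycle rules, treating 'no configuration' as none."""
    try:
        return s3_client.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    except Exception as err:  # botocore.exceptions.ClientError in practice
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "NoSuchLifecycleConfiguration":
            return []
        raise


def enabled_lifecycle_rule_ids(rules: list) -> list:
    """Return the IDs of rules whose Status is 'Enabled'."""
    return [rule.get("ID", "<unnamed>") for rule in rules if rule.get("Status") == "Enabled"]
```

An empty result from `enabled_lifecycle_rule_ids(lifecycle_rules(boto3.client("s3"), "my-idr-bucket"))` means the bucket has no enabled lifecycle rules. The sketch inspects the exception's `response` attribute so it does not need to import botocore; in production code you would catch `botocore.exceptions.ClientError` directly.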