Setting up probabilistic IDR requires a few one-time steps in your AWS account, along with additional setup completed by Hightouch.
Step 1: Create an S3 bucket
Create an Amazon S3 bucket in the same cloud provider and region as your Hightouch workspace.
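As an illustration only, the bucket could be created with boto3; the source does not prescribe a tool, so boto3 itself, the helper names, and the bucket and region values are assumptions. The sketch shows the one detail that commonly trips people up: us-east-1 must not be passed as a LocationConstraint.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for the boto3 S3 client's create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location; S3 rejects it as an explicit
    # LocationConstraint, so only add the configuration for other regions.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name: str, region: str) -> None:
    """Create the bucket (hypothetical helper; requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so create_bucket_kwargs works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

Pass the region that matches your Hightouch workspace, for example create_bucket("example-idr-bucket", "eu-west-1"), where both values are placeholders.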
Step 2: Set up a cross-account IAM role
Create a cross-account IAM role using the AWS account ID and external ID that the Hightouch app provides when you go to Settings → Cloud Providers and add a new credential.
Step 3: Share role and bucket details with Hightouch
In addition to entering the role ARN and bucket name in the Hightouch UI, share both values with your Hightouch team.
Step 4: Attach the required IAM policy to the role
Attach the following policy to the role, replacing <bucket_name> with your actual bucket name:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<bucket_name>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket_name>",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["*"]
        }
      }
    }
  ]
}
The <bucket_name> placeholder appears in both Resource entries; replace it in each.
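To avoid leaving a placeholder behind, the policy can be generated rather than hand-edited. The following is a minimal sketch using only the Python standard library; the function name build_bucket_policy and the bucket name example-idr-bucket are hypothetical, and the statements mirror the policy shown in this step.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Return the Step 4 policy with the bucket name filled in."""
    # Reject empty names and unreplaced placeholders such as "<bucket_name>".
    if not bucket_name or "<" in bucket_name or ">" in bucket_name:
        raise ValueError("bucket_name must be a real bucket name, not a placeholder")
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permissions apply to every key in the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": ["*"]}},
            },
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy_json = json.dumps(build_bucket_policy("example-idr-bucket"), indent=2)
```

The resulting policy_json string can be pasted into the IAM console or passed to whatever tooling you use to attach the policy to the role.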
Step 5: Set the trust policy for the role
Update the role’s trust policy to the following, replacing all placeholders:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "<existing ARN from step 2>",
          "<warehouseRoleArn>"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "<existing ExternalId from step 2>",
            "<warehouseExternalId>"
          ]
        }
      }
    }
  ]
}
Your Hightouch team can provide the warehouseRoleArn and warehouseExternalId values.
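The trust policy can be assembled the same way once you have all four values. This is a sketch under assumptions: build_trust_policy is a hypothetical helper, and the ARNs and external IDs in the example are made-up stand-ins for the values from step 2 and from your Hightouch team.

```python
import json


def build_trust_policy(principal_arns: list, external_ids: list) -> dict:
    """Return the Step 5 trust policy for the given principals and external IDs."""
    for value in [*principal_arns, *external_ids]:
        # Catch values that were copied from the template without being replaced.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"unreplaced placeholder or empty value: {value!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": list(principal_arns)},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": list(external_ids)}
                },
            }
        ],
    }


# Made-up example values; substitute the real ones before use.
trust_json = json.dumps(
    build_trust_policy(
        ["arn:aws:iam::111111111111:role/example-a",
         "arn:aws:iam::222222222222:role/example-b"],
        ["example-external-id-1", "example-external-id-2"],
    ),
    indent=2,
)
```

If you manage IAM with boto3, the rendered JSON string is the kind of document the IAM client's update_assume_role_policy call accepts as PolicyDocument; the role name you pass alongside it is your own.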
Step 6: Configure the S3 bucket in Hightouch
Configure the S3 bucket in the Hightouch UI using your bucket and role, following the Amazon S3 external storage documentation.
Important: Probabilistic IDR has different storage requirements than standard external storage:
- Do not set object lifecycle rules. Probabilistic IDR requires data to persist between runs.
- Do not scope paths to a specific workspace directory.
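The first requirement above can be checked before handing the bucket over. The sketch below assumes a boto3 S3 client is passed in; lifecycle_rules is a hypothetical helper, and it relies on the client raising an error with code NoSuchLifecycleConfiguration when a bucket has no lifecycle configuration.

```python
def lifecycle_rules(s3_client, bucket_name: str) -> list:
    """Return the bucket's lifecycle rules, or an empty list if none are set."""
    try:
        response = s3_client.get_bucket_lifecycle_configuration(Bucket=bucket_name)
    except Exception as err:
        # boto3 errors carry a `response` dict with the service error code.
        code = (getattr(err, "response", None) or {}).get("Error", {}).get("Code")
        if code == "NoSuchLifecycleConfiguration":
            return []
        raise  # any other failure (permissions, missing bucket) is not swallowed
    return response.get("Rules", [])


def assert_no_lifecycle_rules(s3_client, bucket_name: str) -> None:
    """Raise if the bucket has lifecycle rules that could expire IDR data."""
    rules = lifecycle_rules(s3_client, bucket_name)
    if rules:
        raise RuntimeError(
            f"{bucket_name} has {len(rules)} lifecycle rule(s); remove them first"
        )
```

With boto3 installed, a call might look like assert_no_lifecycle_rules(boto3.client("s3"), "example-idr-bucket"), where the bucket name is a placeholder.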