Setting up probabilistic IDR requires a few one-time steps in your GCP project, along with additional setup completed by Hightouch.
Step 1: Create a custom IAM role
Create a new role in your GCP project with the following permissions:
- storage.buckets.get
- storage.objects.list
- storage.objects.create
- storage.objects.get
- storage.objects.delete
gcloud iam roles create YOUR_CUSTOM_ROLE_NAME \
--project=YOUR_PROJECT_ID \
--title="YOUR CUSTOM ROLE NAME" \
--description="Custom role for Hightouch probabilistic IDR" \
--permissions="storage.buckets.get,storage.objects.list,storage.objects.create,storage.objects.get,storage.objects.delete"
Step 2: Grant the role to the Hightouch service account
Hightouch will provide a service account specifically for probabilistic IDR.
Once you have the service account email, grant it the custom role:
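# Custom roles must be referenced by their full path: projects/PROJECT_ID/roles/ROLE_ID
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member=serviceAccount:HT_PROVIDED_SERVICE_ACCOUNT@YOUR_DOMAIN.iam.gserviceaccount.com \
--role=projects/YOUR_PROJECT_ID/roles/YOUR_CUSTOM_ROLE_NAME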
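To confirm that the binding took effect, one option is to list the members bound to the custom role. The following is a minimal sketch, assuming the same placeholder project and role names used in the commands above. The variable names and the `list_role_members` helper are illustrative and not part of Hightouch's setup.

```shell
# Placeholders (assumptions): replace with your own values.
PROJECT_ID="YOUR_PROJECT_ID"
ROLE_ID="YOUR_CUSTOM_ROLE_NAME"

# Hypothetical helper: prints every member bound to the custom role.
list_role_members() {
  gcloud projects get-iam-policy "${PROJECT_ID}" \
    --flatten="bindings[].members" \
    --filter="bindings.role:projects/${PROJECT_ID}/roles/${ROLE_ID}" \
    --format="value(bindings.members)"
}
```

After running the grant command, call `list_role_members`. The service account email provided by Hightouch should appear in the output.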
Step 3: Create a GCS bucket
Create a GCS bucket that meets the following requirements:
- It is located in the same cloud provider and region as your Hightouch workspace
- It has no object lifecycle rules that delete or expire objects in the following path:
/workspace-$WORKSPACE_ID/datalake
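The bucket requirements above can be sketched as two commands: one to create the bucket and one to inspect its lifecycle configuration. This is a hedged example, not an official Hightouch procedure. All variable values are placeholders, the helper function names are hypothetical, and it assumes you know which GCS location corresponds to your Hightouch workspace region.

```shell
# Placeholders (assumptions): replace with your own values.
PROJECT_ID="YOUR_PROJECT_ID"
BUCKET_NAME="YOUR_BUCKET_NAME"
BUCKET_LOCATION="YOUR_WORKSPACE_REGION"  # should match your Hightouch workspace region
WORKSPACE_ID="YOUR_WORKSPACE_ID"

# Hypothetical helper: creates the bucket in the chosen location.
create_idr_bucket() {
  gcloud storage buckets create "gs://${BUCKET_NAME}" \
    --project="${PROJECT_ID}" \
    --location="${BUCKET_LOCATION}"
}

# Hypothetical helper: prints the protected prefix, then the bucket's lifecycle
# configuration, so you can check that no rule deletes or expires objects there.
show_lifecycle_rules() {
  echo "Protected prefix: workspace-${WORKSPACE_ID}/datalake"
  gsutil lifecycle get "gs://${BUCKET_NAME}"
}
```

Call `create_idr_bucket` once, then `show_lifecycle_rules` to review any rules. For an existing bucket, only the second call is needed.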
Step 4: Share the bucket name with Hightouch
Share the bucket name with your Hightouch team so they can complete setup on their side.
Step 5: Grant bucket access to the BigQuery service account
Grant the same custom IAM role from Step 1 to the BigQuery service account used by Hightouch in your project.
This ensures Hightouch can read and write IDR data as part of warehouse workflows.
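This grant can follow the same pattern as Step 2. The sketch below assumes a project-level binding and uses placeholder values throughout. `BIGQUERY_SA_EMAIL` and the `grant_role_to_bigquery_sa` helper are illustrative names, and the email must be the BigQuery service account that Hightouch actually uses in your project.

```shell
# Placeholders (assumptions): replace with your own values.
PROJECT_ID="YOUR_PROJECT_ID"
ROLE_ID="YOUR_CUSTOM_ROLE_NAME"
BIGQUERY_SA_EMAIL="YOUR_BIGQUERY_SERVICE_ACCOUNT_EMAIL"

# Hypothetical helper: binds the custom role from Step 1 to the BigQuery
# service account. Custom roles are referenced by their full path.
grant_role_to_bigquery_sa() {
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${BIGQUERY_SA_EMAIL}" \
    --role="projects/${PROJECT_ID}/roles/${ROLE_ID}"
}
```

Call `grant_role_to_bigquery_sa` after filling in the three variables.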
FAQs
Can I reuse an existing GCS bucket?
Yes. You can use the same GCS bucket you use for self-hosted external storage, as long as the bucket does not have lifecycle rules that delete objects in the required path.
Note that the service account used for probabilistic IDR is separate from both the service account generated by the Hightouch app and any service account you may already use.