---
title: "Recce Cloud: Hosted Data Review Agent"
description: "Case study: building the Recce Cloud platform on AWS ECS Fargate. CDK-driven secrets pipeline, HMAC-verified Slack receiver with idempotent event handling, ephemeral container lifecycle, and coordinated major-version migrations across the stack."
doc_version: 1
last_updated: 2026-05-24
canonical: https://variable.team/projects/recce-cloud
---

<!--
  Source of truth: app/projects/recce-cloud/page.tsx. Keep the h1 in sync.
  scripts/check-markdown-drift.ts verifies this in pre-commit.
-->

# Recce Cloud

- **Company:** [Recce](https://reccehq.com)
- **Industry:** Data Engineering / Analytics Engineering
- **Dates:** Apr 2025 to Present
- **Project link:** <https://cloud.reccehq.com>
- **Stack:** AWS, CDK, Python, FastAPI, SQLAlchemy, ECS Fargate, RDS,
  Next.js, TypeScript, Slack API

### Wiring secrets from GitHub through CDK to running containers

The Ask Recce Slack bot needed an Anthropic API key, a Slack signing
secret, and a GitHub dispatch token, all delivered through a chain whose
values could differ between staging and prod. I wired the path end to end:
GitHub Actions secrets, CDK Python pipeline stages, ECS task definitions,
container environment variables, and the FastAPI runtime config. The
pattern fails fast when a secret is missing, instead of producing the
silent auth failures that had bitten earlier integrations.
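A minimal sketch of the fail-fast step at the end of that chain, assuming the three secrets arrive as environment variables with the hypothetical names below. The source doesn't show the real settings class (pydantic-settings would be a common choice alongside FastAPI), so this stdlib version only illustrates the idea: check everything once, report every missing name together, and refuse to continue.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical environment variable names; the real ones aren't given in the source.
REQUIRED_SECRETS = (
    "ANTHROPIC_API_KEY",
    "SLACK_SIGNING_SECRET",
    "GITHUB_DISPATCH_TOKEN",
)


class MissingSecretError(RuntimeError):
    """Raised when one or more required secrets are absent or blank."""


@dataclass(frozen=True)
class Secrets:
    anthropic_api_key: str
    slack_signing_secret: str
    github_dispatch_token: str


def load_secrets(env: Mapping[str, str] = os.environ) -> Secrets:
    # Treat empty or whitespace-only values as missing: a GitHub Actions
    # secret that was never set still expands to an empty string.
    missing = [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
    if missing:
        # Name every missing secret at once so one deploy fixes them all.
        raise MissingSecretError("missing required secrets: " + ", ".join(missing))
    return Secrets(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        slack_signing_secret=env["SLACK_SIGNING_SECRET"],
        github_dispatch_token=env["GITHUB_DISPATCH_TOKEN"],
    )
```

Calling `load_secrets()` once as the app starts (at module import, or in a FastAPI lifespan handler) means a misconfigured task crashes on boot with a named error rather than returning auth failures later.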

### A Slack receiver that doesn't double-process events

Slack retries an event when it doesn't receive an ack in time. The Recce
Slack receiver verifies the HMAC request signature, strips bot mentions
before parsing, and deduplicates on the Slack event ID so a re-delivery
doesn't kick off a second analysis run. The receiver then triggers a GitHub
Actions dispatch, passing the parsed payload through type-safe Pydantic
models. Tests cover the missing-secret and malformed-payload cases.
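A minimal sketch of the three receiver checks. Signature verification follows Slack's documented signing scheme: HMAC-SHA256 over `v0:{timestamp}:{raw body}` keyed with the signing secret, compared against the `X-Slack-Signature` header, with requests rejected when `X-Slack-Request-Timestamp` is more than five minutes off. Everything else is an assumption: the function names, the payload shape `handle_event` reads, and a plain dict standing in for the Pydantic models. The dedup store is an in-process bounded set for illustration only; with more than one ECS task it would have to live somewhere shared (for example, a unique constraint on the event ID in the database), and the source doesn't say which approach Recce uses.

```python
from __future__ import annotations

import hashlib
import hmac
import re
import time
from collections import OrderedDict

# Reject requests whose timestamp is more than five minutes off,
# which limits how long a captured request could be replayed.
MAX_SKEW_SECONDS = 60 * 5

# User mentions arrive as <@U123ABC>, optionally <@U123ABC|name>.
MENTION_RE = re.compile(r"<@[A-Z0-9]+(?:\|[^>]*)?>")


def verify_slack_signature(
    signing_secret: str,
    timestamp: str,
    raw_body: bytes,
    signature: str,
    now: float | None = None,
) -> bool:
    """Check X-Slack-Signature against the raw (not yet JSON-parsed) body."""
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - ts) > MAX_SKEW_SECONDS:
        return False
    base = b"v0:" + timestamp.encode() + b":" + raw_body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison so the check doesn't leak timing information.
    return hmac.compare_digest(expected, signature)


class SeenEvents:
    """Bounded, in-process record of event IDs already handled (illustration only)."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._ids: OrderedDict[str, None] = OrderedDict()
        self._capacity = capacity

    def first_time(self, event_id: str) -> bool:
        if event_id in self._ids:
            return False
        self._ids[event_id] = None
        if len(self._ids) > self._capacity:
            self._ids.popitem(last=False)  # evict the oldest ID
        return True


def strip_mentions(text: str) -> str:
    """Remove user mentions and collapse the leftover whitespace."""
    return " ".join(MENTION_RE.sub(" ", text).split())


def handle_event(payload: dict, seen: SeenEvents) -> str | None:
    """Return the cleaned question, or None if the event has no ID or is a retry."""
    event_id = payload.get("event_id")
    if not event_id or not seen.first_time(event_id):
        return None
    return strip_mentions(payload.get("event", {}).get("text", ""))
```

Recording the event ID before doing any slow work is what makes a retry harmless: the second delivery is acknowledged but returns early instead of starting another run.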

### Ephemeral container orchestration for preview, share, and task instances

Recce Cloud spins up containers on demand. PR preview environments,
shareable analysis sessions, and one-off agent tasks all run the same way.
The lifecycle abstraction uses Docker locally for dev and ECS Fargate in
production, exposing one interface so application code doesn't branch on
environment. Health checks, port allocation, and lifecycle events all sit
behind the abstraction.
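A minimal sketch of what one interface over Docker and Fargate could look like. `ContainerBackend`, `Lifecycle`, `Phase`, and the polling health check are hypothetical names and choices, not Recce's code. A real Docker backend would call the Docker API and a Fargate backend would call ECS `RunTask`; here an in-memory `FakeBackend` stands in so the sketch runs without either. Port allocation is a backend method because it mostly matters locally: Fargate tasks use `awsvpc` networking and get their own network interface, so a fixed container port is enough there.

```python
from __future__ import annotations

import itertools
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Protocol


class Phase(Enum):
    STARTING = "starting"
    HEALTHY = "healthy"
    FAILED = "failed"
    STOPPED = "stopped"


@dataclass
class Instance:
    kind: str    # e.g. "preview", "share", or "task"
    handle: str  # backend-specific ID (container ID locally, task ARN on Fargate)
    port: int
    phase: Phase = Phase.STARTING


class ContainerBackend(Protocol):
    """Operations a Docker (dev) backend and a Fargate (prod) backend would both provide."""

    def allocate_port(self) -> int: ...
    def start(self, image: str, port: int) -> str: ...
    def is_healthy(self, handle: str) -> bool: ...
    def stop(self, handle: str) -> None: ...


class Lifecycle:
    """Environment-agnostic launcher; callers never see which backend is in use."""

    def __init__(self, backend: ContainerBackend, on_event: Callable[[Instance], None]) -> None:
        self.backend = backend
        self.on_event = on_event

    def _emit(self, inst: Instance, phase: Phase) -> None:
        inst.phase = phase
        self.on_event(inst)  # lifecycle event hook (logging, DB status, etc.)

    def launch(self, kind: str, image: str, *, checks: int = 30, interval: float = 1.0) -> Instance:
        port = self.backend.allocate_port()
        inst = Instance(kind=kind, handle=self.backend.start(image, port), port=port)
        self._emit(inst, Phase.STARTING)
        # Poll until healthy; give up and clean up after `checks` attempts.
        for _ in range(checks):
            if self.backend.is_healthy(inst.handle):
                self._emit(inst, Phase.HEALTHY)
                return inst
            time.sleep(interval)
        self.backend.stop(inst.handle)
        self._emit(inst, Phase.FAILED)
        return inst

    def shutdown(self, inst: Instance) -> None:
        self.backend.stop(inst.handle)
        self._emit(inst, Phase.STOPPED)


class FakeBackend:
    """In-memory stand-in so the sketch runs without Docker or AWS."""

    def __init__(self, healthy_after: int = 2) -> None:
        self.healthy_after = healthy_after
        self.polls: dict[str, int] = {}
        self.running: set[str] = set()
        self._ports = itertools.count(8000)  # a Docker backend might ask the OS instead
        self._ids = itertools.count(1)

    def allocate_port(self) -> int:
        return next(self._ports)

    def start(self, image: str, port: int) -> str:
        handle = f"fake-{next(self._ids)}"
        self.polls[handle] = 0
        self.running.add(handle)
        return handle

    def is_healthy(self, handle: str) -> bool:
        # Report healthy once the container has been polled `healthy_after` times.
        self.polls[handle] += 1
        return self.polls[handle] >= self.healthy_after

    def stop(self, handle: str) -> None:
        self.running.discard(handle)
```

Application code only calls `launch` and `shutdown`; moving between dev and production means constructing `Lifecycle` with a different backend, with no environment checks in the calling code.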

### Coordinating major-version migrations across the stack

The Cloud stack absorbed several breaking upgrades over the year: pnpm 10
to 11, mypy 1 to 2, Starlette 0.x to 1.0. I coordinated those across 7
GitHub workflows and 2 package manifests, validated compatibility against
frozen lock files, and rolled the changes out in chained PRs so any
regressions stayed contained to one stage at a time.

## Sitemap

[Full site index](/sitemap.md)
