Preamble
pytest (pytest Layout, Fixtures, and Parametrization) guards behavior; formatters, linters, and type checkers guard shape; profilers guard performance claims against reality (cProfile and py-spy, memory profiling). This post walks through implementing that stack so a single Docker build (or CI job using the same image) always runs the static checks, optionally runs bounded profiling, and writes a summary you can archive next to test artifacts. It closes with Terraform patterns so infrastructure and the quality image stay reproducible—cousin to multi-stage runtime images (Multi-Stage Dockerfiles for Python).
What to run (and what “profiler” means in CI)
Formatters and import order — Black (or Ruff’s formatter) removes formatting debates; isort (or Ruff’s import rules) keeps import diffs readable.
Fast linting — Ruff covers much of what Flake8 + many plugins did, with a single binary. Flake8 remains useful if you rely on plugins Ruff does not yet subsume. pylint is heavier but catches different issues; pick one primary linter to avoid duplicate noise.
Types — mypy (or pyright as a CLI in CI) catches optional handling, container mistakes, and bad overrides. For brownfield code, mypy --strict on new packages and pragmatic # type: ignore with owners beats never turning it on.
Security and deps — bandit (AST security), pip-audit or safety (dependency CVEs) belong in the same “static” bucket; they fail the build on policy, not on microseconds.
Profilers — Treat these as bounded reports, not full production captures inside every PR:
- cProfile — deterministic; run it against a representative entrypoint (import plus one request, or a pytest subset marked `@pytest.mark.benchmark` / slow). Emit the `.prof` file and `pstats` text sorted by cumulative time.
- py-spy — sampling; inside Docker it often needs `--privileged` or relaxed `ptrace`, and many hosted CI sandboxes disallow it. Use it on self-hosted runners or in a local `docker run` when allowed.
- memory_profiler — line-level memory tracking is slow; run it on one module or a small script, not the whole suite.
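The bounded-cProfile idea can be sketched entirely in the standard library: profile one representative call, then render the top entries with `pstats`. The `handle_request` workload below is a hypothetical stand-in for your real entrypoint.

```python
import cProfile
import io
import pstats

def handle_request() -> int:
    """Hypothetical stand-in for a representative entrypoint."""
    return sum(i * i for i in range(10_000))

prof = cProfile.Profile()
prof.enable()
handle_request()
prof.disable()

# Render the top 10 entries by cumulative time into a string,
# the same text you would redirect to /reports/cprofile_top.txt.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

In CI you would dump the raw profile with `prof.dump_stats(path)` as well, so the `.prof` file can be reloaded for deeper inspection later.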
The build summary is a plain Markdown or JSON file listing each tool, exit code, duration, and paths to raw logs—so humans and downstream systems see one artifact.
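One way to produce that machine-readable summary is a small wrapper that times each tool and records its exit code; a sketch, with a no-op command standing in for a real linter and a hypothetical `reports/` log path:

```python
import json
import subprocess
import sys
import time

def run_tool(name: str, cmd: list[str]) -> dict:
    """Run one check and capture exit code and duration for the summary."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "tool": name,
        "exit_code": proc.returncode,
        "duration_s": round(time.monotonic() - start, 2),
        "log": f"reports/{name}.log",  # where raw output would be written
    }

# A no-op stands in for e.g. ["ruff", "check", "src"].
rows = [run_tool("noop", [sys.executable, "-c", "pass"])]
summary = {"passed": all(r["exit_code"] == 0 for r in rows), "tools": rows}
print(json.dumps(summary, indent=2))
```

Downstream systems parse the JSON; humans read the Markdown rendering of the same rows.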
Docker layout: quality stage separate from runtime
Use a dedicated target (or final stage) that installs dev dependencies and profiling tools; your runtime stage stays slim (multi-stage).
Conceptually:
- base — Python version pinned; `pip install` from a lockfile (`requirements.txt`, `poetry export`, or `uv pip compile`).
- quality — adds `black`, `ruff`, `mypy`, `bandit`, `pip-audit`, `pytest`, `pytest-cov`, and the profiler packages; the working directory is the repo root copied in.
- runtime — only production deps; no linters.
The quality stage default command runs a single script that executes tools in sequence (or parallel where safe), aggregates results, and exits non-zero if any required check failed.
Example Dockerfile fragment (adapt names and paths to your repo):
# syntax=docker/dockerfile:1
FROM python:3.12-slim AS quality
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
COPY requirements-dev.txt .
RUN pip install --no-cache-dir -r requirements-dev.txt
COPY . .
RUN mkdir -p /reports
ENV REPORT_DIR=/reports
CMD ["./scripts/quality_gate.sh"]
requirements-dev.txt should pin versions for reproducible CI (same pins locally via pip install -r or mount-free docker build).
The quality gate script: linters, optional profiling, one summary
Keep logic in scripts/quality_gate.sh (or make quality) so local, Docker, and Terraform-triggered builds invoke the same entrypoint.
Pattern:
- Start a timer per tool; redirect stdout/stderr to `$REPORT_DIR/<tool>.log`.
- Record the exit code in a shell variable; append a row to `$REPORT_DIR/summary.md`.
- Run formatters in check mode (`black --check`, `ruff format --check`) so CI does not mutate the tree silently—or run write mode in a pre-commit flow only.
- Run `ruff check`, `mypy`, `bandit`, and `pip-audit` as your policy dictates.
- Profiling block (behind an optional env var `RUN_PROFILERS=1`): e.g. `python -m cProfile -o /reports/app.prof -m pytest tests/test_smoke.py -q`, then `python -c "import pstats; p=pstats.Stats('/reports/app.prof'); p.sort_stats('cumulative'); p.print_stats(40)" > /reports/cprofile_top.txt`.
- `cat summary.md` at the end for the build logs; `exit 1` if any required step failed.
Minimal sketch:
#!/usr/bin/env bash
set -euo pipefail
REPORT_DIR="${REPORT_DIR:-./reports}"
mkdir -p "$REPORT_DIR"
SUMMARY="$REPORT_DIR/summary.md"
echo "# Quality gate" > "$SUMMARY"
FAILED=0
run() {
  local name="$1"; shift
  local log="$REPORT_DIR/${name}.log"
  local start
  start=$(date +%s)
  set +e
  "$@" >"$log" 2>&1
  local code=$?
  set -e
  local dur=$(( $(date +%s) - start ))
  echo "- **$name**: exit=$code, ${dur}s → \`$log\`" >> "$SUMMARY"
  # Use an explicit if: under `set -e`, `[ ... ] && FAILED=1` would abort
  # the script whenever the tool succeeded (the test returns non-zero).
  if [ "$code" -ne 0 ]; then FAILED=1; fi
}
run black black --check src tests
run ruff_fmt ruff format --check src tests
run ruff_chk ruff check src tests
run mypy mypy src
run bandit bandit -q -r src
run audit pip-audit
if [ "${RUN_PROFILERS:-0}" = 1 ]; then
  run cprofile python -m cProfile -o "$REPORT_DIR/smoke.prof" -m pytest tests/test_smoke.py -q
fi
cat "$SUMMARY"
exit "$FAILED"
Parallelism — You can run independent tools with xargs -P or GNU parallel; then merge logs and recompute FAILED. Sequential scripts are easier to debug first.
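The merge-then-recompute step for parallel runs can also be sketched in Python with a thread pool, since the work is subprocess-bound. The two commands here are hypothetical stand-ins for independent tools such as `ruff check src` and `mypy src`:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent checks.
checks = {
    "passing_check": [sys.executable, "-c", "pass"],
    "failing_check": [sys.executable, "-c", "raise SystemExit(1)"],
}

def run(item: tuple[str, list[str]]) -> tuple[str, int]:
    name, cmd = item
    # Output would be redirected to a per-tool log in the real gate.
    return name, subprocess.run(cmd, capture_output=True).returncode

with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run, checks.items()))

# Recompute the aggregate failure flag from all exit codes.
failed = any(code != 0 for code in results.values())
print(results, "FAILED" if failed else "OK")
```

Only tools that never write to the tree should run concurrently; formatters in write mode are not safe here.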
Artifacts — In GitHub Actions, GitLab CI, or CodeBuild, upload reports/ as the job artifact so the summary survives log truncation.
Invoking the gate: docker build vs docker run
Option A — Build-only: Use RUN ./scripts/quality_gate.sh in the Dockerfile quality stage so docker build --target quality . fails the image build when checks fail. The summary exists only in the build log unless you COPY --from=quality /reports into a scratch stage or use BuildKit cache mounts—more awkward for reports.
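If you do want reports out of a build-only flow, that scratch-stage trick looks like this; a sketch assuming the `quality` stage from the fragment above:

```dockerfile
# Export-only stage: holds nothing but the reports from the quality stage.
FROM scratch AS reports
COPY --from=quality /reports /reports
```

Then `docker build --target reports --output type=local,dest=./reports .` uses BuildKit's local exporter to write the stage's filesystem to the host without running a container. Note the reports only materialize when the quality stage's `RUN` succeeded, which limits its usefulness for debugging failures.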
Option B — Run container (recommended for summaries): Build a quality image, then:
docker build --target quality -t myapp:quality .
docker run --rm -v "$(pwd)/reports:/reports" -e REPORT_DIR=/reports myapp:quality
The host ./reports then holds summary.md and per-tool logs—ideal for CI artifact upload.
Option C — Compose — Define a quality service with `profiles: ["ci"]` so it shares the same image but is skipped by a plain `docker compose up`; developers opt in with `docker compose --profile ci run quality`.
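A sketch of that Compose service (service and path names assumed):

```yaml
services:
  quality:
    build:
      context: .
      target: quality
    profiles: ["ci"]            # excluded from a plain `docker compose up`
    volumes:
      - ./reports:/reports      # summary and per-tool logs land on the host
    environment:
      REPORT_DIR: /reports
```

Run it on demand with `docker compose --profile ci run --rm quality`.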
Terraform: reproducible image and optional CI project
Terraform does not replace Docker; it pins how and where the image is built and consumed.
1. Build and push with the Docker provider — If you use kreuzwerker/docker (or equivalent), you can declare an image built from a context so apply triggers rebuilds when the Dockerfile or sources change:
resource "docker_image" "app_quality" {
  name = "${var.registry}/myapp-quality:${var.image_tag}"

  build {
    context    = "${path.module}/.."
    dockerfile = "Dockerfile"
    target     = "quality"
  }

  triggers = {
    digest = filesha256("${path.module}/../scripts/quality_gate.sh")
  }
}
Extend triggers to cover every input that should force a rebuild: as written, with only the gate script hashed, Terraform will skip the rebuild when just application code changed.
2. AWS CodeBuild / GCP Cloud Build — Often cleaner than Terraform driving docker build on a laptop: Terraform defines the build project, IAM, logs bucket, and artifact S3 bucket; the buildspec runs docker build / docker run or invokes the same shell script without Docker if the build environment is Python-ready. Infrastructure stays declarative; the script stays the single source of truth for which linters and profilers run.
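For CodeBuild, the buildspec can stay a thin wrapper around the same image and script; a minimal sketch (image name assumed, `CODEBUILD_SRC_DIR` is the built-in source path variable):

```yaml
version: 0.2
phases:
  build:
    commands:
      - docker build --target quality -t myapp:quality .
      - docker run --rm -v "$CODEBUILD_SRC_DIR/reports:/reports" myapp:quality
artifacts:
  files:
    - reports/**/*
```

The CodeBuild project needs privileged mode enabled to run Docker builds; Terraform sets that flag on the project resource alongside IAM and the artifact bucket.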
3. Kubernetes Job — For clusters, Terraform provisions the Job manifest or Helm release; the Job image is your quality image, with emptyDir or PVC mounted at /reports and a sidecar or follow-up step to upload the summary.
Profiling and Terraform — If you need py-spy inside AWS, you will need a privileged container, and Fargate does not support privileged mode; EC2-backed CodeBuild or self-hosted runners are the realistic homes. Encode that constraint in docs next to the Terraform module so operators do not expect sampling profilers in locked-down sandboxes.
Conclusion
A containerized quality gate turns Black, Ruff, mypy, security scanners, and bounded profilers into a repeatable contract: same pins, same commands, one summary artifact. Terraform then binds that contract to registries, build projects, or batch jobs so “what runs in CI” is not tribal knowledge. For language-level profiling tradeoffs, keep cProfile and py-spy in mind when you choose what belongs in RUN_PROFILERS. Java Streams beside Python Comprehensions stays the adjacent read for shape of data in code once the tooling keeps that code readable.