Preamble

pytest (pytest Layout, Fixtures, and Parametrization) guards behavior; formatters, linters, and type checkers guard shape; profilers guard performance claims against reality (cProfile and py-spy, memory profiling). This post walks through implementing that stack so a single Docker build (or CI job using the same image) always runs the static checks, optionally runs bounded profiling, and writes a summary you can archive next to test artifacts. It closes with Terraform patterns so infrastructure and the quality image stay reproducible—cousin to multi-stage runtime images (Multi-Stage Dockerfiles for Python).


What to run (and what “profiler” means in CI)

Formatters and import order — Black (or Ruff’s formatter) removes formatting debates; isort (or Ruff’s import rules) keeps import diffs readable.

Fast linting — Ruff covers much of what Flake8 + many plugins did, with a single binary. Flake8 remains useful if you rely on plugins Ruff does not yet subsume. pylint is heavier but catches different issues; pick one primary linter to avoid duplicate noise.

Types — mypy (or pyright as a CLI in CI) catches optional handling, container mistakes, and bad overrides. For brownfield code, mypy --strict on new packages and pragmatic # type: ignore with owners beats never turning it on.
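A minimal illustration of the Optional handling mypy flags — the config lookup and default here are hypothetical, chosen only to show the pattern:

```python
from typing import Optional

def port_of(cfg: dict[str, str]) -> int:
    value: Optional[str] = cfg.get("port")  # .get() may return None
    # Without this narrowing, int(value) is a mypy error:
    # Argument 1 to "int" has incompatible type "Optional[str]"
    if value is None:
        return 8000  # hypothetical default
    return int(value)
```

Running mypy over code like this is exactly the class of check that belongs in the static bucket: it fails fast, deterministically, with no runtime cost.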

Security and deps — bandit (AST security), pip-audit or safety (dependency CVEs) belong in the same “static” bucket; they fail the build on policy, not on microseconds.

Profilers — Treat these as bounded reports, not full production captures inside every PR:

  • cProfile — deterministic; run against a representative entrypoint (import + one request, or a pytest subset marked @pytest.mark.benchmark / slow). Emit .prof and pstats text sorted by cumulative time.
  • py-spy — sampling; inside Docker it typically needs --cap-add SYS_PTRACE (or --privileged), and many hosted CI sandboxes disallow both. Use it in self-hosted runners or local docker run when allowed.
  • memory_profiler — line-level memory is slow; run on one module or a small script, not the whole suite.
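When line-level attribution is too slow even for one module, a cheaper bounded check is possible with stdlib tracemalloc — a sketch, not a memory_profiler replacement (the helper name and example workload are illustrative):

```python
import tracemalloc

def measure_peak(fn, *args):
    """Return (result, peak_bytes) for one call -- a bounded
    whole-call peak, not line-level attribution."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Example: building a 100k-element list shows a clearly nonzero peak.
_, peak = measure_peak(lambda n: list(range(n)), 100_000)
```

Because it is stdlib and cheap, a check like this can run on every build, with memory_profiler reserved for targeted investigation.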

The build summary is a plain Markdown or JSON file listing each tool, exit code, duration, and paths to raw logs—so humans and downstream systems see one artifact.
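A sketch of the JSON flavor of that summary, assuming a per-tool record of name, exit code, duration, and log path (field names are illustrative, not a fixed schema):

```python
import json
import time
from pathlib import Path

def write_summary(results, report_dir="reports"):
    """results: list of dicts like
    {"tool": "ruff", "exit_code": 0, "seconds": 1.2, "log": "reports/ruff.log"}"""
    out = Path(report_dir)
    out.mkdir(parents=True, exist_ok=True)
    summary = {
        "generated_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "failed": any(r["exit_code"] != 0 for r in results),
        "tools": results,
    }
    (out / "summary.json").write_text(json.dumps(summary, indent=2))
    return summary
```

Downstream systems key off the single top-level failed flag; humans read the per-tool rows.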


Docker layout: quality stage separate from runtime

Use a dedicated target (or final stage) that installs dev dependencies and profiling tools; your runtime stage stays slim (multi-stage).

Conceptually:

  1. Base — Python version pinned; pip install from a lockfile (requirements.txt / poetry export / uv pip compile).
  2. quality — adds black, ruff, mypy, bandit, pip-audit, pytest, pytest-cov, profiler packages; working directory is the repo root copied in.
  3. runtime — only production deps; no linters.

The quality stage default command runs a single script that executes tools in sequence (or parallel where safe), aggregates results, and exits non-zero if any required check failed.

Example Dockerfile fragment (adapt names and paths to your repo):

# syntax=docker/dockerfile:1
FROM python:3.12-slim AS quality
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
COPY requirements-dev.txt .
RUN pip install --no-cache-dir -r requirements-dev.txt
COPY . .
RUN mkdir -p /reports
ENV REPORT_DIR=/reports
CMD ["./scripts/quality_gate.sh"]

requirements-dev.txt should pin versions so CI is reproducible and local runs (pip install -r requirements-dev.txt, or a plain docker build with no mounts) resolve the same pins.


The quality gate script: linters, optional profiling, one summary

Keep logic in scripts/quality_gate.sh (or make quality) so local, Docker, and Terraform-triggered builds invoke the same entrypoint.

Pattern:

  1. Start timer per tool; redirect stdout/stderr to $REPORT_DIR/<tool>.log.
  2. Record exit code in a shell variable; append a row to $REPORT_DIR/summary.md.
  3. Run formatters in check mode (black --check, ruff format --check) so CI does not mutate the tree silently—or run write mode in a pre-commit flow only.
  4. Run ruff check, mypy, bandit, pip-audit as your policy dictates.
  5. Profiling block (optional env var RUN_PROFILERS=1): e.g. python -m cProfile -o /reports/app.prof -m pytest tests/test_smoke.py -q then python -c "import pstats; p=pstats.Stats('/reports/app.prof'); p.sort_stats('cumulative'); p.print_stats(40)" > /reports/cprofile_top.txt.
  6. cat summary.md at the end for build logs.
  7. exit 1 if any required step failed.

Minimal sketch:

#!/usr/bin/env bash
set -euo pipefail
REPORT_DIR="${REPORT_DIR:-./reports}"
mkdir -p "$REPORT_DIR"
SUMMARY="$REPORT_DIR/summary.md"
echo "# Quality gate" > "$SUMMARY"
FAILED=0
run() {
  local name="$1"; shift
  local log="$REPORT_DIR/${name}.log"
  local start=$(date +%s)
  set +e
  "$@" >"$log" 2>&1
  local code=$?
  set -e
  local dur=$(($(date +%s) - start))
  echo "- **$name**: exit=$code, ${dur}s → \`$log\`" >> "$SUMMARY"
  if [ "$code" -ne 0 ]; then FAILED=1; fi  # plain 'if': '[ ... ] && FAILED=1' would abort under set -e when a tool passes
}
run black black --check src tests
run ruff_fmt ruff format --check src tests
run ruff_chk ruff check src tests
run mypy mypy src
run bandit bandit -q -r src
run audit pip-audit
if [ "${RUN_PROFILERS:-0}" = 1 ]; then
  run cprofile python -m cProfile -o "$REPORT_DIR/smoke.prof" -m pytest tests/test_smoke.py -q
fi
cat "$SUMMARY"
exit "$FAILED"
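Step 5’s pstats one-liner can also live as a small Python helper that the gate calls, keeping the .prof artifact and the human-readable top-N report together (function and path names are illustrative):

```python
import cProfile
import pstats

def profile_to_report(fn, prof_path, txt_path, top=40):
    """Run fn under cProfile, save the raw .prof, and write a
    cumulative-time text report next to it."""
    cProfile.runctx("fn()", globals(), {"fn": fn}, filename=str(prof_path))
    with open(txt_path, "w") as out:
        stats = pstats.Stats(str(prof_path), stream=out)
        stats.sort_stats("cumulative").print_stats(top)

# Example against a trivial workload:
# profile_to_report(lambda: sum(range(10**5)),
#                   "reports/smoke.prof", "reports/cprofile_top.txt")
```

The .prof file stays machine-readable (snakeviz, pstats) while the text report survives in build logs.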

Parallelism — You can run independent tools with xargs -P or GNU parallel; then merge logs and recompute FAILED. Sequential scripts are easier to debug first.
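The same idea in Python instead of xargs -P, with concurrent.futures — a sketch with harmless stand-in commands, since the real tool argv lists depend on your repo:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_tools(commands, max_workers=4):
    """Run independent tool commands concurrently; return {name: returncode}.
    Threads are fine here: each worker just waits on a subprocess."""
    def run_one(item):
        name, argv = item
        proc = subprocess.run(argv, capture_output=True, text=True)
        return name, proc.returncode
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, commands.items()))

# Stand-in commands; in the gate these would be ruff, mypy, bandit, ...
codes = run_tools({
    "ok":   [sys.executable, "-c", "pass"],
    "fail": [sys.executable, "-c", "raise SystemExit(1)"],
})
failed = any(code != 0 for code in codes.values())
```

Capturing output per tool keeps interleaving out of the logs, mirroring the per-tool log files in the sequential script.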

Artifacts — In GitHub Actions, GitLab CI, or CodeBuild, upload reports/ as the job artifact so the summary survives log truncation.


Invoking the gate: docker build vs docker run

Option A — Build-only: Use RUN ./scripts/quality_gate.sh in the Dockerfile quality stage so docker build --target quality . fails the image build when checks fail. The summary exists only in the build log unless you COPY --from=quality /reports into a scratch stage or use BuildKit cache mounts, both of which are awkward for reports.

Option B — Run container (recommended for summaries): Build a quality image, then:

docker build --target quality -t myapp:quality .
docker run --rm -v "$(pwd)/reports:/reports" -e REPORT_DIR=/reports myapp:quality

The host ./reports then holds summary.md and per-tool logs—ideal for CI artifact upload.

Option C — Compose: a quality service with profiles: [ci] runs the same image, without forcing it on developers who run a plain docker compose up.
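A minimal sketch of that Compose service, assuming the quality build target from the Dockerfile above (service and mount names are illustrative):

```yaml
services:
  quality:
    build:
      context: .
      target: quality
    profiles: [ci]          # skipped by a plain `docker compose up`
    environment:
      REPORT_DIR: /reports
    volumes:
      - ./reports:/reports  # summary.md and per-tool logs land on the host
```

docker compose --profile ci up quality then runs the gate on demand; without the flag, the service does not start.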


Terraform: reproducible image and optional CI project

Terraform does not replace Docker; it pins how and where the image is built and consumed.

1. Build and push with the Docker provider — If you use kreuzwerker/docker (or equivalent), you can declare an image built from a context so apply triggers rebuilds when the Dockerfile or sources change:

resource "docker_image" "app_quality" {
  name = "${var.registry}/myapp-quality:${var.image_tag}"
  build {
    context    = "${path.module}/.."
    dockerfile = "Dockerfile"
    target     = "quality"
  }
  triggers = {
    digest = filesha256("${path.module}/../scripts/quality_gate.sh")
  }
}

Hash every input that matters (application sources as well as the gate script); with a trigger on the script alone, Terraform skips rebuilds when only application code changed.
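One way to widen the trigger over a whole source tree, sketched with standard Terraform functions (fileset, filesha256, sha256); the ../src path is an assumption about your layout:

```hcl
triggers = {
  sources = sha256(join("", [
    for f in fileset("${path.module}/../src", "**") :
    filesha256("${path.module}/../src/${f}")
  ]))
  gate = filesha256("${path.module}/../scripts/quality_gate.sh")
}
```

Any change under src/ or to the script changes a hash, which forces the image rebuild on the next apply.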

2. AWS CodeBuild / GCP Cloud Build — Often cleaner than Terraform driving docker build on a laptop: Terraform defines the build project, IAM, logs bucket, and artifact S3 bucket; the buildspec runs docker build / docker run or invokes the same shell script without Docker if the build environment is Python-ready. Infrastructure stays declarative; the script stays the single source of truth for which linters and profilers run.

3. Kubernetes Job — For clusters, Terraform provisions the Job manifest or Helm release; the Job image is your quality image, with emptyDir or PVC mounted at /reports and a sidecar or follow-up step to upload the summary.

Profiling and Terraform — If you need py-spy inside AWS, you need hosts that allow ptrace (privileged or SYS_PTRACE-capable); Fargate is not an option, so EC2-backed CodeBuild or self-hosted runners are the realistic homes. Encode that constraint in docs next to the Terraform module so operators do not expect sampling profilers in locked-down sandboxes.


Conclusion

A containerized quality gate turns Black, Ruff, mypy, security scanners, and bounded profilers into a repeatable contract: same pins, same commands, one summary artifact. Terraform then binds that contract to registries, build projects, or batch jobs so “what runs in CI” is not tribal knowledge. For language-level profiling tradeoffs, keep cProfile and py-spy in mind when you choose what belongs in RUN_PROFILERS. Java Streams beside Python Comprehensions remains the adjacent read on the shape of data in code, once the tooling keeps that code readable.