Preamble
Micro-benchmarks from the 2019 data-structure posts taught specific lessons, but they mislead when whole-program behavior differs: inlining, cache effects, and real I/O reshape the hot path. Profiling keeps me honest. I use two complementary tools: cProfile for deterministic function-level attribution, and py-spy for sampling without heavy instrumentation.
cProfile: deterministic, higher overhead
cProfile records every function call (subject to profiler settings). It is ideal in tests, scripts, and staging where overhead is acceptable. I export stats (python -m cProfile -o stats.prof script.py) and visualize with snakeviz or similar—sort by cumulative time to see who called whom, not only leaf cost.
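The in-process equivalent of that command-line workflow looks roughly like this; `slow_sum` is a hypothetical stand-in workload, and `pstats` does the cumulative sorting that snakeviz would otherwise visualize:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately quadratic work so the profiler has something to attribute.
    total = 0
    for i in range(n):
        total += sum(range(i))
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(500)
profiler.disable()

# Sort by cumulative time to see who called whom, not only leaf cost.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # top 5 entries
print(stream.getvalue())
```

Dumping to a file instead (`profiler.dump_stats("stats.prof")`) produces the same artifact as `python -m cProfile -o stats.prof script.py`, ready for snakeviz.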
py-spy: sampling from outside
py-spy attaches to a running PID and samples the stack periodically. Overhead stays low enough for production-like load tests where inserting profiler hooks would distort timing. The view is statistical—rare paths may be underrepresented, but dominant hotspots surface quickly.
Workflow
Start with sampling to find regions; drill with time.perf_counter or cProfile around suspected functions. The same loop mindset applies when I later read Java async flame graphs—first find the continent, then the city.
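The drill-down step is nothing more elaborate than bracketing the suspect with a monotonic clock; `suspected_hotspot` here is a hypothetical stand-in for whatever sampling pointed at:

```python
import time

def suspected_hotspot(n):
    # Stand-in for the function the sampler flagged (hypothetical workload).
    return sum(i * i for i in range(n))

# time.perf_counter is monotonic and high-resolution, so the
# difference of two readings is a valid elapsed-time measurement.
start = time.perf_counter()
result = suspected_hotspot(100_000)
elapsed = time.perf_counter() - start
print(f"suspected_hotspot took {elapsed:.6f}s")
```

For anything shorter than a few milliseconds, repeating the call and taking the minimum gives a steadier number than a single reading.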
Conclusion
Profiling discipline supports both Python and JVM work. The companion post, JVM Startup Flags and GC Basics for Application Developers, reads GC logs as the Java analogue of the same question: “where did time go?”