Preamble

Micro-benchmarks from the 2019 data-structure posts taught specific lessons, but they lie when whole-program behavior differs: inlining, cache effects, and real I/O reshape the hot path. Profiling keeps me honest. I use two complementary tools: cProfile for deterministic function-level attribution, and py-spy for sampling without heavy instrumentation.


cProfile: deterministic, higher overhead

cProfile instruments every function call, so attribution is exact at the cost of noticeable overhead. It is ideal in tests, scripts, and staging where that overhead is acceptable. I export stats (python -m cProfile -o stats.prof script.py) and visualize with snakeviz or similar, sorting by cumulative time to see who called whom, not only leaf cost.
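The same attribution is available programmatically via the standard-library pstats module, which is handy inside a test or script when you do not want a separate stats file. A minimal sketch (busy is a hypothetical stand-in for a real hot path):

```python
import cProfile
import io
import pstats


def busy(n):
    # Hypothetical workload standing in for the real hot path.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy(100_000)
profiler.disable()

# Sort by cumulative time: shows who called whom, not only leaf cost.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Sorting by "tottime" instead isolates leaf cost when you suspect a single expensive function rather than a deep call chain.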


py-spy: sampling from outside

py-spy attaches to a running PID and samples the stack periodically. Overhead stays low enough for production-like load tests where inserting profiler hooks would distort timing. The view is statistical—rare paths may be underrepresented, but dominant hotspots surface quickly.
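In practice attaching looks like this (the PID is a placeholder; py-spy may need elevated privileges to read another process's memory):

```shell
# Live top-like view of where the target process spends time.
py-spy top --pid 12345

# Sample for 30 seconds and write a flame graph SVG.
py-spy record --pid 12345 --duration 30 -o profile.svg

# One-off stack dump, useful to see what a stuck process is doing.
py-spy dump --pid 12345
```

The flame graph from record is usually where I start: width is time, so the dominant hotspots are visible at a glance.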


Workflow

Start with sampling to find regions; drill with time.perf_counter or cProfile around suspected functions. The same loop mindset applies when I later read Java async flame graphs—first find the continent, then the city.
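For the drill-down step, a small context manager around time.perf_counter is often all I need. A minimal sketch (timed is a hypothetical helper, not a library API):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    # Wall-clock timing around a suspected region; perf_counter is
    # monotonic and high-resolution, suitable for interval measurement.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.6f}s")


with timed("sum of squares"):
    total = sum(i * i for i in range(100_000))
```

Because it prints on exit even when the body raises, it is safe to wrap around code under investigation without changing control flow.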


Conclusion

Profiling discipline carries over from Python to JVM work. JVM Startup Flags and GC Basics for Application Developers treats GC logs as the Java analogue of the same question: where did the time go?