Preamble
Example-driven tests—whether pytest parametrization or hand-picked tables—only guard the cases you thought to write. Hypothesis turns that around: you state a property that should hold for many inputs, and the library searches for counterexamples. When it finds one, shrinking reduces the failure to a minimal case you can actually debug.
I use Hypothesis beside normal tests, not instead of them. Unit examples document intent; properties document invariants the architecture must obey.
Where properties shine
Parsers and serializers
Round-trip laws (decode(encode(x)) ≈ x) catch surprising Unicode, empty strings, and boundary lengths.
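A minimal sketch of a round-trip law, assuming the standard library's json module stands in for the codec under test. The json_values strategy and the test name are illustrative choices, not part of any particular project.

```python
import json

from hypothesis import given, strategies as st

# Leaves of a JSON document. NaN and infinity are excluded because
# NaN != NaN would break the equality check below.
json_leaves = (
    st.none()
    | st.booleans()
    | st.integers()
    | st.floats(allow_nan=False, allow_infinity=False)
    | st.text()
)

# Nest leaves inside lists and string-keyed dicts, as JSON allows.
json_values = st.recursive(
    json_leaves,
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=20,
)


@given(json_values)
def test_json_round_trip(value):
    # The law: decoding an encoded value gives back the original value.
    assert json.loads(json.dumps(value)) == value
```

Excluding NaN is itself a design decision: it narrows the law to the inputs where plain equality is meaningful, which is the "≈" in the law above.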
Pure functions with laws
Sorting, merging intervals, monetary calculations—anything with algebraic structure is a natural fit.
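As one illustration, here is a sketch of laws for interval merging. The merge_intervals function is a hypothetical implementation written for this example, treating intervals as closed integer ranges; the three laws are the part that carries over to real code.

```python
from hypothesis import given, strategies as st


def merge_intervals(intervals):
    """Hypothetical function under test: merge overlapping closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Build (start, end) pairs with start <= end from two small integers.
interval = st.tuples(st.integers(-50, 50), st.integers(-50, 50)).map(
    lambda pair: (min(pair), max(pair))
)


@given(st.lists(interval))
def test_merge_intervals_laws(intervals):
    merged = merge_intervals(intervals)
    # Law 1: the output is sorted and strictly disjoint.
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        assert prev_end < next_start
    # Law 2: merging is idempotent.
    assert merge_intervals(merged) == merged
    # Law 3: every input interval is covered by some output interval.
    for start, end in intervals:
        assert any(lo <= start and end <= hi for lo, hi in merged)
```

None of the three laws names a specific expected output, which is what lets one test cover inputs nobody wrote down.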
Stateful systems
Hypothesis can model sequences of operations (append, pop, balance) against a reference implementation. That is closer to integration testing, and it is worth the setup when bugs are expensive.
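A sketch of the model-based approach using hypothesis.stateful. LinkedStack is a hypothetical system under test invented for this example, and a plain Python list plays the reference implementation.

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, precondition, rule


class LinkedStack:
    """Hypothetical system under test: a stack of linked (value, rest) pairs."""

    def __init__(self):
        self._head = None
        self._size = 0

    def append(self, value):
        self._head = (value, self._head)
        self._size += 1

    def pop(self):
        value, self._head = self._head
        self._size -= 1
        return value

    def __len__(self):
        return self._size


class StackMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.actual = LinkedStack()
        self.model = []  # reference implementation

    @rule(value=st.integers())
    def append(self, value):
        self.actual.append(value)
        self.model.append(value)

    @precondition(lambda self: len(self.model) > 0)
    @rule()
    def pop(self):
        # Both implementations must return the same element.
        assert self.actual.pop() == self.model.pop()

    @invariant()
    def same_length(self):
        assert len(self.actual) == len(self.model)


# Exposes the machine as a unittest-style test case that pytest can collect.
TestStack = StackMachine.TestCase
```

Hypothesis picks the sequence of rules and, on failure, shrinks it to a short list of steps. The precondition keeps pop from running on an empty stack, so the machine only explores sequences that are valid for the model.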
Generators and custom strategies
Stock strategies cover many builtins; composite strategies encode “valid user” shapes that mirror production constraints better than text(). The goal is not maximal randomness—it is representative diversity.
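A sketch of a composite strategy for a "valid user" shape. The User dataclass, its fields, and the constraints (username length, adult age range, two example domains) are assumptions made up for illustration; real constraints would come from the production schema.

```python
import string
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass(frozen=True)
class User:
    # Hypothetical shape; fields and limits are illustrative only.
    username: str
    email: str
    age: int


USERNAME_CHARS = string.ascii_lowercase + string.digits + "_"


@st.composite
def valid_users(draw):
    # Each draw narrows the input space to values production would accept.
    username = draw(st.text(alphabet=USERNAME_CHARS, min_size=3, max_size=20))
    domain = draw(st.sampled_from(["example.com", "example.org"]))
    age = draw(st.integers(min_value=18, max_value=120))
    # The email is derived from the username, a dependency text() cannot express.
    return User(username=username, email=f"{username}@{domain}", age=age)


@given(valid_users())
def test_generated_users_satisfy_constraints(user):
    assert 3 <= len(user.username) <= 20
    assert user.email.startswith(user.username + "@")
    assert 18 <= user.age <= 120
```

Deriving one field from another inside the strategy avoids rejecting most generated values with filters, which tends to keep generation fast and the examples representative.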
Shrinking as a teaching tool
A 200-character failing string that shrinks to "[" teaches faster than a wall of hex. I read shrunk failures like compiler errors: fix the smallest law break first.
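To watch shrinking in isolation, hypothesis.find returns the simplest value it can reach that satisfies a predicate. In this sketch, parse_depth is a hypothetical toy parser and rejects is a helper written for the example; the small alphabet is an assumption that keeps the search quick.

```python
from hypothesis import find, strategies as st


def parse_depth(text):
    """Hypothetical toy parser: raises if any "[" is left unclosed."""
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
    if depth != 0:
        raise ValueError("unclosed '['")
    return depth


def rejects(text):
    # True when the parser refuses the input.
    try:
        parse_depth(text)
    except ValueError:
        return True
    return False


# Search for a rejected string, then shrink it as far as Hypothesis can.
minimal = find(st.text(alphabet="[]ab"), rejects)
print(repr(minimal))
```

The only one-character input this parser rejects is a lone "[", so a fully shrunk result should land there, whatever long string the search first stumbled on.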
Conclusion
Property tests document laws, not single scenarios. They pair with the mock discipline from “Mockito and unittest.mock: Boundaries in Tests” and with 2024’s architecture tests: keep I/O at the edge, keep invariants in the middle.