My brilliant colleague Jacob Stanley has finally released his property-testing library, Hedgehog. You can grab it right now on Hackage.
I've been using Hedgehog's predecessor, Jack, for several months, and have come to rely on it for quick quality assurance. Since I've had the benefit of this extended technical preview, I thought I'd spend a few words on the ways this little library has improved my working conditions.
This post will largely concerned with retraversing property-testing basics using Hedgehog, such that a beginner or QuickCheck user can get started reasonably quickly and understand why things are the way they are. Eventually, I'll get to a more detailed post about using Hedgehog to test a typed programming language.
Hedgehog is ideal for complex generators. It has allowed me to fly by the seat of my pants, identifying capture bugs, colossal mistakes in the type checker, broken map merges, mistakes with code generation, and parser / pretty-printer ambiguities. In my opinion, it greatly improves the ergonomics of property testing, and is worth considering over existing alternatives.
If you're not familiar with property testing, now's a great time to learn. The idea is pretty simple: rather than writing unit tests, which apply some predicate to predefined points in the state space, we simply write a random generator for all good inputs, take a large sample, and apply the predicate to the entire sample.
Here's a Hedgehog version of the Hello World of property testing: reversing a list twice should produce the original list.
import qualified Data.List as List import Hedgehog import qualified Hedgehog.Gen as Gen import qualified Hedgehog.Range as Range genIntList :: Monad m => Gen m [Int] genIntList = let listLength = Range.linear 0 100000 in Gen.list listLength Gen.enumBounded prop_reverse :: Property prop_reverse = property $ do xs <- forAll genIntList List.reverse (List.reverse xs) === xs
We generate a list of
Int, where the list has anywhere from 0 to 100000 elements, and each
Int lies somewhere between that type's
maxBound as defined by the
Enum class. So,
INT_MAX. We can run this property with the
λ> check prop_reverse ✓ prop_reverse passed 100 tests.
Although it takes some practice, property testing is an effective and cheap way to gain confidence that your program works.
When a property fails, the generated value becomes a counterexample, and it must be communicated to the operator. Since it came from a random generator, though, it is likely to be a colossal and inane example. Printing it to standard output and handing it off to the operator immediately is going to ruin their life for the next few hours.
For this reason, most property-testing libraries will include some notion of shrinking, a deterministic search that reduces the counterexample until the property stops failing. The user only sees the smallest counterexample.
In Hedgehog, a reasonably good1 shrink function is constructed for free alongside every generator. Shrink functions are coupled with every generator, and a lazy tree of all possible shrinks is constructed alongside each generated value. Shrinking is then little more than traversing the tree. This is Hedgehog's theoretical point of difference from QuickCheck, though it also differs in a few practical ways.
Let's play to my strengths here and write a piece of bad code, to demonstrate how
genIntList shrinks. My bad code will be a subtly broken implementation of
reverse. We can test it against an oracle function that we know to be correct,
-- Drops an element somewhere around the middle of the list. fauxReverse :: [a] -> [a] fauxReverse xs = let sx = List.reverse xs mp = length xs `div` 2 (as, bs) = List.splitAt mp sx in as <> List.drop 1 bs prop_fauxReverse :: Property prop_fauxReverse = property $ do xs <- forAll genIntList fauxReverse xs === List.reverse xs
genIntList has a linear
Range from 0 to 100000 for its length, and the elements are generated using
enumBounded, which uses a
maxBound. When this property inevitably fails, we'd expect to see this information used to inform the counterexample search, and indeed we do:
λ> check prop_fauxReverse ✗ <interactive> failed after 2 tests and 11 shrinks. ┏━━ Hedge.hs ━━━ 30 ┃ prop_fauxReverse :: Property 31 ┃ prop_fauxReverse = 32 ┃ property $ do 33 ┃ xs <- forAll genIntList ┃ │ [ -9223372036854775808 ] 34 ┃ fauxReverse xs === List.reverse xs ┃ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ┃ │ Failed (- lhs =/= + rhs) ┃ │ -  ┃ │ + [ -9223372036854775808 ] This failure can be reproduced by running: > recheck (Size 1) (Seed 6043815965182080260 5082204861531945465) <property> False λ> minBound :: Int -9223372036854775808
Generators made out of
Hedgehog.Range are constructive;
Range contains enough information to generate a correct value every time, and each Range implies a shrink step and gradient. Ranges are good!
We might like to write a more precise property to protect against dropped elements. A reverse function should preserve every element in the list!
λ> check prop_fauxReverse_length ✗ <interactive> failed after 2 tests and 1 shrink. ┏━━ Hedge.hs ━━━ 36 ┃ prop_fauxReverse_length :: Property 37 ┃ prop_fauxReverse_length = 38 ┃ property $ do 39 ┃ xs <- forAll genIntList ┃ │ [ -9223372036854775808 ] 40 ┃ length (fauxReverse xs) === length xs ┃ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ┃ │ Failed (- lhs =/= + rhs) ┃ │ - 0 ┃ │ + 1 This failure can be reproduced by running: > recheck (Size 1) (Seed 4416742318514417762 31116095073768267) <property> False
Range helps us construct generators in terms of bounds and steps, we may also wish to write generators in terms of negative invariants. It may be simpler to generate terms, apply some predicates to them, and discard the invalid ones. This gives us the flexibility to write much more complicated generators, though they may take longer to run, or diverge.
Generators that discard heavily are certain to be less productive than other constructive generators. They can generate an arbitrary number of invalid values for each valid one, at the expense of your CPU cycles and precious seconds you probably needed for your life. So, this is a tool to be used gently and infrequently.
In practice, many invariants can be expressed constructively in terms of ranges and bind. Regardless, negative generators that discard are necessary when a constructive generator is not feasible.
In QuickCheck (and Jack) we'd use the
suchThat combinator for this.
MonadPlus, and the user is free to use
asum to indicate failure. There are also a few helpful named wrappers and functions over these instances, like
Gen.filter (which is similar to
discard. Discarding is a first-class notion in Hedgehog, and invariants enforced this way are also enforced during shrinking.
discard all the time, usually during large
asum blocks. A good example is in Hedgehog's STLC example ; if no values of a certain type are currently bound in scope, the known-value generator fails, and the
choice block higher up will generate a literal instead.
Here's a quite vacuous example: a generator for sized lists of unique elements. To provide a better demonstration, I'll use some fairly constraining invariants: elements must be valued between 0 and 10, and the list's length must be between 0 and 10:
genUniqueIntList :: Monad m => Gen m [Int] genUniqueIntList = do let listLength = Range.linear 0 10 genInt = Gen.int (Range.linear 0 10) xs <- Gen.list listLength genInt if List.nub xs == xs then pure xs else Gen.discard -- empty / mzero
Gen.sample to inspect some random values from this generator, we see that it works:
λ> Gen.sample genUniqueIntList [,,,,,[3,0],] λ> Gen.sample genUniqueIntList [[3,0],,,,[3,1],[3,0],[0,1],,[2,1,3]] λ> Gen.sample genUniqueIntList [[3,1],[1,3,2],[3,1,2],,[2,0]] λ> Gen.sample genUniqueIntList [,[0,2],[0,2],[0,1,3],]
If you were to run the generator just once and inspect its shrinks using
Gen.print, you would observe that this generator fails nearly all the time. We get slightly better results when rewritten with
Gen.filter, which scales the ranges behind the scenes:
genUniqueIntList :: Monad m => Gen m [Int] genUniqueIntList = do let listLength = Range.linear 0 10 genInt = Gen.int (Range.linear 0 10) Gen.filter (\xs -> List.nub xs == xs) (Gen.list listLength genInt)
λ> Gen.print genUniqueIntList === Outcome === [1,2] === Shrinks ===    [0,2] [1,0] <discard>
Hedgehog.Gen contains useful combinators for various containers, including maps and sets. These are pretty handy, and much more convenient than rolling it yourself:
λ> Gen.sample (Gen.set (Range.linear 0 10) (Gen.int (Range.linear 0 10))) [ fromList  , fromList [ 3 ] , fromList [ 3 ] , fromList [ 0 , 3 ] , fromList [ 0 , 2 ] , fromList [ 0 , 2 ] , fromList [ 3 ] , fromList [ 0 ] , fromList  , fromList  ]
In short, discarding in Hedgehog is easy, it still leads to meaningful and correct shrinking, and there are some helper functions in
Hedgehog.Gen you should consult with before doing it yourself. As with QuickCheck, keep an eye on the values coming out of your generator, as discarding can negatively impact coverage.
Note that shrinking exists in QuickCheck, and is reasonably good. Every
Arbitrary instance carries a
shrink :: a -> [a] function. There's also
forAllShrink if you wish to avoid
Arbitrary and all its orphan works. All the
Arbitrary instances in popular use come with shrinking functions. However, since shrinking is not fundamental to
Gen, we can't compose generators without also manually constructing a shrink function. If the generator has few invariants, we may get away with
genericShrink, but in practice engineers often opt to forego shrinking altogether when working with complicated generators.
Hedgehog supports custom shrinking in the QuickCheck style via
Hedgehog's predecessor, Jack, was built as a QuickCheck utility library. It reused QuickCheck's runner and notion of
Property, and leaned on the wealth of generators out there on Hackage.
QuickCheck is beautiful, long-lived, and well-maintained, and I've used it a lot to great benefit. It's just a relatively old codebase, with lots of subtle niggles.
Hedgehog does not depend on QuickCheck. Starting from scratch with a new dependency base and no backwards-compatibility requirements allows a few fun improvements:
Hedgehog allows first-class effectful generators and effectful tests. The user must track their effects in the type, as you would in ordinary Haskell. There's no
unsafePerformIO under the hood.
The type of generators,
Gen, forms a monad transformer. It has instances for all the classes you'd expect, like
MonadTrans. It also has MTL-style instances,
MonadState and the like. When those are insufficient, it implements
MFunctor, so you can
hoist your way to victory.
Here's a pretty silly IO generator:
import Data.Text (Text) import qualified Data.Text as T import qualified Data.Text.IO as T import Hedgehog import qualified Hedgehog.Gen as Gen import qualified Hedgehog.Range as Range -- Generate a dictionary word by sampling from words. -- Don't actually do this. genWord :: Gen IO Text genWord = do ws <- T.lines <$> liftIO (T.readFile "/usr/share/dict/words") Gen.element ws
λ> Gen.sample genWord ["tenoner","unsophistically","suggest","catechistic","podoscapher","seallike","colonoscopy","Gemara","eosinophilic","monomyarian"]
It will be interesting to see what kind of effectful generators people find useful. I've mostly been using this transformer redundantly, managing environments with
ReaderT to clean up complicated code. I can't think of a use for an IO generator, but nor can I think of a reason they should be prohibited.
The type of tests,
Test, also forms a monad transformer with the similar instances to
To write an IO test, we simply use
liftIO to our heart's content:
prop_got_words :: Property prop_got_words = property $ do b <- liftIO (doesFileExist "/usr/share/dict/words") assert b
To test functions in
liftExceptT functions are provided. These use
pretty-show on the error, and use GHC call stacks to isolate the line of code responsible for the failure:
prop_got_words_except :: Property prop_got_words_except = property $ liftExceptT gotWords gotWords :: MonadIO m => ExceptT Text m () gotWords = do b <- liftIO (doesFileExist "/usr/share/dict/words") if b then pure () else throwError "don't got words"
The keen observer will notice that
property :: Test IO () -> Property requires the user to finagle their monad stack into
IO before it can be run via
mmorph. This is one place the interface could be improved! Hedgehog is just getting started.
In addition to
checkSequential, which behaves much like
quickCheckAll, Hedgehog supplies
checkConcurrent. It uses
concurrent-output to wrangle the terminal.
testsSeq :: IO Bool testsSeq = checkSequential $$(discover) testsCon :: IO Bool testsCon = checkConcurrent $$(discover)
Note that concurrently checking IO properties may lead to nondeterministic failure. The user must identify and sequentialise tests that cannot be run concurrently.
The API for this is in slight flux at the time of writing, so please check out the examples and documentation for more.
There is nothing worse than recreating a complex counterexample from stdout. Lots of libraries (including
Show instances that perform pretty-printing, rather than the derived version, and these can't be easily pasted into GHCi. If you're bad at writing manual shrink functions, a QuickCheck counterexample will often be huge, perhaps larger than your clipboard buffer! This is a bad place to be.
When a property fails, Hedgehog will print the seed used by the random number generator. This seed is portable across platforms, and can be pasted directly into GHCi:
This failure can be reproduced by running: > recheck (Size 1) (Seed 861472791889809000 7503307783496352651) prop_fauxReverse_length
Hedgehog uses the new call stack API in GHC 8 to great effect when reporting property failures. Most combinators carry a
HasCallStack constraint. This enables the inline display of generated values:
✗ prop_fauxReverse_length failed after 2 tests and 19 shrinks. ┏━━ Hedge.hs ━━━ 48 ┃ prop_fauxReverse_length :: Property 49 ┃ prop_fauxReverse_length = 50 ┃ property $ do 51 ┃ xs <- forAll genIntList ┃ │ [ -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ , -9223372036854775808 ┃ │ ] 52 ┃ length (fauxReverse xs) === length xs ┃ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ┃ │ Failed (- lhs =/= + rhs) ┃ │ - 9 ┃ │ + 10 This failure can be reproduced by running: > recheck (Size 1) (Seed 861472791889809000 7503307783496352651) prop_fauxReverse_length
Hedgehog depends on
pretty-show, and uses it to print counterexamples where possible. The
Show constraints on various combinators like
liftExceptT are used in this way. You may still get a wall of text if your counterexample doesn't reduce well, but it'll be a heck of a lot neater.
This is a simple trick we can extend to regular ol' QuickCheck, by simply reimplementing
(===). You should absolutely do this. Life changing.
This is a minor aesthetic difference, but Hedgehog has no equivalent to the
Arbitrary typeclass. Since test frameworks typically live outside of a project's dependency tree, and are never in
base, reliance on a typeclass to provide test coverage creates a slightly awkward situation. Users have to choose between orphan instances and spurious dependencies.
Arbitrary is a little bit questionable anyway, since it is a lawless class used mostly for overloading. Living without it is fairly easy; we just give all our generators names, and summon them by name using
forAll. It does come at a price: we lose the pretty
prop_abc x y z = ... shorthand.
As noted on Reddit by Koen Claessen and previously on the QuickCheck issue tracker, the integrated approach is not perfect. In a monadic generator, terms depend on one another, and thus need to be shrunk in their bound order. This is something like a depth-first search of the counterexample space, halting at a local minima. Hedgehog provides invariant-preserving shrinks for free, but does not necessarily produce better shrinks than QuickCheck. Adapting Hedgehog's integrated shrinking for alternative search procedures is something of an open question.↩