Property testing with Hedgehog

Published: 2017-04-23
Words: 2681

My brilliant colleague Jacob Stanley has finally released his property-testing library, Hedgehog. You can grab it right now on Hackage.

I've been using Hedgehog's predecessor, Jack, for several months, and have come to rely on it for quick quality assurance. Since I've had the benefit of this extended technical preview, I thought I'd spend a few words on the ways this little library has improved my working conditions.

This post will largely concerned with retraversing property-testing basics using Hedgehog, such that a beginner or QuickCheck user can get started reasonably quickly and understand why things are the way they are. Eventually, I'll get to a more detailed post about using Hedgehog to test a typed programming language.

Hedgehog is ideal for complex generators. It has allowed me to fly by the seat of my pants, identifying capture bugs, colossal mistakes in the type checker, broken map merges, mistakes with code generation, and parser / pretty-printer ambiguities. In my opinion, it greatly improves the ergonomics of property testing, and is worth considering over existing alternatives.

Property testing

If you're not familiar with property testing, now's a great time to learn. The idea is pretty simple: rather than writing unit tests, which apply some predicate to predefined points in the state space, we simply write a random generator for all good inputs, take a large sample, and apply the predicate to the entire sample.

Here's a Hedgehog version of the Hello World of property testing: reversing a list twice should produce the original list.

import qualified Data.List as List
import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

genIntList :: Monad m => Gen m [Int]
genIntList =
  let listLength = Range.linear 0 100000
  in  Gen.list listLength Gen.enumBounded

prop_reverse :: Property
prop_reverse =
  property $ do
    xs <- forAll genIntList
    List.reverse (List.reverse xs) === xs

We generate a list of Int, where the list has anywhere from 0 to 100000 elements, and each Int lies somewhere between that type's minBound and maxBound as defined by the Enum class. So, INT_MIN and INT_MAX. We can run this property with the check function:

λ> check prop_reverse
  ✓ prop_reverse passed 100 tests.

Although it takes some practice, property testing is an effective and cheap way to gain confidence that your program works.

Shrinking for free

When a property fails, the generated value becomes a counterexample, and it must be communicated to the operator. Since it came from a random generator, though, it is likely to be a colossal and inane example. Printing it to standard output and handing it off to the operator immediately is going to ruin their life for the next few hours.

For this reason, most property-testing libraries will include some notion of shrinking, a deterministic search that reduces the counterexample until the property stops failing. The user only sees the smallest counterexample.

In Hedgehog, a reasonably good1 shrink function is constructed for free alongside every generator. Shrink functions are coupled with every generator, and a lazy tree of all possible shrinks is constructed alongside each generated value. Shrinking is then little more than traversing the tree. This is Hedgehog's theoretical point of difference from QuickCheck, though it also differs in a few practical ways.

Shrinking constructively

Let's play to my strengths here and write a piece of bad code, to demonstrate how genIntList shrinks. My bad code will be a subtly broken implementation of reverse. We can test it against an oracle function that we know to be correct, List.reverse.

-- Drops an element somewhere around the middle of the list.
fauxReverse :: [a] -> [a]
fauxReverse xs =
  let sx = List.reverse xs
      mp = length xs `div` 2
      (as, bs) = List.splitAt mp sx
  in as <> List.drop 1 bs

prop_fauxReverse :: Property
prop_fauxReverse =
  property $ do
    xs <- forAll genIntList
    fauxReverse xs === List.reverse xs

Recall that genIntList has a linear Range from 0 to 100000 for its length, and the elements are generated using enumBounded, which uses a Range from minBound to maxBound. When this property inevitably fails, we'd expect to see this information used to inform the counterexample search, and indeed we do:

λ> check prop_fauxReverse
  ✗ <interactive> failed after 2 tests and 11 shrinks.

       ┏━━ Hedge.hs ━━━
    30 prop_fauxReverse :: Property
    31 ┃ prop_fauxReverse =
    32 ┃   property $ do
    33 ┃     xs <- forAll genIntList
       ┃     │ [ -9223372036854775808 ]
    34 ┃     fauxReverse xs === List.reverse xs
       ┃     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       ┃     │ Failed (- lhs =/= + rhs)
       ┃     │ - []
       ┃     │ + [ -9223372036854775808 ]

    This failure can be reproduced by running:
    > recheck (Size 1) (Seed 6043815965182080260 5082204861531945465) <property>

λ> minBound :: Int

Generators made out of Hedgehog.Range are constructive; Range contains enough information to generate a correct value every time, and each Range implies a shrink step and gradient. Ranges are good!

We might like to write a more precise property to protect against dropped elements. A reverse function should preserve every element in the list!

λ> check prop_fauxReverse_length
  ✗ <interactive> failed after 2 tests and 1 shrink.

       ┏━━ Hedge.hs ━━━
    36 prop_fauxReverse_length :: Property
    37 ┃ prop_fauxReverse_length =
    38 ┃   property $ do
    39 ┃     xs <- forAll genIntList
       ┃     │ [ -9223372036854775808 ]
    40 ┃     length (fauxReverse xs) === length xs
       ┃     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       ┃     │ Failed (- lhs =/= + rhs)
       ┃     │ - 0
       ┃     │ + 1

    This failure can be reproduced by running:
    > recheck (Size 1) (Seed 4416742318514417762 31116095073768267) <property>


Predicates and failure

While Range helps us construct generators in terms of bounds and steps, we may also wish to write generators in terms of negative invariants. It may be simpler to generate terms, apply some predicates to them, and discard the invalid ones. This gives us the flexibility to write much more complicated generators, though they may take longer to run, or diverge.

Generators that discard heavily are certain to be less productive than other constructive generators. They can generate an arbitrary number of invalid values for each valid one, at the expense of your CPU cycles and precious seconds you probably needed for your life. So, this is a tool to be used gently and infrequently.

In practice, many invariants can be expressed constructively in terms of ranges and bind. Regardless, negative generators that discard are necessary when a constructive generator is not feasible.

In QuickCheck (and Jack) we'd use the suchThat combinator for this.

Hedgehog's Gen implements Alternative and MonadPlus, and the user is free to use empty / mzero / (<|>) / asum to indicate failure. There are also a few helpful named wrappers and functions over these instances, like Gen.filter (which is similar to suchThat) and discard. Discarding is a first-class notion in Hedgehog, and invariants enforced this way are also enforced during shrinking.

I use discard all the time, usually during large choice / asum blocks. A good example is in Hedgehog's STLC example ; if no values of a certain type are currently bound in scope, the known-value generator fails, and the choice block higher up will generate a literal instead.

Here's a quite vacuous example: a generator for sized lists of unique elements. To provide a better demonstration, I'll use some fairly constraining invariants: elements must be valued between 0 and 10, and the list's length must be between 0 and 10:

genUniqueIntList :: Monad m => Gen m [Int]
genUniqueIntList = do
  let listLength = Range.linear 0 10
      genInt = (Range.linear 0 10)
  xs <- Gen.list listLength genInt
  if List.nub xs == xs
    then pure xs
    else Gen.discard -- empty / mzero

Using Gen.sample to inspect some random values from this generator, we see that it works:

λ> Gen.sample genUniqueIntList
λ> Gen.sample genUniqueIntList
λ> Gen.sample genUniqueIntList
λ> Gen.sample genUniqueIntList

If you were to run the generator just once and inspect its shrinks using Gen.print, you would observe that this generator fails nearly all the time. We get slightly better results when rewritten with Gen.filter, which scales the ranges behind the scenes:

genUniqueIntList :: Monad m => Gen m [Int]
genUniqueIntList = do
  let listLength = Range.linear 0 10
      genInt = (Range.linear 0 10)
  Gen.filter (\xs -> List.nub xs == xs) (Gen.list listLength genInt)
λ> Gen.print genUniqueIntList
=== Outcome ===
=== Shrinks ===

Hedgehog.Gen contains useful combinators for various containers, including maps and sets. These are pretty handy, and much more convenient than rolling it yourself:

λ> Gen.sample (Gen.set (Range.linear 0 10) ( (Range.linear 0 10)))
[ fromList []
, fromList [ 3 ]
, fromList [ 3 ]
, fromList [ 0 , 3 ]
, fromList [ 0 , 2 ]
, fromList [ 0 , 2 ]
, fromList [ 3 ]
, fromList [ 0 ]
, fromList []
, fromList []

In short, discarding in Hedgehog is easy, it still leads to meaningful and correct shrinking, and there are some helper functions in Hedgehog.Gen you should consult with before doing it yourself. As with QuickCheck, keep an eye on the values coming out of your generator, as discarding can negatively impact coverage.

Shrinking in QuickCheck

Note that shrinking exists in QuickCheck, and is reasonably good. Every Arbitrary instance carries a shrink :: a -> [a] function. There's also forAllShrink if you wish to avoid Arbitrary and all its orphan works. All the Arbitrary instances in popular use come with shrinking functions. However, since shrinking is not fundamental to Gen, we can't compose generators without also manually constructing a shrink function. If the generator has few invariants, we may get away with genericShrink, but in practice engineers often opt to forego shrinking altogether when working with complicated generators.

Hedgehog supports custom shrinking in the QuickCheck style via Hedgehog.Gen.shrink.

A new foundation

Hedgehog's predecessor, Jack, was built as a QuickCheck utility library. It reused QuickCheck's runner and notion of Property, and leaned on the wealth of generators out there on Hackage.

QuickCheck is beautiful, long-lived, and well-maintained, and I've used it a lot to great benefit. It's just a relatively old codebase, with lots of subtle niggles.

Hedgehog does not depend on QuickCheck. Starting from scratch with a new dependency base and no backwards-compatibility requirements allows a few fun improvements:


Hedgehog allows first-class effectful generators and effectful tests. The user must track their effects in the type, as you would in ordinary Haskell. There's no unsafePerformIO under the hood.

Effectful generators

The type of generators, Gen, forms a monad transformer. It has instances for all the classes you'd expect, like MonadIO and MonadTrans. It also has MTL-style instances, MonadState and the like. When those are insufficient, it implements MFunctor, so you can hoist your way to victory.

Here's a pretty silly IO generator:

import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as T

import           Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Generate a dictionary word by sampling from words.
-- Don't actually do this.
genWord :: Gen IO Text
genWord = do
  ws <- T.lines <$> liftIO (T.readFile "/usr/share/dict/words")
  Gen.element ws
λ> Gen.sample genWord

It will be interesting to see what kind of effectful generators people find useful. I've mostly been using this transformer redundantly, managing environments with ReaderT to clean up complicated code. I can't think of a use for an IO generator, but nor can I think of a reason they should be prohibited.

Effectful tests

The type of tests, Test, also forms a monad transformer with the similar instances to Gen.

To write an IO test, we simply use liftIO to our heart's content:

prop_got_words :: Property
prop_got_words =
  property $ do
    b <- liftIO (doesFileExist "/usr/share/dict/words")
    assert b

To test functions in Either or ExceptT, liftEither and liftExceptT functions are provided. These use pretty-show on the error, and use GHC call stacks to isolate the line of code responsible for the failure:

prop_got_words_except :: Property
prop_got_words_except =
  property $
    liftExceptT gotWords

gotWords :: MonadIO m => ExceptT Text m ()
gotWords = do
  b <- liftIO (doesFileExist "/usr/share/dict/words")
  if b then pure () else throwError "don't got words"

The keen observer will notice that property :: Test IO () -> Property requires the user to finagle their monad stack into IO before it can be run via mmorph. This is one place the interface could be improved! Hedgehog is just getting started.

Parallel testing

In addition to checkSequential, which behaves much like quickCheckAll, Hedgehog supplies checkConcurrent. It uses concurrent-output to wrangle the terminal.

testsSeq :: IO Bool
testsSeq =
  checkSequential $$(discover)

testsCon :: IO Bool
testsCon =
  checkConcurrent $$(discover)

Note that concurrently checking IO properties may lead to nondeterministic failure. The user must identify and sequentialise tests that cannot be run concurrently.

The API for this is in slight flux at the time of writing, so please check out the examples and documentation for more.

Reporting counterexamples

Printing the seed

There is nothing worse than recreating a complex counterexample from stdout. Lots of libraries (including time) have Show instances that perform pretty-printing, rather than the derived version, and these can't be easily pasted into GHCi. If you're bad at writing manual shrink functions, a QuickCheck counterexample will often be huge, perhaps larger than your clipboard buffer! This is a bad place to be.

When a property fails, Hedgehog will print the seed used by the random number generator. This seed is portable across platforms, and can be pasted directly into GHCi:

This failure can be reproduced by running:
> recheck (Size 1) (Seed 861472791889809000 7503307783496352651) prop_fauxReverse_length

Source locations with GHC CallStack

Hedgehog uses the new call stack API in GHC 8 to great effect when reporting property failures. Most combinators carry a HasCallStack constraint. This enables the inline display of generated values:

  ✗ prop_fauxReverse_length failed after 2 tests and 19 shrinks.

       ┏━━ Hedge.hs ━━━
    48 prop_fauxReverse_length :: Property
    49 ┃ prop_fauxReverse_length =
    50 ┃   property $ do
    51 ┃     xs <- forAll genIntList
       ┃     │ [ -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ , -9223372036854775808
       ┃     │ ]
    52 ┃     length (fauxReverse xs) === length xs
       ┃     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       ┃     │ Failed (- lhs =/= + rhs)
       ┃     │ - 9
       ┃     │ + 10

    This failure can be reproduced by running:
    > recheck (Size 1) (Seed 861472791889809000 7503307783496352651) prop_fauxReverse_length

Pretty-printing counterexamples

Hedgehog depends on pretty-show, and uses it to print counterexamples where possible. The Show constraints on various combinators like (===) and liftExceptT are used in this way. You may still get a wall of text if your counterexample doesn't reduce well, but it'll be a heck of a lot neater.

This is a simple trick we can extend to regular ol' QuickCheck, by simply reimplementing (===). You should absolutely do this. Life changing.

No typeclasses

This is a minor aesthetic difference, but Hedgehog has no equivalent to the Arbitrary typeclass. Since test frameworks typically live outside of a project's dependency tree, and are never in base, reliance on a typeclass to provide test coverage creates a slightly awkward situation. Users have to choose between orphan instances and spurious dependencies.

Arbitrary is a little bit questionable anyway, since it is a lawless class used mostly for overloading. Living without it is fairly easy; we just give all our generators names, and summon them by name using forAll. It does come at a price: we lose the pretty prop_abc x y z = ... shorthand.

  1. As noted on Reddit by Koen Claessen and previously on the QuickCheck issue tracker, the integrated approach is not perfect. In a monadic generator, terms depend on one another, and thus need to be shrunk in their bound order. This is something like a depth-first search of the counterexample space, halting at a local minima. Hedgehog provides invariant-preserving shrinks for free, but does not necessarily produce better shrinks than QuickCheck. Adapting Hedgehog's integrated shrinking for alternative search procedures is something of an open question.