State machine testing with Hedgehog

Published: 2017-07-15
Words: 2280

Jacob Stanley's Hedgehog library has grown some exciting new functionality: state machine testing. This stuff is going straight into my everyday testing toolkit, so y'all get to read a bit about it.

Property-based testing is typically introduced with simple functional examples, like reversing a list. It is a very profitable technique for testing pure code! It's less clear to the beginner how to generalise these techniques for a messy, effectful production application. It is easy to end up with outstanding test coverage for the edges of the system, but little assurance that the effectful core is correct.

Often the intermediate solution is to write glorified unit tests that use some generated inputs to perform a small, predetermined set of actions. This has a very slight advantage over regular old unit tests; for example, a randomised IO unit test is quite likely to identify the incorrect use of a third-party API, or a malformed database query. It's still a pretty heavy compromise.

Property-based testing promised a rich declarative style that looks more like the design than the implementation, and to find a lot more bugs than a unit test. How can we do better?

State machines

My complex effectful system designs don't often look like pseudocode -- they look like abstract state machines. If you're unfamiliar, state machines can be drawn up in this lovely whiteboard-friendly format:

The simplest state machine I could find, modelling a turnstile

The simplest state machine I could find, modelling a turnstile

We have:

The various states of the system have been encoded as labels, in this case Locked and Unlocked. The available actions in each state are encoded as arrows, in this case Coin and Push. Though both actions are available in both states, their effect on the machine's state depends entirely on the initial state when the action is performed.

This is some pretty straightforward logic. If the turnstile is locked, it can't be pushed. If we insert a coin, the turnstile is unlocked. If we push an unlocked turnstile, it locks.

It's fun to model this kind of system as a pure functional program:

data TurnstileState =
    Locked
  | Unlocked

coin :: TurnstileState -> TurnstileState
coin Locked = Unlocked
coin Unlocked = Unlocked

push :: TurnstileState -> TurnstileState
push Locked = Locked
push Unlocked = Locked

What a pleasant model! We probably wouldn't even bother writing tests for it.

Unfortunately, as you probably already know, this model is insufficiently enterprise. We need to implement a turnstile server to provide gated access to all the threads, and the entire organisation. It would be nice, after the fact, to test that our big weird effectful machine behaves like this simple function.

Example: Turnstile

The Hedgehog API in use here is still in flux, so this code may stop working in the near future, though it shouldn't break a whole lot.

First, let's build the program.

An effectful turnstile

Since we've got mutable state now, let's model it with an IORef:

import           Data.IORef

data TurnstileState =
    Locked
  | Unlocked
  deriving (Eq, Ord, Show)

newtype Turnstile = Turnstile {
    unTurnstile :: IORef TurnstileState
  }

newTurnstile :: IO Turnstile
newTurnstile =
  Turnstile <$> newIORef Locked

This extremely enterprise turnstile server needs to have atomic actions, so there's a fun and unreasonably effective hammer we can reach for: atomicModifyIORef'.

insertCoin :: Turnstile -> IO ()
insertCoin (Turnstile ref) =
  atomicModifyIORef' ref $ \_ ->
    (Unlocked, ())

pushTurnstile :: Turnstile -> IO Bool
pushTurnstile (Turnstile ref) =
  atomicModifyIORef' ref $ \s ->
    case s of
      Locked ->
        (Locked, False)
      Unlocked ->
        (Locked, True)

That's it! Ship it.

A model

Most of the power of this technique comes from modelling state and control flow in a way that is easy to reason about.

This model should be written in terms of externally-visible information. It should not contain any implementation details. When testing a web application, for example, you might store generated passwords directly in the model; the application itself would store hashes.

State

The state in this case is pretty straightforward, so this should look a little familiar. Ignore the kind signatures and parameters for now:

data ModelState (v :: * -> *) =
    TLocked
  | TUnlocked
  deriving (Eq, Ord, Show)

initialState :: ModelState v
initialState =
  TLocked
Transitions

Recall that we had two types of actions available, coin and push.

In Hedgehog, these are specified as discrete datatypes, parameterised by a functor. Again, ignore this detail for now.

Squint a bit, and observe that we're encoding two functions that take no arguments with two constructors that take no arguments:

data Coin (v :: * -> *) =
  Coin
  deriving (Eq, Show)

data Push (v :: * -> *) =
  Push
  deriving (Eq, Show)

The key function to be used is actions, which expects a list of Command. Commands are made up of:

I recommend looking at a worked example to see what's going on. The types are a little complicated, though for a good reason!

I'd suggest writing a command for each transition in the state machine, give or take. You might want additional commands to ensure bad inputs don't lead to bad states.

Let's write our simplest command, the one for Coin:

s_coin :: (Monad n, MonadIO m, MonadTest m) => Turnstile -> Command n m ModelState
s_coin ts =
  let
    -- We can always insert a coin, so we don't need to see the state.
    gen _state =
      Just $
        pure Coin
    -- We execute this action by calling 'insertCoin'.
    execute Coin =
      liftIO (insertCoin ts)
  in
    Command gen execute [
        -- After a coin action, the turnstile should be unlocked.
        -- First we update our model:
        Update $ \s Coin _o ->
          TUnlocked
        -- ... then we enforce a very simple predicate on it:
      , Ensure $ \_before after Coin () -> do
          after === TUnlocked
      ]

Since this generator always produces Coin without looking at the state, both our coin transitions are covered.

For push, outcomes are very different depending on the initial state. So, let's break them out into two different commands.

When the turnstile is locked, pushing should not change the state, and should not allow anyone in:

s_push_locked :: (Monad n, MonadIO m, MonadTest m) => Turnstile -> Command n m ModelState
s_push_locked ts =
  let
    -- This generator only succeeds when the gate is thought to be locked.
    gen state =
      case state of
        TLocked ->
          Just $
            pure Push
        TUnlocked ->
          Nothing
    execute Push = do
      liftIO $ pushTurnstile ts
  in
    Command gen execute [
        -- Precondition: the gate is thought to be locked.
        Require $ \s Push ->
          s == TLocked
        -- Update: pushing the locked gate has no effect.
      , Update $ \s Push _o ->
          TLocked
        -- Postcondition: we're denied admission, the turnstile gate stays locked.
      , Ensure $ \before after Push b -> do
          before === TLocked
          assert (not b)
          after === TLocked
      ]

When the turnstile is unlocked, pushing should change the state to TLocked, and we should get a result of True:

s_push_unlocked :: (Monad n, MonadIO m, MonadTest m) => Turnstile -> Command n m ModelState
s_push_unlocked ts =
  let
    gen state =
      case state of
        TUnlocked ->
          Just $
            pure Push
        TLocked ->
          Nothing
    execute Push = do
      liftIO $ pushTurnstile ts
  in
    Command gen execute [
        -- Precondition: the gate is thought to be unlocked.
        Require $ \s Push ->
          s == TUnlocked
        -- Update: pushing the unlocked gate locks it.
      , Update $ \s Push _o ->
          TLocked
        -- Postcondition: we gain admission, the turnstile gate is locked.
      , Ensure $ \before after Push b -> do
          before === TUnlocked
          assert b
          after === TLocked
      ]

That's it, we've encoded the whole state machine. Let's run it!

The property

Once the machine has been specified in this way, our work is done. It's over to Hedgehog.

First, we create a turnstile. We then use the actions combinator to generate an arbitrary list of actions. The executeSequential function takes an initial model state, and applies that list of actions in order, ensuring preconditions and postconditions are satisfied.

prop_turnstile :: Property
prop_turnstile =
  property $ do
    turnstile <- liftIO newTurnstile
    actions <- forAll $
      Gen.actions (Range.linear 1 100) initialState [
          s_coin turnstile
        , s_push_locked turnstile
        , s_push_unlocked turnstile
        ]
    executeSequential initialState actions
*Test.Example.Turnstile> check prop_turnstile
  ✓ <interactive> passed 100 tests.
True

Unfortunately, this test passed first try! The machine was too simple. Let's break it in an inane way to see the counterexample. I suggest flipping the model's initialState over to TUnlocked:

*Test.Example.Turnstile> check prop_turnstile
  ✗ <interactive> failed after 4 tests and 1 shrink.

        ┏━━ test/Test/Example/Turnstile.hs ━━━
    130 s_push_unlocked :: (Monad n, MonadIO m, MonadTest m) => Turnstile -> Command n m ModelState
          <snip>
    150-- Postcondition: we gain admission, the turnstile gate is locked.
    151 ┃       , Ensure $ \before after Push b -> do
    152 ┃           before === TUnlocked
    153 ┃           assert b
        ┃           ^^^^^^^^

        ┏━━ test/Test/Example/Turnstile.hs ━━━
    159 prop_turnstile :: Property
    160 ┃ prop_turnstile =
    161 ┃   property $ do
    162 ┃     turnstile <- liftIO newTurnstile
    163 ┃     actions <- forAll $
    164 ┃       Gen.actions (Range.linear 1 100) initialState [
    165 ┃           s_coin turnstile
    166 ┃         , s_push_locked turnstile
    167 ┃         , s_push_unlocked turnstile
    168 ┃         ]
        ┃         │ [ Var 0 :<- Push ]
    169 ┃     executeSequential initialState actions

    This failure can be reproduced by running:
    > recheck (Size 3) (Seed 8794484450743003252 4983793967879311195) <property>

False

The first push failed. We thought it was unlocked and tried to push, but it was not. Our model did not line up with reality, and Hedgehog demonstrated this with the smallest counterexample.

Perhaps you haven't yet noticed, but this turnstile service isn't the most useful system in the world. There are way more useful concurrency control abstractions in Haskell's standard library! It's just a stand-in effectful program.

Identifying race conditions

Thus far, we've been testing sequentially. We generate a list of actions, and run them one by one.

In practice, most effectful systems are concurrent. I commonly encounter threadsafe local abstractions, as well as external services, like a web application. Even this turnstile has atomicity guarantees that should be tested.

Since we're already generating arbitrary lists of actions, it shouldn't be much of a stretch to run arbitrary subsets of them in parallel, shrinking to find an unsafe interleaving. Indeed, this is the basis of Hedgehog's (as-yet unreleased) parallel state machine facilities!

These are easy to use: we just replace executeSequential with executeParallel in our property. If the actions being executed are not atomic, subsequences being run in parallel should hone in on the mismatch, and should shrink down to find the smallest such interleaving.

Parameterised actions

You may have noticed that our model and actions were parameterised by some higher-kinded type, (v :: * -> *). Likewise, I promised the complicated types in Command would have some use.

We didn't need it for our simple turnstile, but actions in an effectful system may produce return values that inform future interaction; think things like session IDs, user IDs, or timestamps. These are variable in the eyes of our state machine.

We need to reason about these values symbolically when generating sequences of actions; for example, you might want to say that the new UserId goes into a hash table, though we haven't performed the IO yet.

Once the action has been performed, we have the concrete result, and we can reason about it in more detail. In Ensure postconditions, we have access to the concrete value, and we can use it for assertions.

Hedgehog makes this distinction by parameterising things with a functor, which can be either Symbolic or Concrete. Ephemeral values should be represented in the model as Var UserId v.

Example: Tickets

Suppose we are interacting with an opaque ticketing system, where each ticket has some serial number. We don't care what the serial is, only that it's unique.

Our model might look something like this:

data ModelState (v :: * -> *) =
  ModelState {
      modelShows :: Set ShowName
    , modelTickets :: Map (Var TicketId v) ShowName
    }

Our UseTicket command might look something like this:

data UseTicket (v :: * -> *) =
  UseTicket (Var TicketId v)

... and our IssueTicket command will look something like this:

...
  in
    Command gen execute [
        Require $ \s (IssueTicket show) ->
          S.member show (modelShows s)
      , Update $ \s (IssueTicket show) tid ->
          s { modelTickets = M.insert tid show (modelTickets s) }
      ]

Better examples

Jacob has ported a classic Erlang QuickCheck example to Hedgehog. You'll find it in the hedgehog-example repo. It uses all the facilities described above.

Tune in next time for a much more comprehensive example using WAI.

Try it out!

This is a really nice way to specify and test effectful systems. It's much more effective than example-testing with canned action sequences.

There's also a lot of potential for test reuse: anything implementing the same interface can use the same property. The in-memory prototype and the production database-backed version of the same service can share the same excellent test coverage.

I'm looking forward to consolidating lots of my glorified unit tests and half-baked properties into these simple machines!

These techniques are not new! Quviq et al have been doing this for a living for many moons, and quite profitably. If you've done serious testing this way, please get in touch, as I'd love to see more examples.

Source code

I've omitted several uninteresting implementation details, so the code presented in this post may not compile on its own. The unabridged source is available here.