Pattern exhaustivity checking

Published: 2017-03-10
Words: 1646

Recently I had to implement exhaustivity checking for case statements in a simple functional language. While pattern exhaustivity is a really hard problem still studied by PhDs, they're usually dealing with a complicated language with fancy types, like GADTs. For simple languages with very basic sum types, pattern matching is supposed to be easy.

I had a lot of trouble getting this problem through my head. I found a lot of resources describing how one might implement a simple and inefficient exhaustivity check, but had trouble translating them into code. In particular, I read Conor McBride's excellent prose rendition over and over again, and skimmed the Lennart paper he mentioned. Eventually, when the work was actually coming due, I developed some practical intuition: just build a trie, and walk through it.

This post is a naive implementation of exhaustivity checking using a trie. The error messages it throws aren't very good, nor is it very fast, but I hope it can be a starting point for somebody stuck, like I was, staring at forum prose with no idea what to do.

Constraints

This implementation assumes that constructors are globally unique. When we encounter a constructor pattern, we use the constructor to look up its type, and then we look up all possible constructors for that type. This is mostly for expedience, and works for Haskell or ML-esque languages. If you can't suffer this constraint, you could also build the trie after typechecking, so you can make use of annotated type information.

Imports

Nothing too interesting, just base and containers:

{-# LANGUAGE OverloadedStrings #-}
module Exhausted where


import           Control.Applicative
import           Control.Monad
import           Data.Foldable
import qualified Data.List as L
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as M
import           Data.Maybe
import           Data.Monoid
import qualified Data.Set as S
import           Data.Text (Text)

Language

Let's quickly define as much of a language as we need to proceed. This means we need types, a way to look up constructors, and patterns.

Types first. Here I've used Type to represent variables and ground / primitive types (like String), and a separate Declaration type for sum types:

data Type =
    TBool
  | TInt
  | TString
  | TVar TypeName
  deriving (Eq, Ord, Show)

newtype TypeName = TypeName {
    unTypeName :: Text
  } deriving (Eq, Ord, Show)

newtype Constructor = Constructor {
    unConstructor :: Text
  } deriving (Eq, Ord, Show)

data Declaration =
    TSum [(Constructor, [Type])]
  deriving (Eq, Ord, Show)

We roll all our declarations up into a set of maps, for lookup purposes. Note well the horrible map crime:

data Declarations = Declarations {
    unDeclarations :: Map TypeName Declaration
  , unConstructors :: Map Constructor TypeName
  } deriving (Eq, Ord, Show)

declareType :: TypeName -> Declaration -> Declarations -> Declarations
declareType tn d@(TSum cts) (Declarations ds cs) =
  Declarations
    (M.insert tn d ds)
    (M.fromList (fmap (\(c,_) -> (c, tn)) cts) <> cs)

lookupDecl :: TypeName -> Declarations -> Maybe Declaration
lookupDecl tn =
  M.lookup tn . unDeclarations

lookupConstructor :: Constructor -> Declarations -> Maybe TypeName
lookupConstructor c =
  M.lookup c . unConstructors

... and lastly, we need some notion of pattern. I won't bother with any syntax, as we don't need it to implement exhaustivity.

data Pattern =
    PVar Text -- e.g. foo
  | PCon Constructor [Pattern] -- Foo (Bar baz) (Quux (Wup x))
  deriving (Eq, Ord, Show)

Rolling patterns into a trie

A prefix trie is a tree with arbitrary branching factor, similar to a rose tree. They're commonly used for things like autocomplete, where all possible suffixes can be computed for a given prefix.

We'll be representing a set of patterns as one big trie, where each tier represents a single variable that the user intends to destructure. This means we keep track of every pattern used for that variable in the one map. This is a little bit like performing transposition.

Our Pattern type was recursive, so we will need a different flatter type to use as leaves in the tree:

-- | Individual pattern nodes to be used in our trie.
-- We don't really care about variable names.
data TrieKey =
    KVar                 -- x
  | KCon Constructor     -- Foo
  deriving (Eq, Ord, Show)

... and now we can build our datastructure. It's a lot like a rose tree, but each tier has multiple keys. I've used a map for this.

Pay close attention to the Monoid instance, which recurses by zipping together the list of subtrees. Warning: since we're assuming values in the same position will have the same type, this will not work at all if the patterns are ill-typed.

newtype PatternTrie = PatternTrie {
    unPatternTrie :: Map TrieKey [PatternTrie]
  } deriving (Eq, Ord, Show)

instance Monoid PatternTrie where
  mempty = PatternTrie mempty
  mappend (PatternTrie a) (PatternTrie b) =
    PatternTrie $
      M.unionWith (L.zipWith (<>)) a b

We can construct a PatternTrie from a set of patterns by first converting each pattern into its own very uninteresting trie, then folding from the left with mappend:

toPatternTrie :: Pattern -> PatternTrie
toPatternTrie pat =
  case pat of
    PVar _ ->
      PatternTrie (M.singleton KVar mempty)
    PCon c ps ->
      PatternTrie (M.singleton (KCon c) (fmap toPatternTrie ps))

buildPatternTrie :: [Pattern] -> PatternTrie
buildPatternTrie =
  foldMap toPatternTrie

Concrete examples

It's much easier to develop intuition for this trie structure by looking at a concrete example.

Let's translate this Haskell pattern match into our system, build the trie, and display it using pretty-show:

data Billy =
    Bob String Int
  | Busey Billy

foo = case billy of
  Bob x y
  Busey (Busey z)
  Busey (Bob x y)

First we must declare the datatype:

billy :: (TypeName, Declaration)
billy =
  (TypeName "Billy", TSum [
       (Constructor "Bob", [
            TString
          , TInt
          ])
     , (Constructor "Busey", [
            TVar (TypeName "Billy")
          ])
     ])

billyDecls :: Declarations
billyDecls =
  let (bn, bd) = billy in
  declareType bn bd (Declarations mempty mempty)

Second, we must translate the three patterns, which are exhaustive:

billyExhaustive :: [Pattern]
billyExhaustive = [
    PCon (Constructor "Bob") [
        PVar "x"
      , PVar "y"
      ]
  , PCon (Constructor "Busey") [
        PCon (Constructor "Busey") [
            PVar "z"
          ]
      ]
  , PCon (Constructor "Busey") [
        PCon (Constructor "Bob") [
            PVar "x"
          , PVar "y"
          ]
    ]
  ]

Now, we can have fun rendering it in GHCi:

λ> buildPatternTrie billyExhaustive
PatternTrie
  { unPatternTrie =
      fromList
        [ ( KCon Constructor { unConstructor = "Bob" }
          , [ PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
            , PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
            ]
          )
        , ( KCon Constructor { unConstructor = "Busey" }
          , [ PatternTrie
                { unPatternTrie =
                    fromList
                      [ ( KCon Constructor { unConstructor = "Bob" }
                        , [ PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
                          , PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
                          ]
                        )
                      , ( KCon Constructor { unConstructor = "Busey" }
                        , [ PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] } ]
                        )
                      ]
                }
            ]
          )
        ]
  }

We can see that our outer Billy has both Bob and Busey cases covered, and the one recursive case also has both cases covered. Every other field is covered by KVars. Done! We just need to turn this visual intuition into an algorithm.

This means walking through the trie, pulling up the relevant declaration at each tier, and performing set difference on the constructors. If there's a KVar in the tier, it's exhaustive.

Exhaustivity checking

That's how the trie works. Let's implement a coarse check as described above. This implementation isn't amazing, but it does the job, and we're in a hurry.

Sorry for the code dump; then again, this is my site, and code is the point of this post! At least it's lightly commented:

data Inexhaustive =
    Inexhaustive [TrieKey]
  deriving (Eq, Ord, Show)

exhaustiveTrie :: Declarations -> PatternTrie -> Either Inexhaustive ()
exhaustiveTrie decls (PatternTrie pt) =
  if M.member KVar pt
    then -- exhaustive!
      pure ()
    else -- need to check against declarations!
      fromMaybe (Left (Inexhaustive [KVar])) $ do
        let cons = M.keys pt
        -- If there are no constructors, throw an error
        repr <- case cons of (KCon x:_) -> pure x; _ -> empty
        -- If we can't find the declaration, throw an error
        typeName <- lookupConstructor repr decls
        decl <- lookupDecl typeName decls
        -- Perform a set difference on the definition against our pattern
        let want =
              case decl of
                TSum cts ->
                  S.fromList (fmap (KCon . fst) cts)
            have = S.fromList cons
            diff = S.difference want have
        pure $ if diff /= S.empty
          then -- Throw an error when cases are missing
             Left (Inexhaustive (toList diff))
           else -- recurse into subtries!
             for_ (M.toList pt) $ \(k, ts) ->
               case k of
                 KVar ->
                   pure ()
                 KCon _ ->
                   traverse_ (exhaustiveTrie decls) ts

Usage example

Let's take away one of the cases from our exhaustive example above:

foo = case billy of
  Bob x y
  Busey (Busey z)

GHC certainly doesn't like this. It throws a nice informative warning:

<interactive>:18:1: Warning:
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: Busey (Bob _ _)

... but we can now catch the mistake, and throw a moderately-informative error:

λ> exhaustiveTrie billyDecls (buildPatternTrie billyExhaustive )
Right ()
λ> exhaustiveTrie billyDecls (buildPatternTrie billyNonExhaustive )
Left (Inexhaustive [KCon (Constructor {unConstructor = "Bob"})])

Good enough for me!

Room for improvement

The error messages are not great here. This is a consequence of my hasty recursive implementation. You could do a much better job by taking Conor McBride's advice and producing the ideal trie, traversing it alongside the user's trie, and deleting paths as you go. Walk through the result to produce all possible paths, and you've got all the missing patterns.

My implementation is probably not very fast, either. I'm told Lennart Augustsson's Compiling Pattern Matching is the place to look if you're interested in doing this quickly.