Recently I had to implement exhaustivity checking for case statements in a simple functional language. While pattern exhaustivity is a really hard problem still studied by PhDs, they're usually dealing with a complicated language with fancy types, like GADTs. For simple languages with very basic sum types, pattern matching is supposed to be easy.

I had a *lot* of trouble getting this problem through my head. I found a lot of resources *describing* how one might implement a simple and inefficient exhaustivity check, but had trouble translating them into code. In particular, I read Conor McBride's excellent prose rendition over and over again, and skimmed the Lennart paper he mentioned. Eventually, when the work was actually coming due, I developed some practical intuition: just build a trie, and walk through it.

This post is a naive implementation of exhaustivity checking using a trie. The error messages it throws aren't very good, nor is it very fast, but I hope it can be a starting point for somebody stuck, like I was, staring at forum prose with no idea what to do.

This implementation assumes that constructors are globally unique. When we encounter a constructor pattern, we use the constructor to look up its type, and then we look up all possible constructors for that type. This is mostly for expedience, and works for Haskell or ML-esque languages. If you can't suffer this constraint, you could also build the trie after typechecking, so you can make use of annotated type information.

Nothing too interesting, just `base`

and `containers`

:

```
{-# LANGUAGE OverloadedStrings #-}
module Exhausted where
import Control.Applicative
import Control.Monad
import Data.Foldable
import qualified Data.List as L
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M
import Data.Maybe
import Data.Monoid
import qualified Data.Set as S
import Data.Text (Text)
```

Let's quickly define as much of a language as we need to proceed. This means we need types, a way to look up constructors, and patterns.

Types first. Here I've used `Type`

to represent variables and ground / primitive types (like `String`

), and a separate `Declaration`

type for sum types:

```
data Type =
TBool
| TInt
| TString
| TVar TypeName
deriving (Eq, Ord, Show)
newtype TypeName = TypeName {
unTypeName :: Text
} deriving (Eq, Ord, Show)
newtype Constructor = Constructor {
unConstructor :: Text
} deriving (Eq, Ord, Show)
data Declaration =
TSum [(Constructor, [Type])]
deriving (Eq, Ord, Show)
```

We roll all our declarations up into a set of maps, for lookup purposes. Note well the horrible map crime:

```
data Declarations = Declarations {
unDeclarations :: Map TypeName Declaration
, unConstructors :: Map Constructor TypeName
} deriving (Eq, Ord, Show)
declareType :: TypeName -> Declaration -> Declarations -> Declarations
declareType tn d@(TSum cts) (Declarations ds cs) =
Declarations
(M.insert tn d ds)
(M.fromList (fmap (\(c,_) -> (c, tn)) cts) <> cs)
lookupDecl :: TypeName -> Declarations -> Maybe Declaration
lookupDecl tn =
M.lookup tn . unDeclarations
lookupConstructor :: Constructor -> Declarations -> Maybe TypeName
lookupConstructor c =
M.lookup c . unConstructors
```

... and lastly, we need some notion of pattern. I won't bother with any syntax, as we don't need it to implement exhaustivity.

```
data Pattern =
PVar Text -- e.g. foo
| PCon Constructor [Pattern] -- Foo (Bar baz) (Quux (Wup x))
deriving (Eq, Ord, Show)
```

A prefix trie is a tree with arbitrary branching factor, similar to a rose tree. They're commonly used for things like autocomplete, where all possible suffixes can be computed for a given prefix.

We'll be representing a set of patterns as one big trie, where each tier represents a single variable that the user intends to destructure. This means we keep track of every pattern used for that variable in the one map. This is a little bit like performing transposition.

Our `Pattern`

type was recursive, so we will need a different flatter type to use as leaves in the tree:

```
-- | Individual pattern nodes to be used in our trie.
-- We don't really care about variable names.
data TrieKey =
KVar -- x
| KCon Constructor -- Foo
deriving (Eq, Ord, Show)
```

... and now we can build our datastructure. It's a lot like a rose tree, but each tier has multiple keys. I've used a map for this.

Pay close attention to the `Monoid`

instance, which recurses by zipping together the list of subtrees. Warning: since we're assuming values in the same position will have the same type, this will not work at all if the patterns are ill-typed.

```
newtype PatternTrie = PatternTrie {
unPatternTrie :: Map TrieKey [PatternTrie]
} deriving (Eq, Ord, Show)
instance Monoid PatternTrie where
mempty = PatternTrie mempty
mappend (PatternTrie a) (PatternTrie b) =
PatternTrie $
M.unionWith (L.zipWith (<>)) a b
```

We can construct a `PatternTrie`

from a set of patterns by first converting each pattern into its own very uninteresting trie, then folding from the left with `mappend`

:

```
toPatternTrie :: Pattern -> PatternTrie
toPatternTrie pat =
case pat of
PVar _ ->
PatternTrie (M.singleton KVar mempty)
PCon c ps ->
PatternTrie (M.singleton (KCon c) (fmap toPatternTrie ps))
buildPatternTrie :: [Pattern] -> PatternTrie
buildPatternTrie =
foldMap toPatternTrie
```

It's much easier to develop intuition for this trie structure by looking at a concrete example.

Let's translate this Haskell pattern match into our system, build the trie, and display it using `pretty-show`

:

```
data Billy =
Bob String Int
| Busey Billy
foo = case billy of
Bob x y
Busey (Busey z)
Busey (Bob x y)
```

First we must declare the datatype:

```
billy :: (TypeName, Declaration)
billy =
(TypeName "Billy", TSum [
(Constructor "Bob", [
TString
, TInt
])
, (Constructor "Busey", [
TVar (TypeName "Billy")
])
])
billyDecls :: Declarations
billyDecls =
let (bn, bd) = billy in
declareType bn bd (Declarations mempty mempty)
```

Second, we must translate the three patterns, which are exhaustive:

```
billyExhaustive :: [Pattern]
billyExhaustive = [
PCon (Constructor "Bob") [
PVar "x"
, PVar "y"
]
, PCon (Constructor "Busey") [
PCon (Constructor "Busey") [
PVar "z"
]
]
, PCon (Constructor "Busey") [
PCon (Constructor "Bob") [
PVar "x"
, PVar "y"
]
]
]
```

Now, we can have fun rendering it in GHCi:

```
λ> buildPatternTrie billyExhaustive
PatternTrie
{ unPatternTrie =
fromList
[ ( KCon Constructor { unConstructor = "Bob" }
, [ PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
, PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
]
)
, ( KCon Constructor { unConstructor = "Busey" }
, [ PatternTrie
{ unPatternTrie =
fromList
[ ( KCon Constructor { unConstructor = "Bob" }
, [ PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
, PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] }
]
)
, ( KCon Constructor { unConstructor = "Busey" }
, [ PatternTrie { unPatternTrie = fromList [ ( KVar , [] ) ] } ]
)
]
}
]
)
]
}
```

We can see that our outer `Billy`

has both `Bob`

and `Busey`

cases covered, and the one recursive case also has both cases covered. Every other field is covered by `KVar`

s. Done! We just need to turn this visual intuition into an algorithm.

This means walking through the trie, pulling up the relevant declaration at each tier, and performing set difference on the constructors. If there's a KVar in the tier, it's exhaustive.

That's how the trie works. Let's implement a coarse check as described above. This implementation isn't amazing, but it does the job, and we're in a hurry.

Sorry for the code dump; then again, this is my site, and code is the point of this post! At least it's lightly commented:

```
data Inexhaustive =
Inexhaustive [TrieKey]
deriving (Eq, Ord, Show)
exhaustiveTrie :: Declarations -> PatternTrie -> Either Inexhaustive ()
exhaustiveTrie decls (PatternTrie pt) =
if M.member KVar pt
then -- exhaustive!
pure ()
else -- need to check against declarations!
fromMaybe (Left (Inexhaustive [KVar])) $ do
let cons = M.keys pt
-- If there are no constructors, throw an error
repr <- case cons of (KCon x:_) -> pure x; _ -> empty
-- If we can't find the declaration, throw an error
typeName <- lookupConstructor repr decls
decl <- lookupDecl typeName decls
-- Perform a set difference on the definition against our pattern
let want =
case decl of
TSum cts ->
S.fromList (fmap (KCon . fst) cts)
have = S.fromList cons
diff = S.difference want have
pure $ if diff /= S.empty
then -- Throw an error when cases are missing
Left (Inexhaustive (toList diff))
else -- recurse into subtries!
for_ (M.toList pt) $ \(k, ts) ->
case k of
KVar ->
pure ()
KCon _ ->
traverse_ (exhaustiveTrie decls) ts
```

Let's take away one of the cases from our exhaustive example above:

```
foo = case billy of
Bob x y
Busey (Busey z)
```

GHC certainly doesn't like this. It throws a nice informative warning:

```
<interactive>:18:1: Warning:
Pattern match(es) are non-exhaustive
In a case alternative: Patterns not matched: Busey (Bob _ _)
```

... but we can now catch the mistake, and throw a moderately-informative error:

```
λ> exhaustiveTrie billyDecls (buildPatternTrie billyExhaustive )
Right ()
λ> exhaustiveTrie billyDecls (buildPatternTrie billyNonExhaustive )
Left (Inexhaustive [KCon (Constructor {unConstructor = "Bob"})])
```

Good enough for me!

The error messages are not great here. This is a consequence of my hasty recursive implementation. You could do a much better job by taking Conor McBride's advice and producing the ideal trie, traversing it alongside the user's trie, and deleting paths as you go. Walk through the result to produce all possible paths, and you've got all the missing patterns.

My implementation is probably not very fast, either. I'm told Lennart Augustsson's Compiling Pattern Matching is the place to look if you're interested in doing this quickly.