The round-trip property

Published: 2017-06-07
Words: 2288

... a hastily-written post about the power-to-weight champion of property testing.

Parse farce

Computers need to talk to one another, which means programmers need to agree on things. This is the worst, for various reasons, but mostly because it leaves us sitting around implementing parsers and printers for formats we barely care about. A good portion of my career thus far has been spent on parsers and printers!

Parsers and printers are a classic missed isomorphism: they're supposed to be dual, but in general we cannot derive both from the same specification.1 If you want some assurance that your implementation is correct, expect to be uncomfortably reliant on testing.

Unit tests are pretty much unavoidable here, and are especially handy when there's a simple specification or pre-existing test suite to compare against. However, the bugs encountered while writing a parser are generally not easy to pre-empt! Unit tests cannot bear the entire load. You'd end up shipping bugs or wasting time.


We know that parsers and printers are supposed to be dual. We can simply treat this as a law, and write a property test to enforce it.

Using Hedgehog primitives, it might look a little like this:

prop_trip = do
  x <- forAll genAst
  parse (pretty x) === pure x

... although a wise Hedgehog programmer would probably use the supplied tripping function.

While the round-trip test is pretty well-known among functional programmers and property-based testing fans2, I really want to hammer home three things:

  1. It is unreasonably effective
  2. It is fairly easy3 to implement
  3. It should be in your testing toolkit

This is both the simplest useful property-test I can think of for a working engineer and the most likely to reliably identify bugs of consequence. It is property testing's trojan horse. Talk to your team lead about property testing today!

A worked example

Let's mock up a parser and pretty-printer, break them subtly, and use a generator to identify incompatibilities. This is especially easy when using a property-testing library like Hedgehog. If you're stuck in a language without a decent framework, this is a simple enough property to implement by hand using random primitives4.


Our pal today is a tiny subset of a C-like expression language. I'll only be implementing a scanner, since it should be sufficient to get the point across. Lexers / scanners are the simplest form of parser, and also the easiest to test. I'll befriend two birds with the one loaf here by also demonstrating a little bit of Parser Combinator Hell5.

Here's a valid program (don't worry about how it executes):

x := 41; x += 2;
  x + 0.00001; abc := x; x++;

Here are examples of our tokens, in lieu of a real specification:

55555       // ints
1.222222    // floats, up to six points
:=          // assignment
+           // addition
+=          // plus equals
abc         // variables, [a-zA-Z]+
(           // left paren
)           // right paren
++          // increment
;           // separator

... and here's a corresponding datatype:

data Token =
    TInt Int
  | TFloat Double
  | TAssign
  | TPlus
  | TPlusEq
  | TVar Text
  | TLParen
  | TRParen
  | TIncrement
  | TSeparator
  deriving (Eq, Ord, Show)

We don't care about any kind of whitespace. It can be peeled off by the lexer as we go, and can be inserted freely by the pretty-printer.

From this point on, code may be intentionally incorrect. Please don't cargo-cult anything without reading the full post.


Personally, I prefer to write the printer first, so I can get the property test in place before I even get started on the parser. This is a preference I developed over time by doing the precise opposite. Your mileage may vary.

A (very bad) pretty-printer for a stream of tokens, given the above caveats, is pretty straightforward. We just write a printer for an individual token...

renderToken :: Token -> Text
renderToken t =
  case t of
    TInt x ->
      T.pack (show x)
    TFloat d ->
      T.pack (printf "%6f" d)
    TAssign ->
    TPlus ->
    TPlusEq ->
    TVar x ->
    TLParen ->
    TRParen ->
    TIncrement ->
    TSeparator ->

... and then add liberal amounts of whitespace. For this toy language, whitespace only really matters for disambiguation between ++ and +, x y and xy. Let's be lazy and apply it between each token. This is a hack job, after all!6

pretty :: [Token] -> Text
pretty =
  T.intercalate " " . fmap renderToken
λ> pretty [TLParen, TVar "x", TIncrement, TRParen, TSeparator]
"( x ++ ) ;"

Generating tokens

Here's a first attempt at a Hedgehog generator for tokens. Recall we had some very convenient restrictions on float precision and variable names. We'll take advantage of Hedgehog to enforce these.

genTokens :: Monad m => Gen m [Token]
genTokens =
  Gen.list (Range.linear 0 10000) genToken

genToken :: Monad m => Gen m Token
genToken =
  Gen.choice [
      TInt <$> (Range.linear 0 maxBound)
    , TFloat <$> Gen.double (Range.linearFrac 0.0 9223372036854775807.9)
    , pure TAssign
    , pure TPlus
    , pure TPlusEq
    , TVar <$> Gen.text (Range.linear 0 20) Gen.alpha
    , pure TLParen
    , pure TRParen
    , pure TIncrement
    , pure TSeparator

Writing the property

To get the property in place, we'll need stubs for parse:

parse :: Text -> Either ParseError [Token]
parse =

Now we can write our property:

prop_trip :: Property
prop_trip =
  property $ do
    toks <- forAll genTokens
    tripping toks pretty parse

tests :: IO Bool
tests =
  check prop_trip

... and now we can type tests into the repl to see how we're doing.


I'll write the parser in a naive-but-reasonable7 way and write about each bug the property identifies. I'm using Parsec for this, since it makes it relatively easy to make mistakes.

Let's start by peeling off whitespace...

whitespace :: Parser ()
whitespace =
  void (many

... stubbing out a token parser...

parseToken :: Parser Token
parseToken =
  P.choice [

... and writing our top-level parsing functions, ensuring irrelevant whitespace is handled and we parse all the way to the end of the stream:

parse :: Text -> Either ParseError [Token]
parse =
  -- You almost always want to parse until EOF when using Parsec!
  P.parse (parseTokens <* P.eof) ""

parseTokens :: Parser [Token]
parseTokens = do
  P.many (parseToken <* whitespace)

Most of our tokens are pretty boring string literals, so let's get those out of the way.

parseToken :: Parser Token
parseToken =
  P.choice [
      P.string ":=" *> pure TAssign
    , P.string "+" *> pure TPlus
    , P.string "+=" *> pure TPlusEq
    , P.string "(" *> pure TLParen
    , P.string ")" *> pure TRParen
    , P.string "++" *> pure TIncrement
    , P.string ";" *> pure TSeparator

Parsec doesn't give us numeric parsing primitives, so I'll write some slow ones of my own:

parseInt :: Parser Int
parseInt = do
  xyz <- P.many1 P.digit
  case readMaybe xyz of
    Just x -> pure x
    Nothing -> empty

parseDouble :: Parser Double
parseDouble = do
  a <- P.many1 P.digit
  _ <- P.char '.'
  b <- P.many1 P.digit
  guard (length b <= 6)
  case readMaybe (a ++ "." ++ b) of
    Just x -> pure x
    Nothing -> empty

... and I'll need to recognise our alphanumeric variables, too:

parseVar :: Parser Text
parseVar =
  fmap T.pack (P.many1 P.letter)

... so, stitching it all together, it looks like something like this should work:

parseToken :: Parser Token
parseToken =
  P.choice [
      TInt <$> parseInt
    , TFloat <$> parseDouble
    , TVar <$> parseVar
    , P.string ":=" *> pure TAssign
    , P.string "+" *> pure TPlus
    , P.string "+=" *> pure TPlusEq
    , P.string "(" *> pure TLParen
    , P.string ")" *> pure TRParen
    , P.string "++" *> pure TIncrement
    , P.string ";" *> pure TSeparator

... but does it? My first unit test passes. Looks like it!

> parse "xyz := 1 + 1;"
Right [TVar "xyz",TAssign,TInt 1,TPlus,TInt 1,TSeparator]

Looks like we're done! If the property passes, we can go home.

> tests
  ✗ <interactive> failed after 2 tests and 6 shrinks.

        ┏━━ RTT.hs ━━━
    152 prop_trip :: Property
    153 ┃ prop_trip =
    154 ┃   property $ do
    155 ┃     toks <- forAll genTokens
        ┃     │ [ TFloat 0.0 ]
    156 ┃     tripping toks pretty parse
        ┃     ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃     │ ━━━ Original ━━━
        ┃     │ Right [ TFloat 0.0 ]
        ┃     │ ━━━ Roundtrip ━━━
        ┃     │ Left (line 1, column 2):
        ┃     │ unexpected '.'
        ┃     │ expecting digit, letter, ":=", "+", "+=", "(", ")", "++", ";" or end of input

    This failure can be reproduced by running:
    > recheck (Size 1) (Seed 6054864962880355960 1567128570873129121) <property>

Weird failures on floats. That's weird, parseDouble worked fine when I tested it in isolation! Turns out the order inside a choice block really matters in Parsec. When rules are ambiguous, the most general one needs to be at the bottom, else it will always succeed in place of any more specific combinators. Flipping the order of the TInt and TFloat parsers is not enough, however! Parsec also allows failing generators to consume input, so we need to use try to stop parseDouble eating integers. Here's how we get away with it:

  P.choice [
      TFloat <$> P.try parseDouble
    , TInt <$> parseInt

The next counterexample is [TPlusEq], which complains about an unexpected =. This looks like the same thing, the first + is picked up by TPlus and the = is meaningless. So, let's put Increment and PlusEq above Plus, and wrap them both in try:

parseToken :: Parser Token
parseToken =
  P.choice [
      TFloat <$> P.try parseDouble
    , TInt <$> parseInt
    , TVar <$> parseVar
    , P.try (P.string "++") *> pure TIncrement
    , P.try (P.string "+=") *> pure TPlusEq
    , (P.string "+") *> pure TPlus
    , (P.string ":=") *> pure TAssign
    , P.string "(" *> pure TLParen
    , P.string ")" *> pure TRParen
    , P.string ";" *> pure TSeparator

The next counterexample is [TVar ""], which really shouldn't be happening. This is a generator bug, and not an intentional one. Whoops. Adjust the lower range parameter: TVar <$> Gen.text (Range.linear 1 20) Gen.alpha.

Here's our final counterexample:

Failed (- Original / + Roundtrip)
  │ - Right [ TFloat 2.2858544151012983e-283 ]
  │ + Right [ TFloat 0.0 ]

Looks like we're generating some floats that require more precision than six decimal places to render. They're being truncated during serialisation. This is another generator invariant. We can fix this by ensuring our generated values are truncated up front:

genToken :: Monad m => Gen m Token
genToken =
  Gen.choice [
      (TFloat . round6) <$> Gen.double (Range.linearFrac 0.0 9223372036854775807.9)
    , ...

round6 :: Double -> Double
round6 x =
  (fromInteger $ round (x * (10^6))) / (10.0^^6)

That's it, the property now passes!

<interactive> passed 100 tests.

A straw man

I'm now reasonably confident that this parser works. This is a really simple language, so I could probably have done that with unit tests this time around, but it probably would have taken a similar amount of effort.

This was the simplest, least interesting serialisation problem I could think of. It intentionally has no recursive productions or context-sensitivity. I added several restrictions to make it easy to parse. It still picked up a couple of places where I was holding Parsec wrong, and a couple of spots where my generator, parser, and printer were implementing different languages.

Round-tripping is useful for pretty much any kind of parser-printer combo, from JSON or binary serialisation up to ridiculous layout-sensitive Haskell parsers. It pays off pretty much immediately, even for inane cases like this. It is hard to live without when implementing complicated formats. If you're solving this kind of problem, I implore you to give it a try.

All up

While you may still need other tests to ensure you haven't implemented some wildly incorrect language, round-trip testing can ensure your parser is complete and unambiguous. Care sometimes needs to be taken in writing a good printer and a good generator. This feels a lot like writing a specification! It's a much more productive process than thrashing your brain for interesting permutations.

Parser ambiguities are an extremely common class of bug that can wreak the worst kind of havoc. With a small amount of up-front due diligence, we can eliminate them in a semi-automated fashion. This is almost certainly in your interest.

This technique can be reused for any serialisation layer, as well as any internal functions that should be involutive, including isos.

  1. There's quite a bit of research on this front, just relatively few production-ready tools or libraries. Unifying Parsing and Pretty-Printing by Rendel and Ostermann is one such paper. There are also several interesting-looking new libraries on Github that do precisely this in Haskell, though none appear to be used widely enough for me to recommend.

  2. I think I first saw this property in a Stack Overflow answer written by Ed Kmett. I had googled "property tests for parser" or something like that. Can't find it now, sadly.

  3. I'm sorry for using this word. I suspect a software developer with the usual level of hubris will be able to hack together a generator with or without a property-testing library. It may not be easy if you haven't dealt with randomness before, or haven't written a search procedure before, but you can muddle through, I promise.

  4. You'll need to reduce your counterexamples by hand. This may be unpleasant, though not nearly as unpleassant as shipping bugs!

  5. The astute reader may note that this program could have been implemented fairly quickly using regular expressions or a scanner generator. I agree, for the most part, though my lexers often start out this way and become abhorrent layout-sensitive beasts.

  6. You will probably want test coverage for multiple spaces and line breaks, in practice.

  7. I don't normally write code like this, please don't hold this against me in a hiring process!