Annotated pretty-printers

Published: 2017-04-05
Words: 1379

If you've written a pretty-printer before, you might be familiar with libraries like wl-pprint and pretty . The APIs and underlying algorithms may be different, but both provide the user with a Doc type and a few dozen combinators with which we construct a tree of layouts. That tree is then rendered into a string, using various layout-specific heuristics to produce a pleasing output.

This post is about the new cohort of annotated pretty-printing libraries. These extend the widely-used interfaces of Wadler-Leijen printers (like wl-pprint) and Hughes-Peyton Jones printers (like pretty) by allowing the user to attach arbitrary information to nodes in the layout tree, replacing Doc with a higher-kinded type, Doc a.

We can now smuggle information alongside our representation of program text, and decide how to turn it into strings later. That sounds like a small change, but it enables some really fun things. Essentially, it becomes much easier to reuse printers written for uninteresting plain-text for information-rich presentations.

This post will take a pretty-printer written in the WL style and extend it until it can be trivially used to produce rich HTML output. In particular, we'll add syntax tags and free variable markers to the Doc as annotations, allowing us to apply syntax highlighting and generate hyperlinks between definitions.

Language

Let's define a toy language for demonstration purposes.

This technique is useful for any format worth pretty-printing, including markup formats, DSLs, and regular programming languages.

This language is remarkably simple: we have expressions, numbers, lambda abstractions, variables, and declarations:

data Expr =
    ENum Int         -- e.g. 41
  | EVar String      --      foo
  | ELam String Expr --      \x -> x
  | EApp Expr Expr   --      id x
  deriving (Eq, Ord, Show)

data Decl =
    Decl String Expr -- e.g. foo = 43
  deriving (Eq, Ord, Show)

newtype Program = Program [Decl]

(I'm a little hesitant to use a lambda calculus to illustrate, since not everyone is familiar with it, and may confuse the uninteresting parts of this post for its content. Please get in touch if you read this and would like a more concrete example, I'll likely have time to extend this post at some point.)

Printer

Let's first define a completely unannotated pretty-printer, suitable for use in a round-trip property like forAll genProgram (\x -> parse (pretty x) == x). This is a common property used to obtain some confidence that a parser has been implemented correctly, or that a language has been soundly defined.

Note that this printer doesn't use many of the traditional pretty-printing bells and whistles. You can do a lot more with these libraries than I do; I largely stick to the hang combinator.

import           Text.PrettyPrint.Annotated.WL (Doc, (<+>), (<$$>))
import qualified Text.PrettyPrint.Annotated.WL as WL

pretty :: Program -> Text
pretty =
 WL.displayT . WL.renderPrettyDefault . ppProgram

-- | Pretty-print a program.
--
-- This implementation does not use any annotations.
ppProgram :: Program -> Doc a
ppProgram (Program decls) =
  WL.vcat (fmap ppDecl decls)

ppDecl :: Decl -> Doc a
ppDecl (Decl name expr) =
  WL.hang 2 $
       WL.text name
  <+>  WL.char '='
  <$$> ppExpr expr

ppExpr :: Expr -> Doc a
ppExpr expr =
  case expr of
    ENum x ->
      WL.text (show x)
    EVar x ->
      WL.text x
    ELam x f ->
          WL.char '\\'
      <>  WL.text x
      <+> WL.text "->"
      <+> ppExpr f
    EApp f g ->
      parens (parens (ppExpr f) <+> parens (ppExpr g))

parens :: Doc a -> Doc a
parens x =
  WL.char '(' <> x <> WL.char ')'

This buys us something like this:

foo =
  ((\x -> x) (45))
bar =
  ((\x -> foo) (53))

This example uses wl-pprint-annotated, which is roughly equivalent to David Christiansen's annotated-wl-pprint, but allows Doc a Semigroup instance. This lets me use <> and foldMap instead of battling the traditional pack of wl-pprint combinators. Besides that, it's a very similar library.

Annotation

Let's chuck something vaguely informative into that a field.

I'd like to tell the difference between bound and free variables, so that we can include anchor links to definitions. So, I'll be threading a Set around too.

import           Data.Set (Set)
import qualified Data.Set as S

data Annotation =
    Punctuation
  | Literal
  | FreeIdentifier String
  | BoundIdentifier String
  | Binding String
  | DefinitionSite String

-- | Pretty-print a program, with annotations.
ppProgram :: Program -> Doc Annotation
ppProgram (Program decls) =
  WL.vcat (fmap ppDecl decls)

ppDecl :: Decl -> Doc Annotation
ppDecl (Decl name expr) =
  WL.hang 2 $
       WL.annotate (DefinitionSite name) (WL.text name)
  <+>  punctuation '='
  <$$> ppExpr mempty expr

ppExpr :: Set String -> Expr -> Doc Annotation
ppExpr bound expr =
  case expr of
    ENum x ->
      WL.annotate Literal (WL.text (show x))
    EVar x ->
      if S.member x bound
        then WL.annotate (BoundIdentifier x) (WL.text x)
        else WL.annotate (FreeIdentifier x) (WL.text x)
    ELam x f ->
          punctuation '\\'
      <>  WL.annotate (Binding x) (WL.text x)
      <+> WL.annotate Punctuation (WL.text "->")
      <+> ppExpr (S.insert x bound) f
    EApp f g ->
      parens (parens (ppExpr bound f) <+> parens (ppExpr bound g))

parens :: Doc a -> Doc Annotation
parens x =
  punctuation '(' <> x <> punctuation ')'

punctuation :: Char -> Doc Annotation
punctuation =
  WL.annotate Punctuation . WL.char

As you can see, we haven't changed the program a whole lot! We've just added some fairly mundane semantic information, plus whatever was necessary to identify free variables. Our old pretty-printing function still does exactly the same thing, since it discards all the annotations. When we want to, though, we can interpret these annotations however we like.

Rendering interesting HTML

Now, to make use of the annotated information.

Most of these libraries allow us to interpret the annotations (and indeed the Doc itself) into some arbitrary Monoid. We can interpret into an Applicative if an effectful pretty-printer is your idea of fun.

For wl-pprint-annotated, the functions in question are these, which take start, end, and pack functions:

displayDecorated ::
     Monoid o
  => (a -> o)   
  -> (a -> o)   
  -> (String -> o)  
  -> SimpleDoc a     
  -> o

displayDecoratedA ::
     (Applicative f, Monoid o)   
  => (a -> f o) 
  -> (a -> f o) 
  -> (String -> f o)    
  -> SimpleDoc a     
  -> f o     

I'm going to commit a computer crime for the first draft of this post, and just generate HTML as Text. The interface is clearly robust enough to render a well-formed document using something like Blaze or Lucid, and you should most definitely do that instead!

prettyAnnotated ::
     Program
  -> (Annotation -> Text) -- start
  -> (Annotation -> Text) -- end
  -> Text
prettyAnnotated p start end =
  displayDecorated start end T.pack (WL.renderPrettyDefault (ppProgram p))

programHtml :: Program -> Text
programHtml p =
  prettyAnnotated p
    (\case
      Punctuation ->
        "<span class=\"syntax-punctuation\">"
      Literal ->
        "<span class=\"syntax-literal\">"
      FreeIdentifier x ->
        "<a href=\"#" <> T.pack x <> "\"><span class=\"syntax-id\">"
      BoundIdentifier x ->
        "<span class=\"syntax-id\">"
      Binding x ->
        "<span class=\"syntax-parameter\">"
      DefinitionSite x ->
        "<span id=\"" <> T.pack x <> "\" class=\"syntax-def\">")
    (\case
      FreeIdentifier _ ->
        "</span></a>"
      _ ->
        "</span>")

Done! We get syntax highlighting and rich anchor links for free, more or less, by riding in the wake of the original printer. That's a pretty good outcome for a syntactic printer that knows nothing at all about the program's meaning.

Running after typechecking or after a series of internal transformations would give us a ridiculous amount of information to present to the user, perfect for editor integration or generating rich documentation.

Implementations

There are now a bunch of annotated printers on Hackage for us to choose from:

They all look alright to me. I'll be using this approach for all my pretty-printing going forward, since it costs so little.

History

David Christiansen wrote up this approach in 2014 , and presented A Pretty Printer that Says what it Means at Haskell Implementor's Workshop in 2015. David has made heavy use of this technique in presenting Idris syntax.

As David's slides demonstrate, some quite old Lisp environments provided much richer editing environments than we are accustomed to today. There's also a bit of recent prior art in the Agda highlighter, which runs after typechecking and present type information.