If you've written a pretty-printer before, you might be familiar with libraries like
pretty . The APIs and underlying algorithms may be different, but both provide the user with a
Doc type and a few dozen combinators with which we construct a tree of layouts. That tree is then rendered into a string, using various layout-specific heuristics to produce a pleasing output.
This post is about the new cohort of annotated pretty-printing libraries. These extend the widely-used interfaces of Wadler-Leijen printers (like
wl-pprint) and Hughes-Peyton Jones printers (like
pretty) by allowing the user to attach arbitrary information to nodes in the layout tree, replacing
Doc with a higher-kinded type,
We can now smuggle information alongside our representation of program text, and decide how to turn it into strings later. That sounds like a small change, but it enables some really fun things. Essentially, it becomes much easier to reuse printers written for uninteresting plain-text for information-rich presentations.
This post will take a pretty-printer written in the WL style and extend it until it can be trivially used to produce rich HTML output. In particular, we'll add syntax tags and free variable markers to the
Doc as annotations, allowing us to apply syntax highlighting and generate hyperlinks between definitions.
Let's define a toy language for demonstration purposes.
This technique is useful for any format worth pretty-printing, including markup formats, DSLs, and regular programming languages.
This language is remarkably simple: we have expressions, numbers, lambda abstractions, variables, and declarations:
data Expr = ENum Int -- e.g. 41 | EVar String -- foo | ELam String Expr -- \x -> x | EApp Expr Expr -- id x deriving (Eq, Ord, Show) data Decl = Decl String Expr -- e.g. foo = 43 deriving (Eq, Ord, Show) newtype Program = Program [Decl]
(I'm a little hesitant to use a lambda calculus to illustrate, since not everyone is familiar with it, and may confuse the uninteresting parts of this post for its content. Please get in touch if you read this and would like a more concrete example, I'll likely have time to extend this post at some point.)
Let's first define a completely unannotated pretty-printer, suitable for use in a round-trip property like
forAll genProgram (\x -> parse (pretty x) == x). This is a common property used to obtain some confidence that a parser has been implemented correctly, or that a language has been soundly defined.
Note that this printer doesn't use many of the traditional pretty-printing bells and whistles. You can do a lot more with these libraries than I do; I largely stick to the
import Text.PrettyPrint.Annotated.WL (Doc, (<+>), (<$$>)) import qualified Text.PrettyPrint.Annotated.WL as WL pretty :: Program -> Text pretty = WL.displayT . WL.renderPrettyDefault . ppProgram -- | Pretty-print a program. -- -- This implementation does not use any annotations. ppProgram :: Program -> Doc a ppProgram (Program decls) = WL.vcat (fmap ppDecl decls) ppDecl :: Decl -> Doc a ppDecl (Decl name expr) = WL.hang 2 $ WL.text name <+> WL.char '=' <$$> ppExpr expr ppExpr :: Expr -> Doc a ppExpr expr = case expr of ENum x -> WL.text (show x) EVar x -> WL.text x ELam x f -> WL.char '\\' <> WL.text x <+> WL.text "->" <+> ppExpr f EApp f g -> parens (parens (ppExpr f) <+> parens (ppExpr g)) parens :: Doc a -> Doc a parens x = WL.char '(' <> x <> WL.char ')'
This buys us something like this:
foo = ((\x -> x) (45)) bar = ((\x -> foo) (53))
This example uses
wl-pprint-annotated, which is roughly equivalent to David Christiansen's
annotated-wl-pprint, but allows
Semigroup instance. This lets me use
foldMap instead of battling the traditional pack of
wl-pprint combinators. Besides that, it's a very similar library.
Let's chuck something vaguely informative into that
I'd like to tell the difference between bound and free variables, so that we can include anchor links to definitions. So, I'll be threading a Set around too.
import Data.Set (Set) import qualified Data.Set as S data Annotation = Punctuation | Literal | FreeIdentifier String | BoundIdentifier String | Binding String | DefinitionSite String -- | Pretty-print a program, with annotations. ppProgram :: Program -> Doc Annotation ppProgram (Program decls) = WL.vcat (fmap ppDecl decls) ppDecl :: Decl -> Doc Annotation ppDecl (Decl name expr) = WL.hang 2 $ WL.annotate (DefinitionSite name) (WL.text name) <+> punctuation '=' <$$> ppExpr mempty expr ppExpr :: Set String -> Expr -> Doc Annotation ppExpr bound expr = case expr of ENum x -> WL.annotate Literal (WL.text (show x)) EVar x -> if S.member x bound then WL.annotate (BoundIdentifier x) (WL.text x) else WL.annotate (FreeIdentifier x) (WL.text x) ELam x f -> punctuation '\\' <> WL.annotate (Binding x) (WL.text x) <+> WL.annotate Punctuation (WL.text "->") <+> ppExpr (S.insert x bound) f EApp f g -> parens (parens (ppExpr bound f) <+> parens (ppExpr bound g)) parens :: Doc a -> Doc Annotation parens x = punctuation '(' <> x <> punctuation ')' punctuation :: Char -> Doc Annotation punctuation = WL.annotate Punctuation . WL.char
As you can see, we haven't changed the program a whole lot! We've just added some fairly mundane semantic information, plus whatever was necessary to identify free variables. Our old pretty-printing function still does exactly the same thing, since it discards all the annotations. When we want to, though, we can interpret these annotations however we like.
Now, to make use of the annotated information.
Most of these libraries allow us to interpret the annotations (and indeed the
Doc itself) into some arbitrary
Monoid. We can interpret into an
Applicative if an effectful pretty-printer is your idea of fun.
wl-pprint-annotated, the functions in question are these, which take
displayDecorated :: Monoid o => (a -> o) -> (a -> o) -> (String -> o) -> SimpleDoc a -> o displayDecoratedA :: (Applicative f, Monoid o) => (a -> f o) -> (a -> f o) -> (String -> f o) -> SimpleDoc a -> f o
I'm going to commit a computer crime for the first draft of this post, and just generate HTML as Text. The interface is clearly robust enough to render a well-formed document using something like Blaze or Lucid, and you should most definitely do that instead!
prettyAnnotated :: Program -> (Annotation -> Text) -- start -> (Annotation -> Text) -- end -> Text prettyAnnotated p start end = displayDecorated start end T.pack (WL.renderPrettyDefault (ppProgram p)) programHtml :: Program -> Text programHtml p = prettyAnnotated p (\case Punctuation -> "<span class=\"syntax-punctuation\">" Literal -> "<span class=\"syntax-literal\">" FreeIdentifier x -> "<a href=\"#" <> T.pack x <> "\"><span class=\"syntax-id\">" BoundIdentifier x -> "<span class=\"syntax-id\">" Binding x -> "<span class=\"syntax-parameter\">" DefinitionSite x -> "<span id=\"" <> T.pack x <> "\" class=\"syntax-def\">") (\case FreeIdentifier _ -> "</span></a>" _ -> "</span>")
Done! We get syntax highlighting and rich anchor links for free, more or less, by riding in the wake of the original printer. That's a pretty good outcome for a syntactic printer that knows nothing at all about the program's meaning.
Running after typechecking or after a series of internal transformations would give us a ridiculous amount of information to present to the user, perfect for editor integration or generating rich documentation.
There are now a bunch of annotated printers on Hackage for us to choose from:
wl-pprint-annotated, used in this post
annotated-wl-pprint, pretty similar to the above, but closer to
pretty, where Trevor Elliott has contributed annotations upstream
wl-pprint-extras, Ed Kmett's version, built with a free monad
They all look alright to me. I'll be using this approach for all my pretty-printing going forward, since it costs so little.
David Christiansen wrote up this approach in 2014 , and presented A Pretty Printer that Says what it Means at Haskell Implementor's Workshop in 2015. David has made heavy use of this technique in presenting Idris syntax.
As David's slides demonstrate, some quite old Lisp environments provided much richer editing environments than we are accustomed to today. There's also a bit of recent prior art in the Agda highlighter, which runs after typechecking and present type information.