Haskell Project: Intro to Parsers

:: haskell, tutorial, parsec

Continuing with the todo.txt haskell project, this post will introduce parsers.

Previous project post: Unit Testing with Hspec.

Prep Work

This post is going to be a little more “theory” and explanation but I know I always like trying things out as I read so lets get the project updated to allow you to follow along.

The following changes will be needed to import Parsec into both the library and the unit test code:

library
  hs-source-dirs:     src
  exposed-modules:    Tasks
                    , Util
                    , Parser
  build-depends:      base >= 4.7 && < 5
                    , text
                    , parsec
  default-language:   Haskell2010

...

test-suite todo-test
  type:               exitcode-stdio-1.0
  hs-source-dirs:     test
  main-is:            Spec.hs
  build-depends:      base
                    , todo
                    , text
                    , hspec
                    , parsec
  ghc-options:        -Wall
  default-language:   Haskell2010

Next time we call stack build, stack test, or stack ghci, the Parsec libraries will be downloaded, configured and imported into our project.

Here is the final list of imports when the code is complete. We will go through them all as we introduce them. But to ease the walk through process we might as well get rid of any of the compile errors that come from missing imports.

module Parser where

import Control.Applicative (many)
import Data.Char (toUpper)
import Data.Text (pack)
import Text.Parsec.Char (char, oneOf, letter, alphaNum, digit, anyChar, noneOf,
                         endOfLine)
import Text.Parsec.Combinator (many1, optionMaybe, choice, option, eof,
                               optional, sepBy)
import Text.Parsec.Error as E
import Text.Parsec.Prim (parse, try, (<|>), parseTest, unexpected)
import Text.Parsec.Text

import qualified Tasks as Tasks

If you want links to the documentation you can check out the References.

Basic Layout of a Parser

There is a common data type pattern you’ll see in much of the Parsec library You can just view it as a “Parser Response”, ParsecT s u m a. Some of the unnamed fields will be replaced with specifics, but you’ll definitely see something close to this throughout the documentation.

The code we will be writing takes a stream of data and tokenizes it into haskell data structures, atoms. We will create a set of stream to data type parsers and combine them until eventually we can create a Task.

Deeper Explanation: ParsecT

Looking over the Parsec Documentation you’ll see the common structure ParsecT s u m a which breaks down pretty straight forward.

  • s is the stream type
  • u is the user state type
  • m is the underlying monad
  • a is the return type.

Given a stream (s) and executing in the current state (u), the parser performs a set of operations to extract data that creates type a. All of this occurs within a monad (m).

Understanding this isn’t really required to create a parser. Just knowing that ParsecT s u m a is the common data type in all parsers should be good enough as you will start to notice it in some form or another (typically replacing a letter for an actual type.

Often instead of using the transformer (denoted by the ‘T’) you’ll see the following type used:

type Parsec s u = ParsecT s u Identity

Using the Identity monad (docs) we get back exactly what type we pass in. And by using currying we can add the a return type to the end.

The type we will most commonly be using is (doc):

type Parser = Parsec String ()

Our stream type is String and we aren’t using any User State. So a type of Parser Int means String in and Int out. We can make all sorts of parsers with this type signature:

parseInt :: Parser Int            -- String in, Int out
parseChar :: Parser Char          -- String in, Char out
parsePriority :: Parser Priority  -- String in, Priority out

A simple example

Now that we know how the construct a parser by creating smaller atom parsers, and we know how the Parser data type works, let us create our first atom and walk through the parts.

We will start with a very simple example: Priority. Looking at Rule 1 of the todo.txt documentation,

The priority is an uppercase character from A-Z enclosed in parentheses and followed by a space.

Here is the code:

-- |Priority: (A)
priority :: Parser Tasks.Priority
priority = do
  char '('
  p <- letter
  char ')'
  return $ toUpper p

We will do the TDD work in a later post. But for now lets de-construct the parser for Priorities.

priority :: Parser Tasks.Priority

The type signature for the parser function is pretty straight forward. We are creating a parser (Parser a) with a return type of Tasks.Priority.

priority = do

Parser a is a monad, so to make the code easy to read we use the “do notation”. You could use all of the monadic operators and be very functional but I think that when it comes to code like a parser, readability is important. Using “do notation” you can write your parser in an imperative (procedural) way which many programmers already understand.

  char '('

This line reads in a specific character (doc). If it is found the value is returned (and we ignore it by not assigning it a name). If the next character in the stream doesn’t match, the parser fails.

  p <- letter

Similar to the previous line, this one reads a specific character. letter parse any “upper case or lower case character” and returns the parsed character. We assign that value to p

  char ')'

Again, parsing and ignoring a parenthesis.

  return $ toUpper p

Lastly we do a little house keeping, forcing the character to be upper case (doc), and then returning it. Since our function is of type Parser Tasks.Priority and Tasks.Priority is just a type of Char, everything matches up with the type.

Up Next

In the next post we will create some unit tests and figure out how chain a bunch of parsers together to create larger data structures.

References