Continuing with the todo.txt haskell project, this post will introduce parsers.
Previous project post: Unit Testing with Hspec.
This post is going to be a little more “theory” and explanation but I know I always like trying things out as I read so lets get the project updated to allow you to follow along.
The following changes will be needed to import Parsec into both the library and the unit test code:
library hs-source-dirs: src exposed-modules: Tasks , Util , Parser build-depends: base >= 4.7 && < 5 , text , parsec default-language: Haskell2010 ... test-suite todo-test type: exitcode-stdio-1.0 hs-source-dirs: test main-is: Spec.hs build-depends: base , todo , text , hspec , parsec ghc-options: -Wall default-language: Haskell2010
Next time we call
stack test, or
stack ghci, the Parsec libraries will be downloaded, configured and imported into our project.
Here is the final list of imports when the code is complete. We will go through them all as we introduce them. But to ease the walk through process we might as well get rid of any of the compile errors that come from missing imports.
module Parser where import Control.Applicative (many) import Data.Char (toUpper) import Data.Text (pack) import Text.Parsec.Char (char, oneOf, letter, alphaNum, digit, anyChar, noneOf, endOfLine) import Text.Parsec.Combinator (many1, optionMaybe, choice, option, eof, optional, sepBy) import Text.Parsec.Error as E import Text.Parsec.Prim (parse, try, (<|>), parseTest, unexpected) import Text.Parsec.Text import qualified Tasks as Tasks
If you want links to the documentation you can check out the References.
Basic Layout of a Parser
There is a common data type pattern you’ll see in much of the Parsec library You can just view it as a “Parser Response”,
ParsecT s u m a. Some of the unnamed fields will be replaced with specifics, but you’ll definitely see something close to this throughout the documentation.
The code we will be writing takes a stream of data and tokenizes it into haskell data structures, atoms. We will create a set of stream to data type parsers and combine them until eventually we can create a
Deeper Explanation: ParsecT
Looking over the Parsec Documentation you’ll see the common structure
ParsecT s u m a which breaks down pretty straight forward.
sis the stream type
uis the user state type
mis the underlying monad
ais the return type.
Given a stream (
s) and executing in the current state (
u), the parser performs a set of operations to extract data that creates type
a. All of this occurs within a monad (
Understanding this isn’t really required to create a parser. Just knowing that
ParsecT s u m a is the common data type in all parsers should be good enough as you will start to notice it in some form or another (typically replacing a letter for an actual type.
Often instead of using the transformer (denoted by the ‘T’) you’ll see the following type used:
type Parsec s u = ParsecT s u Identity
Identity monad (docs) we get back exactly what type we pass in. And by using currying we can add the
a return type to the end.
The type we will most commonly be using is (doc):
type Parser = Parsec String ()
Our stream type is
String and we aren’t using any User State. So a type of
Parser Int means
String in and
Int out. We can make all sorts of parsers with this type signature:
parseInt :: Parser Int -- String in, Int out parseChar :: Parser Char -- String in, Char out parsePriority :: Parser Priority -- String in, Priority out
A simple example
Now that we know how the construct a parser by creating smaller atom parsers, and we know how the
Parser data type works, let us create our first atom and walk through the parts.
We will start with a very simple example: Priority. Looking at Rule 1 of the todo.txt documentation,
The priority is an uppercase character from A-Z enclosed in parentheses and followed by a space.
Here is the code:
-- |Priority: (A) priority :: Parser Tasks.Priority priority = do char '(' p <- letter char ')' return $ toUpper p
We will do the TDD work in a later post. But for now lets de-construct the parser for Priorities.
priority :: Parser Tasks.Priority
The type signature for the parser function is pretty straight forward. We are creating a parser (
Parser a) with a return type of
priority = do
Parser a is a monad, so to make the code easy to read we use the “do notation”. You could use all of the monadic operators and be very functional but I think that when it comes to code like a parser, readability is important. Using “do notation” you can write your parser in an imperative (procedural) way which many programmers already understand.
This line reads in a specific character (doc). If it is found the value is returned (and we ignore it by not assigning it a name). If the next character in the stream doesn’t match, the parser fails.
p <- letter
Similar to the previous line, this one reads a specific character. letter parse any “upper case or lower case character” and returns the parsed character. We assign that value to
Again, parsing and ignoring a parenthesis.
return $ toUpper p
Lastly we do a little house keeping, forcing the character to be upper case (doc), and then returning it. Since our function is of type
Tasks.Priority is just a type of
Char, everything matches up with the type.
In the next post we will create some unit tests and figure out how chain a bunch of parsers together to create larger data structures.