Haskell Project: Show, Compare, and Filter

:: haskell, tutorial

Now that we have a few basic types we should start working on making their interaction nicer. It is pretty important to be able to print and compare data types so we will start there. Once we are able to compare, we should be able to filter lists of data types.

If you didn’t read the previous post catch up on Stack and Data Types. Next post introduces Unit Testing.

About Data Structures

In the previous post we created a few very basic data types using the type and data keywords. There are three data type keywords and we will quickly cover them here.

Data Type Keyword: type

-- |Name is a string
type Name = String

-- |Age is an Int
type Age = Int

type is the simplest data type keyword. It creates an alias to an already existing type. In the code above we have created a type Name that is really just a String. Age is really just an Int. Both types can be interchanged with their aliased types and can use functions defined for their originating types.

>>> let x = "Joe" :: Name
>>> :t x
x :: Name
>>> x
"Joe"
>>> length x
3

Since we aren’t doing anything extremely interesting with Name, having it defined as an alias to String makes a lot of sense since its mostly for making the code more readable.

>>> let nameLength :: Name -> Int
        nameLength = length
>>> :t nameLength
nameLength :: Name -> Int
>>> nameLength x
3

Since length works on Foldable class types and String is a Foldable then our alias works with length as well. (I’ll explain type classes later in the post.) Our simple little method to calculate the length of a name is all just syntactic sugar to make the code more readable.

One thing to note though. When creating data types using type, type safty is enforced only allowing the casting to work from dirived type to base type. For example, lets say we create two types, GivenName and SirName:

type GivenName = String
type SirName = String

And we create a method that prints the full name with the following signature:

printName :: GivenName -> SirName -> IO ()

Even though both data types are aliases to String you cannot pass a SirName for the argument where a GivenName is expected.

Data Type Keyword: data

The next data type keyword we used was data. This keyword describes a structure with

  • One or more constructors
  • Each constructor contains zero or more fields

In the example we will create a Person data type that has a single constructor and takes in a Name and an Age.

-- |Person: Contains a name (string) and an age (int)
data Person = Person Name Age

Another example is to create a data type that represents a department of an business. There are two different types of departments: Empty and Staffed. Both department types have a name but only one has a list of employees.

-- |Department: Two different versions
-- EmptyDepartment: Dept Name
-- StaffedDepartment: Dept Name and list of staff

data Department = EmptyDepartment Name
                | StaffedDepartment Name [Person]

Functions can now interact with the type Department but have two different versions to handle.

staffCountInDepartment :: Department -> Int
staffCountInDepartment (EmptyDepartment _) = 0
staffCountInDepartment (StaffedDepartment _ staff) = length staff

Using pattern matching we are able to create two different forms of the staffCountInDepartment function which handles an EmptyDepartment differently than a StaffedDepartment.

Here we also see the difference between the data type and the constructor strings. In the function signature we use Department since that is the type. In the implementation we use the constructor names EmptyDepartment and StaffedDepartment. In our Person example since there is only one constructor it is acceptable to have both the data type and constructor to have the same name.

Side Note: Pattern Matching and Undscore

Note in the pattern matching you can either fill the field with a name or use _ to ignore it. Haskell expects all arguments defined in functions and test expressions to be used somewhere in the code. In situations where the value is not needed, using the _ name tells Haskell that it can ignore the fact that you aren’t using the value.

Data Type Keyword: newtype

The one keyword not used yet is newtype. It is similar to data and type, just with a few odd rules.

  • Must create a new constructor (unlike type)
  • Can only have one constructor (unlike data)
  • Can only have a single field (like type)
  • Does not associate functions using similar types (unlike type)

From these rules newtype really is a new type. It contains the data of the type supplied but that is where the association between the new and old types end.

>>> newtype DeptName = DeptName String
>>> let y = DeptName "Accounting"
>>> length y
<interactive>:29:8:
    Couldn't match expected type `t0 a0' with actual type `DeptName'
    In the first argument of `length', namely `y'
    In the expression: length y

Here you see that even though DeptName is based on String the function length does not operate as expected. We would need to create our own new method to calculate the length.

deptNameLength :: DeptName -> Int
deptNameLength (DeptName n) = length n

Here we are deconstructing the new type, extracting the String from it and calling another method. While this seems a bit cumbersome, the benefit here is that it would not be possible to pass in Name or any other string type to this function.

>>> let x = DeptName "Accounting"
>>> :t x
x :: DeptName
>>> deptNameLength x
10
>>> let y = "Joe" :: Name
>>> :t y
y :: Name
>>> deptNameLength y
<interactive>:36:16:
    Couldn't match type `[Char]' with `DeptName'
    Expected type: DeptName
      Actual type: Name
    In the first argument of `deptNameLength', namely `x'
    In the expression: deptNameLength x

From the error we see that the function expected DeptName but instead was passed Name.

One good example of newtype is taking a query string and sanitizing it before processing. As a pointless example, lets say our language parser only used capitalized words.

newtype QueryString = QueryString String
newtype SanitizedString = SanitizedString String

sanitize :: QueryString -> SanitizedString
sanitize (QueryString str) = SanitizedString $ map Data.Char.toUpper str

Even though both types look the same, you cannot pass a SanitizedString into sanitize. If we used type this restriction wouldn’t be possible.

Note There are a few other distinguishing rules about newtype I haven’t covered. For the moment they aren’t too important but hopefully I’ll find a way to incorporate them to the application/blog.

Showing with the Show Type Class

A nice feature of haskell is being able to derive functionality from class types using the deriving keyword. Starting with our current implementation of Date we derive Show and get the following output.

>>> let d = (Date 2000 1 30)
>>> d
Date 2000 1 30
>>>

That is nice for basic development and debugging but what if we wanted to change the default way the Date type was shown. We could match the format expected by todo.txt, YYYY-MM-DD. We just need to overload the functions required by Show, namely show a.

data Date = Date Integer Int Int

-- |Show Date in YYYY-MM-DD format
instance Show Date where
  show (Date year month day) = show year
                               ++ "-" ++ show month
                               ++ "-" ++ show day

Now when we display the data type we get the format we expect.

>>> let d = (Date 2000 1 30)
>>> d
2000-1-30
>>>

And with that you’ve had your first introduction into Type Classes. Date is now an instance of the type class Show. If you want to see more information about the type class type :info Show into your GHCi instance.

>>> :info Show
class Show a where
  showsPrec :: Int -> a -> ShowS
  show :: a -> String
  showList :: [a] -> ShowS
        -- Defined in `GHC.Show'
instance Show a => Show [a] -- Defined in `GHC.Show'
instance Show Word -- Defined in `GHC.Show'
instance Show Ordering -- Defined in `GHC.Show'
instance Show a => Show (Maybe a) -- Defined in `GHC.Show'
instance Show Integer -- Defined in `GHC.Show'
instance Show Int -- Defined in `GHC.Show'
instance Show Char -- Defined in `GHC.Show'
instance Show Bool -- Defined in `GHC.Show'
...
<more entries than I care to show>

You can see what functions exist in the type class, and instances that exist already.

Note For those who noticed, this doesn’t exactly fit our format requirement. We’ll fix it in the next post.

A little more complicated example would be how to print out a Task. Some of the issues are:

  • Incomplete and Completed Tasks look different
  • Incomplete Tasks have optional Priority and Start Date
-- |Show Task in format "(A) 2016-07-30 Task to do +Project @Context"
instance Show Task where
  show (Completed date task) = "x " ++ (show date) ++ " " ++ (show task)
  show (Incomplete mPriority mDate _ _ str) = (showPriority mPriority)
                                              ++ (showDate mDate)
                                              ++ str
    where showPriority (Just p) = "(" ++ [p] ++ ") "
          showPriority Nothing = ""
          showDate (Just d) = show d ++ " "
          showDate Nothing = ""

The first two show lines use pattern matching to distinguish between Completed and Incomplete tasks. Since our completed task contains an incomplete task, we are able to invoke show recursively rather than handling the same data twice.

To handle the optional fields we create a couple of helper functions that use pattern matching (there is often a lot of pattern matching in Haskell) to return the appropriate string.

Comparing with Eq and Ord Type Classes

Our next two type classes deal with comparing values. The Eq type class handles equivalency between two values.

>>> :info Eq
class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool
        -- Defined in `GHC.Classes'

From the info on Eq we can see that there are two functions defined: (==) and (/=). The first take two instances of the same type and returns True if they are equal. The second function is for “not equals”.

Note the use of parenthesis on function names with symbols makes them prefix notation. (==) 1 2 is the same as 1 == 2.

Let implement the equality check for Date.

-- |Check if two Dates are equal
instace Eq Date where
  (Date y1 m1 d1) == (Date y2 m2 d2) = y1 == y2
                                       && m1 == m2
                                       && d1 == d2

Note Due to some magic in the type class, you do not need to implement (/=) if you don’t want. See the Hackage documentation on Eq for Minimal complete definition.

>>> (Date 2017 1 1) == (Date 2017 1 1)
True
>>> (Date 2017 1 1) == (Date 1999 12 31)
False

The next type class for comparing is Ord, which is used for ordered data types.

>>> :info Ord
class Eq a => Ord a where
  compare :: a -> a -> Ordering
  (<) :: a -> a -> Bool
  (<=) :: a -> a -> Bool
  (>) :: a -> a -> Bool
  (>=) :: a -> a -> Bool
  max :: a -> a -> a
  min :: a -> a -> a

The Ord type class is defined with a requirement that the type also be an instance of type class Eq. The class contains multiple functions but from the Hackage documentation of Ord we see that the minimal complete definition is to implement compare.

-- |Compare (LT,GT,EQ) two Dates
instance Ord Date where
  compare (Date y1 m1 d1) (Date y2 m2 d2) = if y1 /= y2
                                            then compare y1 y2
                                            else if m1 /= m2
                                                 then compare m1 m2
                                                 else compare d1 d2

Our compare function is pretty simple to understand. Walking through the Year, Month and Day, if any of them are not equivalent then we use the built in compare method.

Filtering

Now that we can compare our data types, we can filter lists of data types.

The filter function has the type definition:

filter :: (a -> Bool) -> [a] -> [a]

The first argument is a function that takes a value and returns a Bool, True if the value should be kept in the list. If we wanted to filter a list of Tasks and only return the Incomplete we could use filter.

-- |Filter out all completed tasks and return a list of incomplet
onlyPending :: [Task] -> [Task]
onlyPending = filter isPending
  where isPending (Incomplete _ _ _ _ _) = True
        isPending _ = False

isPending returns True for Incomplete and False for everything else.

A useful function is to check if one list is a subset of another. Using filter we can create such a function.

module Util ( subsetOf ) where

-- |subsetOf filters a list that contains a set of elements
subsetOf :: (Eq a, Foldable t) => [a] -> t a -> Bool
xs `subsetOf` ys = null $ filter (not . (`elem` ys)) xs

To break this down we will start inside out.

(not . (`elem` ys))

Check if the argument is an element of ys, and invert the result.

filter (not . (`elem` ys)) xs

Running over xs each element is kept if it is not an element of ys.

null $ filter (not . (`elem` ys)) xs

Returns True if the list passed to null is empty.

If there are any values in xs that are not in ys they will be returned by filter and since the list passed to null would not be empty it returns False.

Using subsetOf we can filter for tasks that contain all the projects in a list.

-- |Filter based on Projects. Strings for project should not contain '+' as
-- defined by todo.txt
filterProjects :: [Project] -> [Task] -> [Task]
filterProjects px = filter projectFilter
  where projectFilter (Incomplete _ _ projs _ _) = px `subsetOf` projs
        projectFilter (Completed _ t) = projectFilter t

Up next

If you want to see all the modifications for this post you can diff the code here.

In the next post we will fix some of the bugs we introduced in this post by writing some Unit Tests.