- Composite datatypes
Not everything is just a primitive, of course, and we've actually already seen an example of a more
complex datatype in the previous chapter about "Values and functions": Bool
.
Bool
is actually defined exactly as we would define one of our own types:
data Bool = True | False
It can be either True
or False
and working with it is fairly instructive in terms of how one can
work with these kinds of data declarations:
import Prelude
add42or1337 :: Bool -> Int -> Int
-- Note that `if` is an expression and both branches need to return the same type. We also always
-- need the `else` branch for this reason.
add42or1337 shouldAdd42 x = x + if shouldAdd42 then 42 else 1337
We could also use pattern-matching via the case
keyword here to inspect the value of the bool:
import Prelude
add42or1337 :: Bool -> Int -> Int
add42or1337 shouldAdd42 x = x + case shouldAdd42 of
True -> 42
False -> 1337
Each case
branch can deconstruct the different constructors of a union type (which Bool
is),
even if they have associated data in them. We'll see this later.
We could also pattern-match "in the top-level", meaning on the left of the =
:
import Prelude
add42or1337 :: Bool -> Int -> Int
add42or1337 True x = x + 42
add42or1337 False x = x + 1337
If the logic for a function differs a lot between the cases I would personally prefer the last
version as it also allows you to have some of the arguments be bound only for certain cases, etc.,
and generally keeps each case separate. In this case the first version makes the most sense because
only the amount added depends on the boolean and we have special syntax for boolean values with
if
. Bool
values also work naturally with if
.
- Create a function that takes two functions of the type
(Bool -> Int)
as well as aBool
and applies one if theBool
isTrue
and the other if it'sFalse
. Define one version that does this withif
, one that does it withcase
and one that matches in the top-level.
In these examples you'll often find that there is a line under a lot of data definitions reading
deriving (Eq, Show)
. We'll look at what deriving
and the other components to this mean in the,
next chapter, but what you need to know right now is that this line will automatically generate the
capability for these types to be displayed the terminal, as well as be compared to eachother value
for value.
Before we move into advanced composite types, let's look at a very basic tool in the Haskell
toolbelt: newtype
.
Newtypes are types that wrap other types in order to make distinct versions of them. Let's look at what that means in practice.
Let's imagine a function filteredCopy
that takes a source filename, destination filename as well
as a string to filter by such that we only copy lines that contain that string:
filteredCopy :: String -> String -> String -> IO ()
filteredCopy source destination copyPattern = ...
The IO ()
for the purposes of this example means that we are doing something effectful and there
is no useful return value.
What happens if we by mistake use this function with filteredCopy destination source copyPattern
or any other incorrect order for the parameters? The type system doesn't know anything about what
these strings represent.
The solution is fairly simple:
newtype Source = Source String
deriving (Eq, Show)
newtype Destination = Destination String
deriving (Eq, Show)
newtype CopyPattern = CopyPattern String
deriving (Eq, Show)
filteredCopy :: Source -> Destination -> CopyPattern -> IO ()
filteredCopy source destination copyPattern = ...
When we use filteredCopy
now we will have to wrap our strings:
filteredCopy (Source source) (Destination destination) (CopyPattern copyPattern)
We can still make the mistake of wrapping our source
in a Destination
wrapper, to be clear, but
it's much easier to spot this mistake and if a value is produced in one place in a program as a
Destination
it simply cannot be passed blindly to a place where a Source
is required.
-
Define a newtype that wraps
Float
, calledMeters
. Define a function that takes twoMeters
and adds them together to return a newMeters
. -
Define two newtypes wrapping
Float
, calledMeters
andKilometers
. Define a function that takesMeters
and correctly converts them intoKilometers
. -
Define a newtype wrapping
String
that is calledUsername
, then a function that takes aUsername
and returns its length.
Records are useful when we want to store multiple values together in a named structure. The
individual parts, or "fields", are named as well. We begin a record definition with the keyword
data
after which we give the name of the type. Following an =
we then give the constructor for
the type; a function that takes the record data and constructs the type.
-- type constructor
data UserProfile = UserProfile
{ username :: String,
age :: Int,
active :: Bool,
interests :: [String]
}
deriving (Eq, Show)
The constructor name can be different than the type name, but this is comparatively rare.
One thing to note about record definitions in Haskell is that each field will have an associated function that takes the type and returns the field:
Q> :type username
username :: UserProfile -> String
Q> :type age
age :: UserProfile -> Int
Q> :type active
active :: UserProfile -> Bool
Q> :type interests
interests :: UserProfile -> [String]
Q> rickard = UserProfile {
username = "rickard",
age = 33,
active = True,
interests = ["Programming", "Problem Solving", "Teaching"]
}
Q> rickard
UserProfile
{ username = "rickard"
, age = 33
, active = True
, interests =
[ "Programming"
, "Problem Solving"
, "Teaching"
]
}
Q> username rickard
"rickard"
Q> age rickard
33
Q> active rickard
True
Q> interests rickard
[ "Programming"
, "Problem Solving"
, "Teaching"
]
We can see this in action in this snippet where we turn a profile into a presentation string:
profileToString :: UserProfile -> String
profileToString profile =
let ageString = show $ age profile
activeString = if active profile then "active" else "not Active"
interestsString = intercalate ", " (interests profile)
in mconcat
[ username profile,
" (",
ageString,
"y, ",
activeString,
") is interested in: ",
interestsString
]
intercalate :: String -> [String] -> String
intercalate between strings =
mconcat $ List.intersperse between strings
Running this on our previously defined profile we get:
Q> profileToString rickard
"rickard (33y, active) is interested in: Programming, Problem Solving, Teaching"
We could also pattern match on our UserProfile
type:
profileToString' :: UserProfile -> String
profileToString'
UserProfile
{ username = username,
age = age,
active = active,
interests = interests
} =
let ageString = show age
activeString = if active then "active" else "not Active"
interestsString = intercalate ", " interests
in mconcat
[ username,
" (",
ageString,
"y, ",
activeString,
") is interested in: ",
interestsString
]
We can see that we've now bound the values we care about in our function definition "head" and so the logic inside of the function is somewhat less busy, with values already being pulled out of our profile value for us. There is one issue, however; our function head is so wide that we've now been forced to spread it out over several lines, and we are repeating the field and variable names unnecessarily. When we are using the same name for a field we are matching as the name we are binding it to, we can use this nice shorthand:
profileToString :: UserProfile -> String
profileToString UserProfile {username, age, active, interests} =
let ageString = show age
activeString = if active then "active" else "not Active"
interestsString = intercalate ", " interests
in mconcat
[ username,
" (",
ageString,
"y, ",
activeString,
") is interested in: ",
interestsString
]
If we omit the =
in our bindings Haskell will assume we are binding the fields into a name equal
to the field's name. This mirrors the behavior you can find in, for example, JavaScript and other
languages and also applies when we construct records:
let userProfile =
-- Note how we don't have to pass all of these without `=`
UserProfile {username = "rickard", age, active, interests}
age = 33
active = True
interests = ["Programming" , "Problem Solving" , "Teaching"]
The userProfile
value above is a valid way to construct a UserProfile
. When we don't use =
for
a field and we have a field with the same name as our value, Haskell again assumes that we mean to
set the corresponding field to that value. Passing a field that doesn't exist in the type is still
an error, so this is completely safe.
This behavior with the shorthand deconstruction and construction is available through a language
extension called NamedFieldPuns
, that we by default enable in our Quanterall templates, so you
should see this work without issue when using them.
-
Define a function that takes a
String
and returns a datatype that stores both the length of the string and the string itself. -
Define a HTTP request datatype that has a
url
, a list of query parameters[0] and a HTTP method[1] and a body. -
Add
newtype
s to the definition you made for exercise 2 where you think they are appropriate.
- You may want to define a type for what a query parameter is.
- This is fine as a string for now.
While a record represents a collection of values that make up a whole, all of them present, a
union type represents a set of alternatives that are all valid, but only one at a time. The
built-in Bool
type is a union type; we can only have either True
or False
.
We define a union type with the data
keyword followed by the type name and =
. Then we list the
constructors of the type with |
between them:
data RelationshipStatus
= MarriedTo MarriageInfo -- This could also be `MarriedTo String Day`
| EngagedTo UserProfile
| ItsComplicated
| Single
deriving (Eq, Show)
data MarriageInfo = MarriageInfo {spouse :: String, date :: Day}
deriving (Eq, Show)
The different constructors all represent different cases and contain different data. In the case of
MarriedTo
the constructor holds a record. If we wanted to we could take several arguments to the
constructor, but have elected to name the components because it can sometimes be clearer to take a
record. In the case of EngagedTo
it's perfectly clear that the user is engaged to another user
profile. For the subsequent cases the constructors don't carry any additional data.
We can inspect and act on this data in several ways:
-- Note how we can put an underscore alone or before some text here to say that we do not care what
-- the contents actually are, but we are saying that yes, there is a value there.
isSingle :: RelationshipStatus -> Bool
isSingle (MarriedTo _) = False
isSingle (EngagedTo _userProfile) = False
isSingle ItsComplicated = True
isSingle Single = True
isSingle' :: RelationshipStatus -> Bool
isSingle' status = case status of
MarriedTo _marriageInfo -> False
EngagedTo _ -> False
ItsComplicated -> True
Single -> True
The above functions are of course very course grained; it's a very binary thing. To accurately represent what is actually the case we sometimes need to introduce more choices. Let's define a type that is perhaps more accurate:
data IsSingle
= DefinitelySingle
| MaybeSingle
| DefinitelyNotSingle
deriving (Eq, Show)
isSingle :: RelationshipStatus -> IsSingle
isSingle (MarriedTo _) = DefinitelyNotSingle
isSingle (EngagedTo _userProfile) = DefinitelyNotSingle
isSingle ItsComplicated = MaybeSingle
isSingle Single = DefinitelySingle
isSingle' :: RelationshipStatus -> IsSingle
isSingle' status = case status of
MarriedTo _marriageInfo -> DefinitelyNotSingle
EngagedTo _ -> DefinitelyNotSingle
ItsComplicated -> MaybeSingle
Single -> DefinitelySingle
When we introduce new types like this we enable our programs to more accurately model and act on the
information that flows through our systems. In the case above we've now enabled a choice for a
"maybe single" profile that previously had to be discerned from the ItsComplicated
constructor. We
now allow the user of isSingle
to be aware only of these three states that IsSingle
encompasses,
and not have to derive what to do based on the bigger, more information-dense RelationshipStatus
type.
-
Return to the
DivisionResult
data type andsafeDivide
function that we defined in chapter 1 and create a function that takes a defaultFloat
value as well as aDivisionResult
and if the division result is a division by zero, returns the default. Otherwise it returns the result. Create a solution with top-level pattern matching as well as one withcase
. -
Define a function
spouseName
that takes aRelationshipStatus
and returns aString
. Choose either top-level pattern matching or usingcase
. What do we do when a case does not have a spouse? -
Add a data type that more accurately reflects the having or not of a spouse and modify the function you defined in exercise 2 to return this data type. What happened to the cases where we do not have a spouse name to take from the relationship status?
-
Return to the HTTP request type we defined in the "Record types" exercise and more accurately model the "method" field[0].
-
Define a
TradeOrder
type that can be either aSellOrder
or aBuyOrder
, both taking aTickerSymbol
(anewtype
around aString
) and anInt
. Define a function that takes aTradeOrder
and a[TradeOrder]
and returns whether or not we matched a sell/trade to an existing opposite trade/sell in the list of orders. If there is a match, return the matching entry as well as the list of trade orders without the matched order[1]. If there isn't a match, indicate this in the return value.
- Remember that constructors can take payloads or not, so a method that has a body associated with it could take one and a method that doesn't could be designed to not take one.
delete
As we saw in the previous section it's trivial to combine records and unions; our MarriageInfo
type
is already embedded in the MarriedTo
constructor. So let's take that one step further and enrich
our UserProfile
data type:
import qualified Data.List as List
import Data.Time (Day)
import qualified Data.Time as Time
import Prelude
data UserProfile = UserProfile
{ username :: String,
age :: Int,
active :: Bool,
interests :: [String],
relationshipStatus :: RelationshipStatus
}
deriving (Eq, Show)
data RelationshipStatus
= MarriedTo MarriageInfo
| EngagedTo UserProfile
| ItsComplicated
| Single
deriving (Eq, Show)
data MarriageInfo = MarriageInfo {spouse :: String, date :: Day}
deriving (Eq, Show)
profileToString :: UserProfile -> String
profileToString profile =
let ageString = show $ age profile
activeString = if active profile then "active" else "not Active"
interestsString = intercalate ", " (interests profile)
relationshipStatusString = case relationshipStatus profile of
MarriedTo MarriageInfo {spouse, date} ->
let dateString = Time.showGregorian date
in -- `unwords` takes a `[String]` and joins them into a string with spaces inbetween
unwords ["Married to:", spouse, "on", dateString]
EngagedTo UserProfile {username} -> unwords ["Engaged to:", username]
ItsComplicated -> "It's complicated"
Single -> "Single"
in mconcat
[ username profile,
" (",
ageString,
"y, ",
activeString,
", ",
relationshipStatusString,
") is interested in: ",
interestsString
]
intercalate :: String -> [String] -> String
intercalate between strings =
mconcat $ List.intersperse between strings
If we now construct our rickard
profile with this in mind we get the following:
Q> rickard = UserProfile {
username = "rickard",
age = 33,
active = True,
interests = ["Programming", "Problem Solving", "Teaching"],
relationshipStatus = MarriedTo MarriageInfo {
spouse = "Ivana",
date = Time.fromGregorian 2015 06 04
}
}
Q> profileToString rickard
"rickard (33y, active, Married to: Ivana on 2015-06-04) is interested in: Programming, Problem
Solving, Teaching"
Combining record types and union types is the basis for domain modelling in Haskell and allows us to construct very rich datatypes that we can work safely with. Armed with more knowledge in the future it will be easy for you to look at the above example and modify it according to hypothetical business logic and having those changes be reflected in our functions.
As an example, if we decided that our marriage information should hold a UserProfile
, that would
require users on our site to only be able to set their "married" status if their spouse is on the
site. However, if we instead make the spouse
field take a type that allows us to have a name
or a userprofile, we can express this possibility clearly:
data MarriageInfo = MarriageInfo {spouse :: Spouse, date :: Day}
deriving (Eq, Show)
data Spouse
= SpouseProfile UserProfile
| SpouseName String
deriving (Eq, Show)
This will give us the same capability as before, because we still support spouse names with strings:
Q> rickard = UserProfile {
username = "rickard",
age = 33,
active = True,
interests = ["Programming", "Problem Solving", "Teaching"],
relationshipStatus = MarriedTo MarriageInfo {
spouse = SpouseName "Ivana",
date = Time.fromGregorian 2015 06 04
}
}
Q> profileToString rickard
"rickard (33y, active, Married to: Ivana on 2015-06-04) is interested in: Programming, Problem
Solving, Teaching"
But we can now also use a user profile in our spouse
field:
Q> ivana = UserProfile {
username = "ivana",
age = 31,
active = True,
interests = ["Web Design", "Cats", "Beer"],
relationshipStatus = MarriedTo MarriageInfo {
spouse = SpouseProfile rickard,
date = Time.fromGregorian 2015 06 04
}
}
Q> profileToString ivana
"ivana (31y, active, Married to: rickard on 2015-06-04) is interested in: Web Design, Cats, Beer"
Since we are now using the SpouseProfile
constructor for spouse
we can pass our previously
defined value rickard
of type UserProfile
and it still works. If we were building this site we
could use the fact that we are guaranteed to have another UserProfile
in the SpouseProfile
case
to insert a link to the other profile in this case and only output a string with the name in the
other.
For another example of modelling (part of) a domain with types, see this file.
- Modify the
EngagedTo
constructor inRelationshipStatus
to also take aSpouse
and then modify the latest version ofprofileToString
accordingly.
The most basic generic datatype is a type that can hold anything:
data Holder a = Holding a
deriving (Eq, Show)
Note how we now have a type variable on the left side of =
which means that when we refer to the
type it will take a type name. If we were holding a Int
, for example, the type is Holder Int
.
The corresponding constructor call (or pattern match) is Holding value
.
This basic type doesn't have much going for it in terms of functionality, but it's useful to show how we express type variables in data types. Fortunately for us we only need to extend the record and union definitions with the same parts as we are using in this basic one in order to get generic versions of them.
If we were to define a generic record, for example, we could just do the following:
data HttpResponse a = HttpResponse
{ status :: HttpStatus,
headers :: [HttpHeader],
body :: a
}
We can see here that our HTTP Response is generic over different type of body types. This means we
can construct it with different types and still get the expected structure for the rest of the
response. Since only the body will differ here we are free to say that it could be a JSON value,
bytestring or maybe UTF8 text. Much like our Holder
example this would mean that when we refer to
the type we would have, for example, HttpResponse JSONValue
, HttpResponse ByteString
or
HttpResponse UTF8Text
.
With unions we predictably have the same format for generic unions as we do for basic ones and we will go through several popular and representative definitions below.
-
Define a value of type
Holder Int
as well as a value of typeHolder String
. -
Define a function
mapHolder :: (a -> b) -> Holder a -> Holder b
that applies the passed in function to the value inside theHolder
and wraps it up again. After implementing it, try creating different concrete types likeHolder Int
and passing matching arguments to the function you wrote. As an example, try passinglength
and aHolder [Int]
to the function and see what comes out. -
Define a function
foldHolder :: (a -> b) -> Holder a -> b
. What is the most natural way to implement this function? -
Add a constructor to the
Holder
type that has no arguments and is namedNoValue
. What can we return in ourmapHolder
function in the case where theHolder
isNoValue
? Likewise, what do we need to do in order to makefoldHolder
compile again?
It's usually very instructive to look at some of the supplied composite datatypes that one can find in most Haskell code, so here is an introduction to their definitions and some of their use cases.
Maybe a
is a type that represents the existence or non-existence of a value of type a
. It can be
seen as a null
type that retains its type even in the presence of nesting. It's defined as
follows:
data Maybe a
= Nothing
| Just a
Note that we now have a type parameter on the left side of the =
and that one of our constructors
takes this a
.
Maybe
is useful generally speaking wherever you would have the type null | SomeType
, which means
that we generally would like to use it for success/failure when we don't have any interesting error
information. It's also useful for when certain options aren't supplied to a call, or information is
not available and that's fine.
You may want to replace it with what is effectively the same type but with more descriptive names, but this should be done on a case-by-case basis.
We could for example imagine that a resource is not loaded yet in some data, and this could be
represented with a Maybe Resource
, but we could also create the following datatype:
data ResourceLoadStatus
= NotYetLoaded
| Loaded Resource
This is all a matter of what makes things clear. Do you want to be able to use the functions that
already exist for Maybe
, perhaps? In that case it's reasonable to keep it as a Maybe
. If a
custom type makes things much clearer when looking at your data structures, then go for it. It's not
a complicated endevour to define the functions you need for this type and in the case of a resource
load status it's likely the case that you're pattern matching to see if you need to load the
resource, or just want to display "N/A" when it's not loaded.
If we wanted to convert a ResourceLoadStatus
to a Maybe Resource
, we could do the following:
data ResourceLoadStatus
= NotYetLoaded
| Loaded Resource
deriving (Eq, Show)
newtype Resource = Resource String
deriving (Eq, Show)
resourceLoadStatusToMaybe :: ResourceLoadStatus -> Maybe Resource
resourceLoadStatusToMaybe NotYetLoaded = Nothing
resourceLoadStatusToMaybe (Loaded resource) = Just resource
-
Define a type called
User
that has a username, e-mail address, maybe has a full name, and maybe has a telephone number. Make newtypes for each of these. -
Define a function that gets a telephone number string from a
User
. If theUser
has no telephone number, return"N/A"
. Use pattern matching in the top-level to accomplish this. -
Reimplement the previous function for getting a telephone number string from a
User
, but use themaybe
function[0]. If theUser
has no telephone number, return"N/A"
.
Either l r
is a type that represents either the error case Left
with information attached to it
or the success case Right
with success data attached to it. The reason Left
is always the error
case is because of how Either
works in a monadic context. When returning Left
it is considered
an error. It's also because Left
is not right, meaning it's "wrong".
data Either e a
= Left e
| Right a
Either
is useful when we want something that can either succeed or fail but we also want to bundle
some information into our error case. It's common to return a custom error type for the Left
case
that can be inspected to see what kind of error that was hit and any extra information that is
attached.
This, again, can be specialized down to something custom but still retain the same meaning:
data ResourceLoadResult
= LoadFailure ResourceLoadError
| LoadSuccess Resource
deriving (Eq, Show)
newtype Resource = Resource String
deriving (Eq, Show)
data ResourceLoadError
= ResourceBusy
| ResourceAccessDenied
| ResourceHasBadData
| UnknownResourceError String
deriving (Eq, Show)
The above definition holds the same information as a Either ErrorData Resource
, but can be more
descriptive in certain contexts.
-
Define a function that takes a
DivisionResult
[0] and turns it into aEither String Float
. -
Define a function that takes a
ResourceLoadResult
and turns it into aEither ResourceLoadError Resource
. -
Define a function that takes a
ResourceLoadResult
, uses your previous function for turning it into anEither ResourceLoadError Resource
and then pipes the result intoeither
to return aString
.
A tuple is an ad-hoc collection of values that can be of different types. Tuples are a staple of many so called functional languages and Haskell is no exception:
tuple :: (Int, String)
tuple = (42, "Forty-Two")
tuple' :: (Int, String, Bool)
tuple' = (42, "Forty-Two", False)
tuple'' :: (Int, String, Bool, Float)
tuple'' = (42, "Forty-Two", False, 1337.0)
In many ways a tuple is the same as a record, except we are not naming the different components or the constructor. Consequently it's useful when names for the components or the constructor aren't needed because the scope of the created value is small or the names themselves would not be deemed useful. The utility of tuples should be examined on a case-by-case basis to ensure that they don't make code harder to understand because of their lack of information/context. A name for both a constructor and the individual fields/components can in many cases be very illuminating.
List
/ []
is interesting because it's defined in terms of operators:
data [] a
= []
| a : [a]
So we have a type, called []
that takes an a
. The constructors are []
itself, which is the
empty list, and :
which as the left argument takes an a
and as the right argument takes another
list, [a]
. This means that a list is effectively the :
operator applied over and over until it
is connecting to a []
, which marks the end of the list:
Q> 1 : 2 : 3 : 4 : []
[ 1
, 2
, 3
, 4
]
Defined another way we have the following:
data List a
= EmptyList -- This is commonly called `Nil`
| Prepend a (List a) -- This is commonly called `Cons`
Lists are useful any time you need to have zero or more of something. It's important to note that Haskell lists are pointers to pointers to pointers, which can be quite inefficient in many situations.
We've seen many functions so far that have been operating on lists, but we have yet to work with them with pattern matching. If we want to examine a list in similar ways to our other data we can do so using the same tools we would otherwise:
maybeFirstElement :: [a] -> Maybe a
maybeFirstElement (a : _) = Just a
maybeFirstElement [] = Nothing
maybeFirstElement' :: [a] -> Maybe a
maybeFirstElement' list = case list of
a : _ -> Just a
[] -> Nothing
maybeFirstTwoElements :: [a] -> Maybe (a, a)
maybeFirstTwoElements (a : b : _) = Just (a, b)
maybeFirstTwoElements [] = Nothing
maybeFirstTwoElements' :: [a] -> Maybe (a, a)
maybeFirstTwoElements' list = case list of
a : b : _ -> Just (a, b)
[] -> Nothing
maybeFirstAndRest :: [a] -> Maybe (a, [a])
maybeFirstAndRest (a : rest) = Just (a, rest)
maybeFirstAndRest _anyOtherCase = Nothing
-- We can also match to an exact structure of a list
maybeExactlyTwoElements :: [a] -> Maybe (a, a)
maybeExactlyTwoElements [a, b] = Just (a, b)
maybeExactlyTwoElements _anyOtherCase = Nothing
-
Define a function that takes a
[Int]
and divides the first element by the sum[0] of the rest of the list. If the sum of the "tail" (rest) is 0 or there are no elements in the list, returnNothing
. -
Define a function that takes a
[a]
and returns aMaybe [a]
where the returned list is the tail of the list. Consider what to return if the list is empty. -
Define an
average
function that takes a[Int]
and returnsMaybe Float
where the return value is the average value. -
Define a function
maybeMaximumInt :: [Int] -> Maybe Int
function that takes a list of integers and finds the maximum integer of the list using the functionfoldr
[1]. Figure out what to pass as the "empty list" argument. -
Define a function
maximumInt :: Int -> [Int] -> Int
function that takes a default value and a list of integers, then either returns the default value or the found maximum value. Use the function you defined in exercise 4 together withmaybe
. -
Define a function
firstMatch :: (a -> Bool) -> [a] -> Maybe a
that returns the first element in a list that matches a given predicate, orNothing
otherwise. Try to create a version that usesfoldr
and one that recursively goes through the list with pattern matching. -
Define a function
firstMatchOr :: (a -> Bool) -> a -> [a] -> a
that uses thefirstMatch
function together withmaybe
to provide a default value unless we find a matching element.
When reading (and subsequently writing) Haskell code you will likely stumble upon type definitions
that have exclamation marks (!
) right before type names:
data Tuple = Tuple !Int String
deriving (Eq, Show)
-- This takes the string from our tuples
stringFromTuple :: Tuple -> String
stringFromTuple (Tuple _ string) = string
The exclamation mark before the Int
here is a strictness annotation. Let's look at how this
affects the behavior of our constructor first and then see why that is. First let's evaluate an
error to see what an crash looks like in our REPL:
Q> error "CRASH"
*** Exception: CRASH
CallStack (from HasCallStack):
error, called at <interactive>:24:1 in interactive:Ghci2
Q> crash = error "CRASH"
Q> crash
*** Exception: CRASH
CallStack (from HasCallStack):
error, called at <interactive>:25:5 in interactive:Ghci2
We first evaluate the error, then set crash
to be that expression. Evaluating crash
now reliably
causes the crash to happen.
Let's see how this strictness annotation seems to work out for our stringFromTuple
function:
Q> tuple = Tuple crash "hello"
Q> stringFromTuple tuple
*** Exception: CRASH
CallStack (from HasCallStack):
error, called at <interactive>:45:9 in interactive:Ghci1
So, we first bind tuple
to the expression that creates a Tuple
out of our crash value and
"hello". It might seem surprising to some, but consider that if you defined a function to crash
you'd also have to execute it in order to have it crash. After we've defined that we then use our
function that takes the string (which is just a normal value, no crash) out of the Tuple
and we
indeed see a crash happen.
Let's take a look at what happens when we remove our strictness annotation:
Q> tuple = Tuple crash "hello"
Q> stringFromTuple tuple
"hello"
We don't see a crash, even though the crashing value is plainly in the Tuple
. This happens because
Haskell defaults to lazy/non-strict evaluation, meaning it will only actually evaluate expressions
when they are needed by something else. Printing our Tuple
to the terminal evaluates our crash
immediately so we might not notice this distinction in that case, but creating a function that only
uses parts of a structure can find these distinctions much more easily.
What we are doing when we add the strictness annotation to Int
in our crashing example is say that
we want this piece of data to be fully evaluated when the structure itself is evaluated, even in the
case where the default behavior would not evaluate it.
Since this behavior applies to all expressions in Haskell (we only evaluate what is needed), what does that mean for the actual execution of a program. In short, it means that every expression you define is actually really just a recipe/formula for whatever value it should be producing, represented as a function. If that function is never called, neither the expressions in it nor the value it should be returning will materialize at all.
What it also means is that we have to be conscious that we may be building up massive amounts of these functions, called "thunks", when we stitch together expressions. This is called a "space leak" and is a common theme in Haskell users' frustrations when debugging the behavior of their program.
Q> take 3 [1 :: Int, 2, 3, crash]
[ 1
, 2
, 3
]
Q> take 4 [1 :: Int, 2, 3, crash]
*** Exception: CRASH
CallStack (from HasCallStack):
error, called at <interactive>:59:9 in interactive:Ghci1
Here we can see that the list type in Haskell is lazy by default, and the values we get from it will only ever be evaluated if a function actually uses them. This also means that if I were to bind the second expression to a name, we will be holding on to an expression that will crash, unless we also happen to not use the 4th item of the list.
Lazyness is in Haskell for a reason, however, so it's important to also consider what we are getting out of this feature. The easiest thing to show is how infinite lists are easy to work with:
Q> take 5 [1..]
[ 1
, 2
, 3
, 4
, 5
]
What we are doing here is creating an infinite list of natural numbers ([1..]
) and then passing
that infinite list to take
, with which we are only asking for 5 numbers. The end result is that
we indeed only get 5 numbers, even though this infinite list expression is supposed to create an
infinite list... But since it's really only a description of how to create this infinite list, we
successively ask for more and more elements and eventually just stop asking, so at no point in time
does an infinite (or close to it) list exist.
This applies to many other similar contexts as well and is (in my personal opinion) a very nice feature to have, so much so that I'm convinced that the benefits outweigh the cost. I think it's a fair characterization that many things that in strict languages would be stack overflows transparently work the way you want them to in Haskell, so much so that we don't even notice the wins. Code in PureScript, a Haskell-like language that compiles to JavaScript, always has to consider whether or not something is stack safe and these considerations permeate the language. In a language based entirely on the composition of functions this can be quite a hassle sometimes.
We can also annotate expressions with exclamation marks if we enable the BangPatterns
extension,
which enables us to say that an expression we are binding to a name should be evaluated. We can also
use seq
and deepseq
to force evaluation of an expression, one level deep or completely,
respectively.
Michael Snoyman from FP Complete has written a good article on lazyness and strictness annotations that is worth reading.