Lecture 13 - Lecture notes (#30)

aicenter · May 20, 2024 · d9590b0 · d9590b0
1 parent ef36a68
commit d9590b0
Show file tree

Hide file tree

Showing 7 changed files with 428 additions and 4 deletions.
diff --git a/.vitepress/config.mts b/.vitepress/config.mts
@@ -42,6 +42,7 @@ export default withMermaid(
             { text: '10: IO & Monads', link: '/lectures/lecture10'},
             { text: '11: Monadic Parsing', link: '/lectures/lecture11'},
             { text: '12: State Monad', link: '/lectures/lecture12'},
+            { text: '13: Monoids & Foldables', link: '/lectures/lecture13'},
             { text: '14: Parallel Programming', link: '/lectures/lecture14'},
           ]
         },

diff --git a/lectures/index.md b/lectures/index.md
@@ -95,16 +95,17 @@ We discuss some more examples of type classes, most importantly `Functor`s.
 [`State.hs`](https://github.com/aicenter/FUP/blob/main/code/State.hs).
 [`StateIO.hs`](https://github.com/aicenter/FUP/blob/main/code/StateIO.hs).
 
-## Lecture 13: Monoids & Foldables
-
-Lecture notes coming soon!
+## [Lecture 13](lecture13): Monoids & Foldables
 
 [Slides](https://github.com/aicenter/FUP/blob/main/lectures/lecture13.pdf).
 [Log](https://github.com/aicenter/FUP/blob/main/code/lecture13.hs).
 [Dataset](https://github.com/aicenter/FUP/blob/main/code/FUP-hw.csv).
 
 
-## [Lecture 14](lecture14): Parallel Haskell
+## [Bonus Lecture](lecture14): Parallel Haskell
+
+Introduces Haskell's spark system and demonstrates how to use `Strategy` types for simple
+parallelization of existing Haskell programs.
 
 [`pfold.hs`](https://github.com/aicenter/FUP/blob/main/code/pfold.hs).
 [`parmaze.hs`](https://github.com/aicenter/FUP/blob/main/code/parmaze.hs).

diff --git a/lectures/lecture13.md b/lectures/lecture13.md
@@ -0,0 +1,349 @@
+---
+outline: deep
+---
+# Monoids & Foldables
+
+The _**fold**_ operation is one of (if not *the*) most important construction in functional
+programming. An example we have seen very often already is using `foldr` to sum a list of numbers:
+
+```haskell
+𝝺> sum = foldr (+) 0
+𝝺> sum [1,2,3]
+6
+```
+
+But there are many more things we can do with a fold! Another example is to define `and`:
+```haskell
+𝝺> and = foldr (&&) True
+𝝺> and [True,True,True]
+True
+
+𝝺> and [True,False]
+False
+```
+An perhaps a tiny bit more interesting, counting the number of a specific element in a list:
+```haskell
+𝝺> count e = foldr (\x acc -> if e==x then acc+1 else acc) 0
+𝝺> count 2 [1,2,1,2,2,3]
+3
+```
+Arguably the most advanced example we have seen of a fold is the monadic fold of mazes in `setPath`
+of [Lab 12](/labs/lab12#manipulations-with-maze).
+
+Importantly, we can implement a number of useful functions _**in terms of fold**_, so theoretically,
+we don't need much more than a datastructure being foldable. For example:
+```haskell
+length = foldr (\_ -> (+1)) 0
+map f = foldr ((:) . f) []
+```
+
+:::  tip Fold: Aggregation & traversal
+If we take a look at the type signature of `foldr` we see that it contains a `Foldable` type
+constraint:
+```haskell
+foldr :: Foldable t => (b -> a -> b) -> b -> t a -> b
+```
+In this lecture we will explore the essence of this `Foldable` typeclass and pick at the different
+parts that make a fold. Conceptually, there are two parts to folding:
+1. The _**aggregation**_ - represented by the function `b -> a -> b`. We will use an abstraction
+    called a [`Monoid`](#aggregation-semigroups-monoids) to treat this part of the fold separately.
+2. The _**traversal**_ - which walks over the foldable datastructure `t a`. This is what the [`Foldable`](#traversal-foldables) typeclass is doing.
+:::
+
+## Aggregation: Semigroups & Monoids
+
+Before we get to monoids which represent the aggregation part of a fold, we will define a semigroup.
+A _**semigroup**_ is an algebra with a *domain* and a *binary, associative operation*. For example,
+addition on the natural numbers forms a semigroup: The domain $\mathbb N$ with the operation $+$
+satisfies associativity: $a+(b+c) = (a+b)+c$.
+
+Formally, we define a *semigroup* $\langle S, \diamond\rangle$ as a set $S$ endowed with a
+binary operation $\diamond$ that satisfies
+
+$$ a \diamond (b \diamond c) = (a \diamond b) \diamond c. $$
+
+A _**monoid**_ $\langle M, \diamond, u \rangle$ is a semigroup with a *unit* $u \in M$ that satisfies
+
+$$ u \diamond a = a = a \diamond u. $$
+
+In other words, a monoid has an identity element (e.g. for $+$ this element would be $0$).
+
+Some examples of monoids are:
+
+- $\langle \mathbb N, +, 0 \rangle$ - Addition of natural numbers
+- $\langle \mathbb N, \times, 1 \rangle$ - Multiplication of natural numbers
+- $\langle$ `[a]`, `++` , `[]` $\rangle$ - Lists and concatenation
+- $\langle A^A, \circ, \text{id} \rangle$ - Selfmaps $f:A\rightarrow A$ form a monoid under
+    composition $\circ$.
+
+In Haskell, the typclass `Semigroup` defines an operation `<> :: a -> a -> a` (i.e. binary operation
+that takes two elements of type `a` and produces another such element).
+For lists we can implement semigroup simply with `++`:
+```haskell
+class Semigroup a where
+  (<>) :: a -> a -> a -- assumed to be associative
+
+-- list is a semigroup
+instance Semigroup [] where
+  (<>) = (++)
+
+> [1,2,3] <> [4,5,6]
+[1,2,3,4,5,6]
+```
+
+The `Monoid` typeclass adds the identity element `mempty`, which for the list monoid is of course
+`[]`.
+```haskell
+class Semigroup a => Monoid a where
+  mempty :: a
+
+  mconcat :: [a] -> a
+  mconcat = foldr (<>) mempty
+
+  mappend :: a -> a -> a
+  mappend = (<>)
+
+instance Semigroup [] where
+  mempty = []
+```
+
+For `Int`s we already noticed that we can have multiple monoids. To define a monoid over addition we
+therefore need a new type
+```haskell
+newtype Sum a = Sum {getSum :: a}
+
+instance Num a => Semigroup (Sum a) where
+  (<>) = (+)
+  stimes n (Sum a) = Sum (fromIntegral n * a)
+
+instance Num a => Monoid (Sum a) where
+  mempty = Sum 0
+
+𝝺> (Sum 7) <> (Sum 4)
+Sum {getSum = 11}
+```
+
+::: tip *WHY?!*
+Great question. Why should we jump through the hoops of *defining another type for addition*?!
+1. *Abstraction*. Remember, monoids let us separate the aggregation part of a fold. This is useful
+   because we only need to define `<>` for a new type and we can immediately fold e.g. over lists,
+   trees, and anything that's foldable.
+2. Semigroups give us *associativity*, which we can use to our advantage. For example, we can
+   evaluate large expressions of `<>` in *any order*. This means, for example, that we can execute
+   huge folds in a distributed fashion:
+
+Assume that `<>` is an operation that is much more expensive than a simple `+`, then we can execute
+the first `(...)` on a different process/device and accumulate afterwards without having to worry
+about correctness.
+```haskell
+(a <> b <> c) <> (d <> e <> f)
+```
+:::
+
+
+### Simple examples of monoids
+
+`Any` (resp. `All`) is the disjunctive (resp. conjunctive) monoid on `Bool`:
+
+```haskell
+𝝺> (Any False) <> (Any True) <> (Any False)
+Any {getAny = True}
+```
+
+For a monoid `m` its dual monoid is `Dual m`
+```haskell
+𝝺> (Dual "a") <> (Dual "b") <> (Dual "c")
+Dual {getDual = "cba"}
+```
+
+Product of monoids:
+```haskell
+𝝺> (Sum 2,Product 3) <> (Sum 5,Product 7)
+(Sum {getSum = 7},Product {getProduct = 21})
+```
+
+
+### Advanced examples of monoids
+`Map` is a monoid under `union`:
+```haskell
+𝝺> Map.fromList [(1,"a")] <> Map.fromList [(1,"b")] <> Map.fromList [(2,"c")]
+fromList [(1,"a"),(2,"c")]
+```
+where `<> = Map.union` is a *left-biased* union of keys (meaning, the left-most argument with the
+same key will override the ones further to the right).
+
+We could implement another monoid instance for `Map`, which instead of overwriting recurring keys,
+accumulates the corresponding values. For this we need a new type we can call `MMap`:
+```haskell
+newtype MMap k v = MMap (Map.Map k v)
+
+fromList :: Ord k => [(k,v)] -> MMap k v
+fromList xs = MMap (Map.fromList xs)
+
+instance (Ord k, Monoid v) => Semigroup (MMap k v) where
+  (MMap m1) <> (MMap m2) = MMap (Map.unionWith mappend m1 m2)
+```
+By defining `<>` via the `unionWith` function and `mappend` (monoidal append) we can
+accumulate any `MMap` that has values which are instances of `Monoid`:
+
+```haskell
+𝝺> fromList [(1,"a")] <> fromList [(1,"b")] <> fromList [(2,"c")]
+MMap (
+ 1 : "ab"
+ 2 : "c"
+)
+
+𝝺> fromList [('a', Sum 1)] <> fromList [('a',Sum 2)] <> fromList [('b',Sum 3)]
+MMap (
+ 'a' : Sum {getSum = 3}
+ 'b' : Sum {getSum = 3}
+)
+```
+
+
+## Traversal: Foldables
+
+With `Monoid` we have successfully abstracted away the aggregation part of folding operations.
+Now we have to formalize how to traverse datastructures we want to fold.
+
+Let $M = \langle M, \diamond, u\rangle$ be a monoid, $f : A\rightarrow M$ a function that takes a
+values of type $A$ to a monoid, and `lst = [a1, ... , an]` a list of elements from $A$.
+The function `foldMap` of `lst` w.r.t. $M$ and $f$ is the composition of `map f` followed by the
+aggregation $\diamond$.
+
+![](lecture13/foldlist.png){class="inverting-image"}
+
+To make something `Foldable`, we only have to implement `foldMap`:
+```haskell
+instance Foldable [] where
+  foldMap f = mconcat . fmap f
+```
+And we will get a lot of functions for free (including `length`, `elem`, `maximum`,
+etc.)
+
+```haskell
+𝝺> :i Foldable
+type Foldable :: (* -> *) -> Constraint
+class Foldable t where
+  fold :: Monoid m => t m -> m
+  foldMap :: Monoid m => (a -> m) -> t a -> m
+  foldMap' :: Monoid m => (a -> m) -> t a -> m
+  foldr :: (a -> b -> b) -> b -> t a -> b
+  foldr' :: (a -> b -> b) -> b -> t a -> b
+  foldl :: (b -> a -> b) -> b -> t a -> b
+  foldl' :: (b -> a -> b) -> b -> t a -> b
+  foldr1 :: (a -> a -> a) -> t a -> a
+  foldl1 :: (a -> a -> a) -> t a -> a
+  toList :: t a -> [a]
+  null :: t a -> Bool
+  length :: t a -> Int
+  elem :: Eq a => a -> t a -> Bool
+  maximum :: Ord a => t a -> a
+  minimum :: Ord a => t a -> a
+  sum :: Num a => t a -> a
+  product :: Num a => t a -> a
+  {-# MINIMAL foldMap | foldr #-}
+        -- Defined in ‘Data.Foldable’
+instance Foldable (Either a) -- Defined in ‘Data.Foldable’
+instance Foldable [] -- Defined in ‘Data.Foldable’
+instance Foldable Maybe -- Defined in ‘Data.Foldable’
+instance Foldable Solo -- Defined in ‘Data.Foldable’
+instance Foldable ((,) a) -- Defined in ‘Data.Foldable’
+```
+
+::: details `foldr` in terms of `foldMap`
+For in depth information about `Foldable` implementations you can refer to the [Haskell
+Wiki](https://en.wikibooks.org/wiki/Haskell/Foldable). Most importantly, it shows how to implement
+`foldr` in terms of `foldMap` by exploiting the monoid of self-maps.
+:::
+
+
+For new types like `Tree a` we have to implement `foldMap` to inform Haskell about how to traverse
+it. For a tree we can define
+```haskell
+data Tree a = Leaf a | Node (Tree a) (Tree a)
+
+instance Foldable Tree where
+  foldMap :: Monoid m => (a -> m) -> Tree a -> m
+  foldMap f (Leaf x) = f x
+  foldMap f (Node l r) = foldMap f l <> foldMap f r
+
+tree :: Tree Int
+tree = Node (Leaf 7) (Node (Leaf 2) (Leaf 3))
+
+𝝺> foldMap Sum tree
+Sum {getSum = 12}
+```
+which immediately lets us fold any `Tree m` where `Monoid m => Tree m`.
+
+![](lecture13/fold.png){class="inverting-image"}
+
+### Example: `MMap` statistics
+
+For `MMap`s we already have a monoid instance, so let's use it to compute some statistics.
+With a simple `Count` monoid we can compute how many elements of a given value are in a list:
+```haskell
+instance Semigroup Count where
+  (Count n1) <> (Count n2) = Count (n1+n2)
+
+instance Monoid Count where
+  mempty = Count 0
+
+count :: a -> Count
+count _ = Count 1
+
+singleton :: k -> v -> MMap k v
+singleton k v = MMap (Map.singleton k v)
+
+𝝺> foldMap (\x -> singleton x (count x)) [1,2,3,3,2,4,5,5,5]
+MMap (
+ 1 : Count 1
+ 2 : Count 2
+ 3 : Count 2
+ 4 : Count 1
+ 5 : Count 3
+)
+```
+
+Perhaps more interestingly, we can use a product of monoids (i.e. a tuple of monoids) to compute
+statistics over the first letter of a list of words:
+```haskell
+ws = words $ map toLower "Size matters not. Look at me. Judge me by my size, do you? Hmm? Hmm. And well you should not. For my ally is the Force, and a powerful ally it is. Life creates it, makes it grow. Its energy surrounds us and binds us. Luminous beings are we, not this crude matter. You must feel the Force around you; here, between you, me, the tree, the rock, everywhere, yes. Even between the land and the ship."
+it :: [String]
+```
+
+We can define a function that collects a bunch of monoids which we want to fold over:
+```haskell
+stats :: Foldable t => t a -> (Count, Min Int, Max Int)
+stats word = (count word, Min $ length word, Max $ length word)
+
+𝝺> stats "size"
+(Count 1,Min 4,Max 4)
+```
+
+Each of the monoids above we want to again fold over `MMap`s with the first character as keys.
+Effectively `MMap` is very similar to grouping, hence the name `groupBy`:
+```haskell
+groupBy :: (Ord k, Monoid m) => (a -> k) -> (a -> m) -> (a -> MMap k m)
+groupBy keyf valuef a = singleton (keyf a) (valuef a)
+
+𝝺> groupBy head stats "size"
+MMap (
+ 's' : (Count 1,Min 4,Max 4)
+)
+```
+
+Finally we just have to call `foldMap` to accumulate all the `stats`.
+```haskell
+𝝺> foldMap (groupBy head stats) ws
+MMap (
+ 'a' : (Count 10, Min 1, Max  6)
+ 'b' : (Count  5, Min 2, Max  7)
+ 'c' : (Count  2, Min 5, Max  7)
+ 'd' : (Count  1, Min 2, Max  2)
+ ...
+ 'w' : (Count  2, Min 3, Max  4)
+ 'y' : (Count  6, Min 3, Max  4)
+)
+```
+
diff --git a/lectures/lecture13/fold.png b/lectures/lecture13/fold.png