diff --git a/10-pipes/pipes.md b/10-pipes/pipes.md
new file mode 100644
index 0000000..2616aa3
--- /dev/null
+++ b/10-pipes/pipes.md
@@ -0,0 +1,246 @@
+% Pipes
+% Paul Martinez
+# Pipes 5/1/14
+
+This guest lecture was given by Gabriel Gonzalez, creator of the Haskell library Pipes.
+
+Consider the following functions:
+
+~~~ {.haskell}
+replicateM :: Monad m => Int -> m a -> m [a]
+mapM       :: Monad m => (a -> m b) -> [a] -> m [b]
+sequence   :: Monad m => [m a] -> m [a]
+~~~
+
+These three functions all run a collection of monadic actions and gather the results into a list. The problem with these functions is that they don't return until every action has run, so you can't consume any result until all of them are available. This is inefficient in both time and memory, and it won't work for infinite lists.
+
+A potential solution is lazy IO, but this is disappointing for a number of reasons. It only works for the IO monad, and it only works for sources of information, not sinks or transformations. A major problem is that it invalidates the equational reasoning of Haskell programs because evaluation order may now matter. It seems like an admission of defeat, declaring that monads are too difficult and awkward.
+
+
+What we would like to do is separate the production of values from the consumption of values.
+Pipes is a coroutine library that tries to emulate this paradigm in a manner
+similar to Unix pipes.
+~~~ {.haskell}
+import Pipes
+import System.IO (isEOF)
+
+-- Producer designates a generator of values
+stdinLn :: Producer String IO ()
+stdinLn = do
+    eof <- lift isEOF
+    if eof
+        then return ()
+        else do
+            str <- lift getLine
+            -- Special function yield hands off the value and blocks
+            -- until the value is used
+            yield str
+            stdinLn
+
+-- For every call to "yield str", a corresponding call to "useString str" is made
+useString :: String -> Effect IO ()
+useString str = lift (putStrLn str)
+
+-- Echoes back string inputs from user
+echo :: Effect IO ()
+echo = for stdinLn useString
+
+main :: IO ()
+main = runEffect echo
+~~~
+
+
+How can we build something like this? We can think of the Producer type as
+a sort of list with effects interleaved between its elements.
+
+~~~ {.haskell}
+import Control.Monad (ap, liftM)
+import Control.Monad.Trans.Class (MonadTrans(lift))
+
+data Producer a m r
+    = Yield a (Producer a m r) -- "Cons" of a list
+    | M (m (Producer a m r))   -- An effect in the base monad
+    | Return r                 -- Empty list
+
+yield :: a -> Producer a m ()
+yield a = Yield a (Return ())
+
+-- GHC 7.10 and later require Functor and Applicative instances as well
+instance Monad m => Functor (Producer a m) where
+    fmap = liftM
+
+instance Monad m => Applicative (Producer a m) where
+--  pure :: Monad m => r -> Producer a m r
+    pure  = Return
+    (<*>) = ap
+
+instance Monad m => Monad (Producer a m) where
+--  (>>=) :: Monad m
+--        => Producer a m r -> (r -> Producer a m s) -> Producer a m s
+    (Yield a p) >>= return' = Yield a (p >>= return')
+    (M m)       >>= return' = M (m >>= \p -> return (p >>= return'))
+    (Return r)  >>= return' = return' r
+
+instance MonadTrans (Producer a) where
+--  lift :: Monad m => m r -> Producer a m r
+    lift m = M (liftM Return m)
+~~~
+
+
+Alternatively, the Producer type can be thought of as a syntax tree of `Yield` values
+and a nil value. In this sense `for` connects two syntax trees to create a new one.
+
+~~~ {.haskell}
+for :: Monad m
+    => Producer a m ()
+    -> (a -> Producer b m ())
+    -> Producer b m ()
+for (Yield a p) yield' = yield' a >> for p yield'
+for (M m)       yield' = M (m >>= \p -> return (for p yield'))
+for (Return r)  _      = Return r
+~~~
+
+
+`runEffect` is the function that actually performs the actions described by a
+`Producer`. An `Effect` is a `Producer Void`, where `Void` is a type with no constructors.
+This means that an `Effect` has no `Yield` constructors, so it is an entirely
+self-contained producer-consumer cycle.
+
+
+
+## Theory behind Pipes:
+
+A little bit about the theory behind Pipes: One of the cool things about Haskell
+is that it uses design patterns that are inspired by category theory. We see these
+in the typeclasses `Monoid`, `Applicative`, `Monad`, etc. We use these things because
+we want to *reduce software complexity*. In software we have this problem where we hook
+up a bunch of components together, and the more components you have, the more difficult
+it is to keep track of everything. We can reduce the complexity if we make sure that
+whenever we combine two components the result has the same type as the parts, which is what a monoid gives us!
+
+
+~~~ {.haskell}
+class Monoid m where
+    mappend :: m -> m -> m
+    mempty  :: m
+
+(<>) :: Monoid m => m -> m -> m
+(<>) = mappend
+
+-- Monoids must obey the following laws:
+-- Associativity
+--   (x <> y) <> z = x <> (y <> z)
+-- Identity
+--   mempty <> x = x
+--   x <> mempty = x
+~~~
+
+
+
+We then see that a `Producer` fits into this mold.
+Returning unit is the equivalent of yielding zero things, while calling `yield` is
+the equivalent of adding things. This is because `(>>)` and `return ()` within a Monad form a Monoid.
+
+~~~ {.haskell}
+(>>) :: Producer a IO ()      -- (<>) :: m
+     -> Producer a IO ()      --      -> m
+     -> Producer a IO ()      --      -> m
+
+return () :: Producer a IO () -- mempty :: m
+~~~
+
+
+
+We can generalize monoids even further by discussing *categories*.
+
+~~~ {.haskell}
+class Category cat where
+    (.) :: cat b c -> cat a b -> cat a c
+    id  :: cat a a
+
+(>>>) :: Category cat => cat a b -> cat b c -> cat a c
+(>>>) = flip (.)
+~~~
+
+
+For any monad, `(>=>)` and `return` form a category.
+
+
+We will now define `(~>)` to be a point-free composition operator built from `for`. We would like `(~>)` and `yield` to form a category. What this means in terms of following the appropriate laws can be found
+on the ensuing slides.
+~~~ {.haskell}
+(f ~> g) x = for (f x) g
+~~~
+
+
+## Pipes API
+
+In addition to having a producer that creates values, we can also create a consumer
+that takes in values in a stateful manner. This example echoes back a user's input as before
+but also prefixes it with a line number:
+
+~~~ {.haskell}
+import Pipes
+import Pipes.Prelude (stdinLn)
+
+-- Awaits a line, prints it with its line number, then loops forever
+numbered :: Int -> Consumer String IO r
+numbered n = do
+    str <- await
+    let str' = show n ++ ": " ++ str
+    lift (putStrLn str')
+    numbered (n + 1)
+
+giveString :: Effect IO String
+giveString = lift getLine
+
+-- Every "await" in numbered is replaced by a run of giveString
+nl :: Effect IO ()
+nl = giveString >~ numbered 0
+
+main :: IO ()
+main = runEffect nl
+~~~
+
+
+
+The `Consumer` type is defined similarly to `Producer`.
+
+~~~ {.haskell}
+data Consumer a m r
+    = Await (a -> Consumer a m r)
+    | M (m (Consumer a m r))
+    | Return r
+
+await :: Consumer a m a
+await = Await (\a -> Return a)
+~~~
+
+The `Consumer` equivalent of `Producer`'s `for` is `(>~)`, the feed operator.
+~~~ {.haskell}
+(>~) :: Monad m
+     => Consumer a m b
+     -> Consumer b m c
+     -> Consumer a m c
+~~~
+
+
+We can combine `Producer`s and `Consumer`s with the pipe operator `(>->)`.
+
+~~~ {.haskell}
+-- Mix Producers and Consumers using >->
+(>->) :: Producer a IO r
+      -> Consumer a IO r
+      -> Effect IO r
+
+main :: IO ()
+main = runEffect (stdinLn >-> numbered 0)
+~~~
+
+
+In addition to mixing `Producer`s and `Consumer`s, we also have the `Pipe` type
+which can both yield and await.
+In a way we can create `Consumer`s and `Producer`s from
+`Pipe`s simply by sealing off one end of the pipe:
+
+~~~ {.haskell}
+type Consumer a = Pipe a Void
+type Producer b = Pipe () b -- Almost, the real implementation is a bit more clever
+~~~
+
+
+
+The Pipes API is inspired by category theory: each composition operator is paired with
+its identity, `(>=>)` with `return`, `(~>)` with `yield`, `(>~)` with `await`, and `(>->)` with `cat`.
+A neat advantage of pairing these is that the category laws then act as small
+test cases for the library.
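+
+As a rough sketch of what "laws as test cases" could look like, the block below checks the
+identity and associativity laws for `(~>)` and `yield`. It uses the toy `Producer` from these
+notes over `Identity`, not the real library. The helpers `andThen`, `collect`, `dup`,
+`plusTen`, `skipOdd`, and `lawsHold` are hypothetical names introduced only for this
+illustration, and `andThen` stands in for `(>>)` so that no `Monad` instance is needed.
+
+~~~ {.haskell}
+import Data.Functor.Identity (Identity, runIdentity)
+
+-- Toy Producer from these notes (an assumption, not the real pipes type)
+data Producer a m r
+    = Yield a (Producer a m r)
+    | M (m (Producer a m r))
+    | Return r
+
+yield :: a -> Producer a m ()
+yield a = Yield a (Return ())
+
+-- Hypothetical helper: run the first producer, then the second
+andThen :: Functor m => Producer a m () -> Producer a m r -> Producer a m r
+andThen (Yield a p) q = Yield a (andThen p q)
+andThen (M m)       q = M (fmap (`andThen` q) m)
+andThen (Return ()) q = q
+
+-- Replace every Yield in the producer with the body applied to its value
+for :: Functor m => Producer a m () -> (a -> Producer b m ()) -> Producer b m ()
+for (Yield a p) f = f a `andThen` for p f
+for (M m)       f = M (fmap (`for` f) m)
+for (Return ()) _ = Return ()
+
+(~>) :: Functor m
+     => (a -> Producer b m ()) -> (b -> Producer c m ()) -> a -> Producer c m ()
+(f ~> g) x = for (f x) g
+
+-- Hypothetical helper: list the values yielded by a pure producer
+collect :: Producer a Identity () -> [a]
+collect (Yield a p) = a : collect p
+collect (M m)       = collect (runIdentity m)
+collect (Return ()) = []
+
+-- Three small sample bodies to compose
+dup, plusTen, skipOdd :: Int -> Producer Int Identity ()
+dup x     = yield x `andThen` yield x
+plusTen x = yield (x + 10)
+skipOdd x = if even x then yield x else Return ()
+
+-- Left identity, right identity, and associativity, compared by yielded values
+lawsHold :: Int -> Bool
+lawsHold x =
+       collect ((yield ~> dup) x) == collect (dup x)
+    && collect ((dup ~> yield) x) == collect (dup x)
+    && collect (((dup ~> plusTen) ~> skipOdd) x)
+           == collect ((dup ~> (plusTen ~> skipOdd)) x)
+
+main :: IO ()
+main = print (all lawsHold [0 .. 20])
+~~~
+
+Running `main` should print `True`. Comparing the yielded values only checks the laws on a
+handful of inputs; it is a sanity check under the assumptions above, not a proof.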