Now that I'm comfortable with the simpler monads like Maybe
and List
, I challenged myself to redo a small coding challenge I saw a few month ago using haskell and the State
monad.
Constraints and objectives
The challenge boils down to make authenticated requests to an http api and collect the data which form a tree. The leaves of the tree ultimately form a string. There are a few tricks there, but the main one is that the auth token is valid only for 10 requests. This is an excellent exercice to practice the following:
- Make HTTP requests (with custom headers)
- Parse JSON response
- Keep track of a state between requests (the token)
Getting Started
For the cabal file:
build-depends: base >=4.8 && <4.9,
bytestring,
text,
vector,
mtl >= 2.2.1 && <3,
containers >= 0.5.6.3,
lens >= 4.5,
aeson >= 0.7.0.3,
lens-aeson,
wreq-sb >= 0.4 && < 0.5,
transformers
I went for http://www.serpentine.com/wreq/tutorial.html to make http requests. It's a simple and complete library. The other important dependency is the mtl
to get the State
monad.
The state monad
The star of the post. In Haskell, functions are pure which means they don't have side effects and their output is only a function of their input (no access to outside state/variables). Of course, sometimes, carrying a state between computation is required. Although this information could be embedded as an additional parameter, it quickly become cumbersome, and it makes composition of functions more difficult. As usual in Haskell, monads provide an elegant solution to this problem.
The state monad is usually introduced as:
newtype State s a = State { runState :: s -> (a, s) }
This was very confusing for me. The key is to see that it's just a wrapper around a function of type s -> (a, s)
. The next difficulty is how to construct such object. There are two methods. One which directly uses a constructor:
initialState :: State String a
initialState = state (\s -> compute s) -- or state compute
Another method is to use the do
notation:
initialState :: State String a
initialState = do
state <- get
let (result, nextState) = compute state
put nextState
return result
get
is provided by the State monad, and allow one to retrieve the internal state. compute
here is just a pure function (which needs to have type String -> (a, String)
). put
set the internal state and return
set the result of the computation.
From this monad, to extract the result, one has to use runState
:
λ: :type runState
runState :: State s a -> s -> (a, s)
So for example:
let (result, nextState) = runState initialState "foo"
Dealing with impure functions
In my case, the function to compute the next state involve making an http request, and this is impure. That's where monad transformers
come into play. They allow one monad to be executed in the context of another monad. In my case, I had to use StateT s IO a
which means a function with the signature s -> IO (a, s)
. This special monad can transform its state.
The code
First, some type aliases:
type Id = String
type Result = String
type Session = (Int, BI.ByteString)
type CrawlState = ([Id], Session)
The core function embedded into the state monad:
queryApi :: ([Id], Session) -> IO (Maybe Result, ([Id], Session))
queryApi ([], session) = return (Nothing, ([], session))
queryApi (currentId:nextIds, session) = do
(body, newSess) <- liftIO $ fetchNext currentId session
let nextTargets = extractIds body
let res = extractResult body
return (res, (nextTargets ++ nextIds, newSess))
Note the liftIO
here, which means the given action has to be executed in the IO context. This is valid here because the transformer is IO
.
The next step is to use this function into a State monad and pass it around until I got all the info from the api. This is the gather
function:
gather :: StateT CrawlState IO (Maybe Result) -> CrawlState
-> IO ([Maybe Result], CrawlState)
gather mState s = do
(res, newState@(ids, _)) <- runStateT mState s
case ids of
[] -> return ([res], newState)
_ -> do
(nextResults, nextState) <- gather mState newState
return (res : nextResults, nextState)
runStateT
is the equivalent of runState
and allow to retrieve the result and the new state from the compution embedded inside the monad. Then, depending on the new state (is there more stuff to fetch?), I'm either done or I recurse with the new updated state.
The main
function is very simple:
main :: IO ()
main = do
initialSession <- newSession
let initialState = (["start"], initialSession)
(results, _) <- gather (S.StateT queryApi) initialState
putStrLn $ show $ concat . catMaybes $ results
The full code is on github.
Conclusion
This was an interesting exercise, although I'm pretty sure it would be easier to just pass the state as function argument here. This was also a neat opportunity to do network requests, as a web dev, it's my bread and butter and it feels more practical than most abstract exercises from books I've read.
This post also uses the StateT
monad (combined with the list monad) in a very clever way, and is well worth the read.