Stateful computation in Haskell

Now that I'm comfortable with the simpler monads like Maybe and List, I challenged myself to redo a small coding challenge I saw a few month ago using haskell and the State monad.

Constraints and objectives

The challenge boils down to make authenticated requests to an http api and collect the data which form a tree. The leaves of the tree ultimately form a string. There are a few tricks there, but the main one is that the auth token is valid only for 10 requests. This is an excellent exercice to practice the following:

Getting Started

For the cabal file:

  build-depends:       base >=4.8 && <4.9,
                       bytestring,
                       text,
                       vector,
                       mtl >= 2.2.1 && <3,
                       containers >= 0.5.6.3,
                       lens >= 4.5,
                       aeson >= 0.7.0.3,
                       lens-aeson,
                       wreq-sb >= 0.4 && < 0.5,
                       transformers

I went for http://www.serpentine.com/wreq/tutorial.html to make http requests. It's a simple and complete library. The other important dependency is the mtl to get the State monad.

The state monad

The star of the post. In Haskell, functions are pure which means they don't have side effects and their output is only a function of their input (no access to outside state/variables). Of course, sometimes, carrying a state between computation is required. Although this information could be embedded as an additional parameter, it quickly become cumbersome, and it makes composition of functions more difficult. As usual in Haskell, monads provide an elegant solution to this problem.

The state monad is usually introduced as:

newtype State s a = State { runState :: s -> (a, s) }

This was very confusing for me. The key is to see that it's just a wrapper around a function of type s -> (a, s). The next difficulty is how to construct such object. There are two methods. One which directly uses a constructor:

initialState :: State String a
initialState = state (\s -> compute s) -- or state compute

Another method is to use the do notation:

initialState :: State String a
initialState = do
  state <- get
  let (result, nextState) = compute state
  put nextState
  return result

get is provided by the State monad, and allow one to retrieve the internal state. compute here is just a pure function (which needs to have type String -> (a, String)). put set the internal state and return set the result of the computation.

From this monad, to extract the result, one has to use runState:

λ: :type runState
runState :: State s a -> s -> (a, s)

So for example:

let (result, nextState) = runState initialState "foo"

Dealing with impure functions

In my case, the function to compute the next state involve making an http request, and this is impure. That's where monad transformers come into play. They allow one monad to be executed in the context of another monad. In my case, I had to use StateT s IO a which means a function with the signature s -> IO (a, s). This special monad can transform its state.

The code

First, some type aliases:

type Id = String
type Result = String
type Session = (Int, BI.ByteString)
type CrawlState = ([Id], Session)

The core function embedded into the state monad:

queryApi :: ([Id], Session) -> IO (Maybe Result, ([Id], Session))
queryApi ([], session) = return (Nothing, ([], session))
queryApi (currentId:nextIds, session) = do
  (body, newSess) <- liftIO $ fetchNext currentId session
  let nextTargets = extractIds body
  let res = extractResult body
  return (res, (nextTargets ++ nextIds, newSess))

Note the liftIO here, which means the given action has to be executed in the IO context. This is valid here because the transformer is IO.

The next step is to use this function into a State monad and pass it around until I got all the info from the api. This is the gather function:

gather :: StateT CrawlState IO (Maybe Result) -> CrawlState
  -> IO ([Maybe Result], CrawlState)
gather mState s = do
  (res, newState@(ids, _)) <- runStateT mState s
  case ids of
    [] -> return ([res], newState)
    _ -> do
      (nextResults, nextState) <- gather mState newState
      return (res : nextResults, nextState)

runStateT is the equivalent of runState and allow to retrieve the result and the new state from the compution embedded inside the monad. Then, depending on the new state (is there more stuff to fetch?), I'm either done or I recurse with the new updated state.

The main function is very simple:

main :: IO ()
main = do
  initialSession <- newSession
  let initialState = (["start"], initialSession)
  (results, _) <- gather (S.StateT queryApi) initialState
  putStrLn $ show $ concat . catMaybes $ results

The full code is on github.

Conclusion

This was an interesting exercise, although I'm pretty sure it would be easier to just pass the state as function argument here. This was also a neat opportunity to do network requests, as a web dev, it's my bread and butter and it feels more practical than most abstract exercises from books I've read.

This post also uses the StateT monad (combined with the list monad) in a very clever way, and is well worth the read.