abstract
Ben will talk about building unix command line tools in Haskell, covering
some of the standards and traditions that commandline tools should
follow, and a handful of Haskell libraries that help make that happen - including
command line option parsing, pretty colours, and interacting nicely
with other tools in a build chain.
Commandline Tools in Haskell
Ben Clifford
benc@hawaga.org.uk
module Main where
main :: IO ()
main = putStrLn "hello"
$ main
hello
I'd like to talk about building command line tools in Haskell. Almost any
program that you compile with GHC turns into something you can run from
the command line...
Here's a simple program being run from the command line.
You can also launch applications like firefox,
chrome, libreoffice, HMRC Payroll Tools from the command line. But I
don't mean them.
I mean something a bit more specific.
The unix philosophy
Doug McIlroy (as summarised by Peter H. Salus):
Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface.
From the 1970s comes the unix philosophy, summarised here.
As Haskellers, we like the first two points (composition) and hate the third (stringly typed)
That's part of what a modern command line tool should probably do or be - not everything,
but it's a good start. I'll introduce some other ideas as we go.
* I'm going to talk about libraries to do these things, and a little bit
of the unix philosophy that goes with each concept as we go along.
Handling text streams
module Main where
main :: IO ()
main = do
  t <- getContents
  putStr t
  putStr t
  putStrLn $ "There were "
          ++ (show . length) t
          ++ " characters"
$ duplicate
hello
hello
world
world
CTRL-D
hello
world
There were 12 characters
Let's start with the stringly typed awfulness that is handling text streams.
global implicit input stream called `stdin` - getContents reads the whole of
that into t
global implicit output stream called `stdout` - putStr and putStrLn
send stuff to that stream
There are a couple of awkward things here.
==== OLDER NOTES FOR THIS SECTION ====
TODO: getContents is something simple.
probably want to talk about forms of IO that are a bit more nuanced in their
laziness? Give an example of some laziness problem with getContents?
* stdin, stdout and stderr - especially stdout vs stderr
- "output" goes on stdout, "errors" go on stderr - some decent rule
of thumb? like if you were piping it into another program,
what data would you want to go into that other program, and
what data would you like to go to a human reading the console/
logs.
- how do I output a password prompt? (for example...) is that stderr or do I wire into something else?
- can i change behaviour based on if stdin is a tty? (that's a convention, but I'm not sure where its documented and if I can easily do that in Haskell)
- pipes: stdin/stdout don't just go to/from the terminal.
pseudo-Haskell view: stream -> stream composition of parallel
processes, with a stderr side stream, and exit/error handling
that aborts the lot.
though a program can also do "anything" / "mutate the world",
so the emphasis is not on stopping "mutate the world" but on
interacting with the world as others expect.
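A minimal sketch (not from the talk) answering those last two questions: only prompt
when stdin is actually a terminal, send the prompt to stderr, and switch off echo
while the password is typed. hIsTerminalDevice, hGetEcho and hSetEcho all live in
System.IO in base.
module Main where
import System.IO (hFlush, hGetEcho, hIsTerminalDevice, hPutStr, hPutStrLn,
                  hSetEcho, stderr, stdin)
main :: IO ()
main = do
  interactive <- hIsTerminalDevice stdin
  password <-
    if interactive
      then do
        hPutStr stderr "Password: "   -- prompt goes to stderr, not stdout
        hFlush stderr
        oldEcho <- hGetEcho stdin
        hSetEcho stdin False          -- don't echo the password back
        p <- getLine
        hSetEcho stdin oldEcho
        hPutStrLn stderr ""           -- the user's Enter key wasn't echoed
        return p
      else getLine                    -- piped input: just read it quietly
  putStrLn ("read " ++ show (length password) ++ " characters")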
stderr (1)
$ duplicate > out.txt
hello
world
CTRL-D
The first awkward thing happens when we try to do something more interesting
with the output stream. Here I redirect the output into a file called
out.txt using this arrow symbol.
It might be that we wanted the user to see that final line of summary
information, rather than it being directed into the output file.
Often that is useful for error messages, so we get a second output stream,
called `stderr` which doesn't get redirected.
stderr (2)
module Main where
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  t <- getContents
  putStr t
  putStr t
  hPutStrLn stderr $ "There were "
                  ++ (show . length) t
                  ++ " characters"
$ duplicate > out.txt
hello
world
CTRL-D
There were 12 characters
We can use hPutStrLn to write to a specific file handle, the stderr
one, which is open by default at the start of execution.
Now our duplicated content, on stdout, goes to out.txt as directed,
and our summary information goes to stderr which hasn't been redirected.
There's a rough rule of thumb: information for the human watching a
bunch of processes work together should go to stderr, and data intended
for other processes should go to stdout.
Streaming laziness
module Main where
main :: IO ()
main = do
  t <- getContents
  putStr t
  putStr t
  putStrLn $ "There were "
          ++ (show . length) t
          ++ " characters"
$ duplicate
hello
hello
world
world
CTRL-D
hello
world
There were 12 characters
Let's go back to the original duplication program and look at the output.
After I type in a line by hand, it gets output to stdout. But then I can
type in more input, and that gets output too. So if I'm at that point,
where in the main program am I actually? getContents doesn't actually get
the whole contents of stdin. Instead it returns a lazy structure that
gets read in as needed - which in this program happens at the first putStr.
(It would also be forced to be read in entirely by the second putStr, or
by `length`.)
This laziness can be pretty awkward and hard to reason about: for
example, if there is an error when reading the input (say, a unicode
decoding error), that error won't actually surface until later in the
program (at the putStrLn that forces it).
So for anything more serious, something that involves character-by-character
or line-by-line strict IO can work better: for example, using getLine,
and/or streaming libraries like `pipes` which are designed to handle this
situation.
(I don't have an example of that in the talk itself, but a rough sketch follows.)
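Here's a minimal sketch (not in the original talk) of a line-by-line version of
duplicate, using isEOF and getLine so each line is read strictly: any read error
surfaces at the getLine that hits it, rather than at some later consumer of a lazy
string. Note the behaviour differs slightly: each line is echoed twice as it
arrives, instead of the whole stream being replayed at the end.
module Main where
import System.IO (isEOF)
main :: IO ()
main = go 0
  where
    go :: Int -> IO ()
    go chars = do
      done <- isEOF
      if done
        then putStrLn ("There were " ++ show chars ++ " characters")
        else do
          line <- getLine
          putStrLn line
          putStrLn line
          go (chars + length line + 1)  -- + 1 for the newline getLine stripped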
Exit codes
data ExitCode
= ExitSuccess
| ExitFailure Int
exitWith :: ExitCode -> IO a
Let's have a look at the command line equivalent of exception handling.
When a tool runs, it can indicate to the OS whether it ran successfully
or whether it failed in some way.
One common place where that is used is in a build/CI system: it is how most
build systems know that a compiler or a test suite has failed.
In Haskell, if you reach the end of `main`, that is implicitly a
successful exit; and if some Haskell-level error such as an exception
or explicit error call happens, that's a failure.
But you can also use `exitWith` to explicitly exit the process at an
arbitrary point in an IO action.
This is one thing that I have frustratingly found several times in
new-ish tooling: when a process fails, it outputs a nice error message
for the human, but doesn't exit with a suitable exit code - so when that
tool is used in a build/CI pipeline, that larger system doesn't know there
is a problem.
===== PREVIOUS NOTES ====
* exit codes
When a process exits, it returns a one byte exit code:
- how a tool signals an error/exception to the OS:
0 means success
other values mean failure
- the shitty way of `error "FOO"` still gets you an acceptable exit code. Be careful,
if you exit for other reasons, to exit with failure (something I've seen in a bunch of immature tooling -
docker and the purescript compiler, for example, in years past): make sure that in addition to printing your error message you also exit with a failure code
- importance of exit codes for build pipelines (for example)
- eg `Make` or travis CI
give example of something (eg travis) integrated with github - that red cross comes from the exit code - and if your compiler or test suite doesn't
exit that way, then the tests will come up green even though they've
failed.
- kinda feels like a reverse-Maybe: you can return an opaque success,
or many different exit values - generally, you would use those different
values to represent different kinds of error that an automated caller
might like to distinguish between. or just exit with `1` if you don't
have anything more interesting.
There's a posixy description of what some of the higher exit codes
conventionally mean - BSD's sysexits.h lists codes from 64 upwards, though few tools follow it.
Haskell modules:
System.Exit
exitWith ExitSuccess :: IO _
exitWith (ExitFailure 1) :: IO _
Exit codes (bad)
module Main where
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  t <- getContents
  if length t == 0
    then hPutStrLn stderr "ERROR: no lines provided"
    else process t

process _ = pure ()
ERROR: no lines provided
$ echo $?
0 # success!
Here's an example of failing to exit with the right exit code.
Exit codes (good)
module Main where
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  t <- getContents
  if length t == 0
    then error "ERROR: no lines provided"
    else process t

process _ = pure ()
exitbad-exe: ERROR: no lines provided
CallStack (from HasCallStack):
error, called at app/Main.hs:9:8 in main:Main
$ echo $?
1
If we take the lazier approach of just calling `error`, we get the supplied
message printed to stderr and a failure exit code - the right behaviour,
but there's a bit of extra mess in there which might or might not be
desired...
Exit codes (good, 2)
module Main where
import System.IO (hPutStrLn, stderr)
import System.Exit (exitWith, ExitCode (ExitFailure))
main :: IO ()
main = do
  t <- getContents
  if length t == 0
    then do hPutStrLn stderr "ERROR: no lines provided"
            exitWith (ExitFailure 3)
    else process t

process _ = pure ()
ERROR: no lines provided
$ echo $?
3
... or we can use exitWith explicitly for more control of the output -
for example, we can choose the exit code number to give some more
nuanced numeric description to why we have failed.
The environment
TERM=xterm
HOME=/home/benc
XDG_RUNTIME_DIR=/run/user/1000
LOGNAME=benc
Unix processes also carry a set of key/value pairs that are dynamically
scoped: when a process calls another process, that new process inherits
all the key/value pairs of the parent process.
Dynamic scoping is something that modern programming languages have mostly
decided is a bad thing - but it can be useful in some cases in unix.
Here are some examples that are set in a shell on my laptop by default:
giving the type of terminal I'm using, the path to my home directory
(for example, if a program wants to store stuff there), and
a runtime directory where programs can store temporary files for me.
* the environment
- dynamic scope. string->string mapping. imagine it as a Reader.
* a classic one is $TMPDIR - where do we store temporary files? I want
to set it for a work session, for example, and have every program in that
session use it, no matter how deep in the process call stack.
c.f. implicit arguments in Haskell, where the value of an argument passed
to a function propagates down to any called function which also has that
implicit argument.
import System.Environment
getEnvironment :: IO [(String, String)]
Reading the environment
import System.Environment
getEnvironment :: IO [(String, String)]
getEnv :: String -> IO String
In Haskell, the environment is exposed by various functions in
System.Environment - getEnvironment returns them all as key/values
in a list; getEnv returns the value of a single named variable (and
throws if that variable isn't set).
For example, a tool might read LOGNAME to default some username setting,
the way irssi does - a sketch of that follows.
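A minimal sketch (not from the talk) of that: default a username from LOGNAME.
It uses lookupEnv, also from System.Environment, which returns a Maybe rather
than throwing when the variable is unset.
module Main where
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)
main :: IO ()
main = do
  logname <- lookupEnv "LOGNAME"
  -- fall back to a placeholder default if LOGNAME isn't set
  let username = fromMaybe "anonymous" logname
  putStrLn ("Connecting as " ++ username)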
Running other processes
data CreateProcess = CreateProcess {
cmdspec :: CmdSpec, cwd :: Maybe FilePath,
env :: Maybe [(String,String)],
std_in :: StdStream,
std_out :: StdStream, std_err :: StdStream,
close_fds :: Bool,
create_group :: Bool, delegate_ctlc :: Bool,
detach_console :: Bool, create_new_console :: Bool,
new_session :: Bool,
child_group :: Maybe GroupID, child_user :: Maybe UserID,
use_process_jobs :: Bool } deriving (Show, Eq)
createProcess :: CreateProcess -> IO (Maybe Handle,
Maybe Handle, Maybe Handle, ProcessHandle)
Sometimes a program will want to run other programs to do its work -
for example, `git` can spawn the `ssh` command to pull/push commits
to/from remote systems, or your usual text editor for editing commit
messages.
There are a few libraries around to do this, and I don't really have
a favourite - they all seem a bit more awkward than I really want.
I'll talk about the process package.
Generally you want to run another command - another unix process -
and either have its input/output go to the user (for example, if you're
spawning a text editor) or have the input/output connected back to
the calling process - if the calling program wants to interact with the
newly launched program.
At its core is this big data type - CreateProcess - which describes
quite a few different ways in which a process launch could be parameterised -
and an IO action which launches the process, potentially giving us access to
the process' std streams and process ID.
That's pretty horrible - so the process package provides a pile of
wrappers for common use cases.
* running other processes
- many ways / libraries
- esp interesting is the capture of stdout/stderr in various ways vs passing it through (a readProcessWithExitCode sketch appears after the bc example below)
import System.Process
a variety of different things, from the simple:
callCommand :: String -> IO ()
to
createProcess :: CreateProcess -> IO (Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle)
That more elaborate CreateProcess structure lets you specify
a lot of things, such as a different environment (which would otherwise
be inherited), and what to do with the std streams - a sketch follows.
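A rough sketch (not from the slides) of that lower-level interface: run `sort`
with pipes attached to its stdin and stdout and with a replaced environment,
feed it a few lines, and read the sorted result back.
module Main where
import System.IO (hClose, hGetContents, hPutStr)
import System.Process
main :: IO ()
main = do
  (Just hin, Just hout, _, ph) <-
    createProcess (proc "sort" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
        -- env replaces the whole inherited environment, so keep PATH around
      , env     = Just [("LC_ALL", "C"), ("PATH", "/usr/bin:/bin")]
      }
  hPutStr hin "banana\napple\ncherry\n"
  hClose hin                        -- sort reads until it sees EOF
  sorted <- hGetContents hout
  putStr sorted                     -- forces the (lazy) output
  _ <- waitForProcess ph
  return ()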
module Main where
import System.Process
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  let filename = "/tmp/somefile"
  hPutStrLn stderr "Launching editor"
  callProcess "nano" [filename]
  t <- readFile filename
  hPutStrLn stderr ("Editor returned, with contents: " ++ t)
demo?
Here's something using the callProcess wrapper - none of that big
scary CreateProcess data structure. It's going to launch the nano
text editor with a filename, and then when the editor returns,
the program will continue running and we'll get a printout of
whatever we saved into that file with the editor.
That's basically the model of git calling an editor for you to
type in a commit message.
module Main where
import System.Process
import System.IO (hPutStrLn, stderr)
main = do
  let a = 5
  let b = 3
  let input = show a ++ " + " ++ show b ++ "\n"
  output <- readProcess "bc" [] input
  let r = read output
  putStrLn $ "Result is " ++ show r
  if a + b == r then putStrLn "correct" else error "wrong"
$ adder
Result is 8
correct
Here's a different example: calling out to "bc" to add two numbers, and comparing
the result with Haskell's built-in + operator. We feed a string into bc's
stdin, read its stdout back, and convert strings to/from real Haskell
data types. nb. stringly typed stuff here, as per that controversial point
earlier.
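The notes above mention capturing stdout and stderr rather than passing them
through; readProcessWithExitCode does exactly that, also returning the exit
code. A minimal sketch (not from the slides):
module Main where
import System.Exit (ExitCode (..))
import System.IO (hPutStr, stderr)
import System.Process (readProcessWithExitCode)
main :: IO ()
main = do
  (code, out, err) <- readProcessWithExitCode "bc" [] "2 ^ 100\n"
  case code of
    ExitSuccess   -> putStr out
    ExitFailure n ->
      hPutStr stderr ("bc failed with code " ++ show n ++ ":\n" ++ err)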
Console colours
$ ls
ChangeLog.md README.md app package.yaml stack.yaml
LICENSE Setup.hs optparse.cabal src test
/home/benc/app/Main.hs:31:22: error:
Variable not in scope: isPrefixOf :: String -> [Char] -> Bool
|
31 | return $ filter (p `isPrefixOf` ) ["1", "8", "256"]
| ^^^^^^^^^^^^
We can make the output of a program appear in different colours. That happens
in real life for example in `ls` - different types of entry have different
colours - for example, directories are blue. Or in GHC output, where
syntax errors are highlighted in red.
Generally in a terminal these use inline escape codes, called ANSI
escape codes, that date back to
around 1978 - just a bit older than me
If you were a DOS user in the 1980s and loaded ANSI.SYS - that is the same codes.
Or if you were a BBS user in the 1990s and that BBS did colour, that was probably
also the same ANSI escape codes.
ANSI escape codes let you do things like colours - they also let you do
other things like move the cursor around, or setting the window title.
* console colours - this is the future, after all: using the same control codes as used in 1980s/1990s era BBSes, so your programs can look a bit like a 1990s BBS too!
- explain control codes basically (rather than some out-of-band signalling
like more modern graphics)
- more elaborate, but I'm not going to go into this, is how you'd do a "full screen" or rather full-window application.
- screenshot: diff from `git`
import System.Console.ANSI
setSGR [Reset, SetColor Foreground Dull Yellow]
putStrLn "hello"
setSGR [Reset]
^ screenshot of this
hSupportsANSI IO.stdout :: IO Bool
- ^ we want to be able to ask this because if we're feeding into a pipe,
it is conventional to not send colour codes: colours are for the terminal,
not for the next program in the pipe to consume.
Can also use ANSI codes for cursor positioning and to request
things like the console size - maybe you want to truncate lines rather than
have them wrap, or configure your pretty printer based on that width
(a sketch follows these notes).
- these are the same code sequences used in travis CI. and BBSes. and MS-DOS ANSI.SYS
and vt100 terminals
- maybe for fun include a BBS screen shot
- or a pic of a vt100
- these functions just output specific magic byte sequences to the console - they are "in-band" in that
way, and the library is just emitting those sequences for you
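A minimal sketch of the console-size point (not from the slides, and assuming a
version of ansi-terminal recent enough to export getTerminalSize): ask how wide
the terminal is, and truncate a long line to fit rather than letting it wrap.
module Main where
import System.Console.ANSI (getTerminalSize)
main :: IO ()
main = do
  size <- getTerminalSize             -- Nothing if we can't tell, e.g. in a pipe
  let width    = maybe 80 snd size    -- size is (rows, columns); default to 80
      longLine = concat (replicate 20 "lorem ipsum ")
  putStrLn (take width longLine)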
module Main where
import System.Console.ANSI -- from ansi-terminal package
import System.IO (stdout)
import Control.Monad (when)
main = do
  c <- hSupportsANSI stdout
  when c $ setSGR [Reset, SetColor Foreground Dull Yellow]
  putStr "hello "
  when c $ setSGR [Reset, SetColor Background Dull Cyan,
                   SetColor Foreground Dull Black]
  putStr "world"
  when c $ setSGR [Reset]
  putStrLn "."
$ ansi
hello world .
$ ansi
hello world .
$ ansi | less
hello world.
$ bad-ansi | less
ESC[0;33mhello ESC[0;46;30mworldESC[0m.
Command Line Options
$ ls
ChangeLog.md README.md app package.yaml stack.yaml
LICENSE Setup.hs optparse.cabal src test
$ ls --sort=size --format=long
total 40
drwxr-xr-x 2 benc benc 4096 May 29 07:04 app
drwxr-xr-x 2 benc benc 4096 May 29 07:04 src
drwxr-xr-x 2 benc benc 4096 May 29 07:04 test
-rw-r--r-- 1 benc benc 2134 May 29 07:04 stack.yaml
-rw-r--r-- 1 benc benc 1529 May 29 07:04 LICENSE
-rw-r--r-- 1 benc benc 1500 May 29 07:04 optparse.cabal
-rw-r--r-- 1 benc benc 1177 May 29 07:04 package.yaml
-rw-r--r-- 1 benc benc 48 May 29 07:04 ChangeLog.md
-rw-r--r-- 1 benc benc 46 May 29 07:04 Setup.hs
-rw-r--r-- 1 benc benc 11 May 29 07:04 README.md
One of the big things about command line tools is that you control
how they work by putting options on the command line - for example,
ls lists files in a directory, but I can configure more specifically
how it does that listing with options.
=====
- this is the big one
* commandline parsing
- the shit way - getArgs. gives a list of strings. OK if you want to
really pass in just one or two mandatory arguments.
... but that's not how many tools' interfaces should work.
... give examples of "ls -t ~" or "git commit -a -m hello" with a
subcommand structure (a subcommand sketch appears after the help example, further below)
- I'm a big fan of writing parsers in Haskell. optparse-applicative
is a parser library specialised for command line options, so it has a
different feel: the raw tokens are individual strings, as they come from
getArgs, rather than characters, and there are some features which capture
common patterns in command line parsing that a more general parser
might not have.
import System.Environment
getArgs :: IO [String]
["--sort=size", "--format=long"]
The basic built in way of getting commandline options is getArgs. That
returns each argument as an element of a string list - approximately,
the commandline is split into tokens by spaces, and each token is an
element of that list.
That's convenient to use if you want something incredibly simple, like
passing in a single filename with no options (a tiny sketch of that style follows).
But for anything more complicated, it is easy to fall into a rathole
of writing your own parser. Let someone else handle that complexity...
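A tiny sketch (not from the slides) of that getArgs-only style: one mandatory
filename argument and nothing else - and note the failure exit code, as per the
earlier section. ("mycat" is just a hypothetical tool name.)
module Main where
import System.Environment (getArgs)
import System.Exit (exitWith, ExitCode (ExitFailure))
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  args <- getArgs
  case args of
    [filename] -> readFile filename >>= putStr
    _          -> do hPutStrLn stderr "usage: mycat FILENAME"
                     exitWith (ExitFailure 2)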
optparse-applicative: definition
module Main where
import Options.Applicative
data Config = Config { verbose :: Bool }
configOpts :: Parser Config
configOpts = Config <$> switch (long "verbose" <> short 'v')
main = do config <- execParser (info configOpts mempty)
          runWith config

runWith :: Config -> IO ()
runWith c = putStrLn $ if verbose c then "Verbose output"
                                    else "ssssh"
So parsers were my original introduction to Haskell, and my first talk
here was about parsec. Here's another parser, optparse-applicative, that
is specialised for handling command-line parameters, rather than arbitrary
strings. It has a lot of functionality, a lot of which I'm going to ignore
here.
Here's the basic structure: we're going to parse our command line options
into a data structure which I've called Config. The only configuration at
the moment is a boolean called "verbose", which determines if our program
will be noisy or not.
Down here at the bottom, I can run the program with a config - if it's verbose,
it prints one thing; if it's not verbose, it prints "ssssh".
And in the middle we're going to use optparse-applicative to generate a
Config object from the commandline:
Basically we can use applicative syntax (actually just functor syntax)
to say the first (and only) field of Config is a switch which can be
specified in two ways: either a long verbose name or a short v name.
I'll show what those invocations look like on the next slide.
If you've used applicative parsers before, this might be a familiar
way of constructing objects: you use applicative notation to put parsers
in for each value.
Then in main, there's some boilerplate that runs that parser and returns the
config.
optparse-applicative: use
$ prog
ssssh
$ prog -v
Verbose output
$ prog --verbose
Verbose output
$ prog --debug
Invalid option `--debug'
Usage: prog [-v|--verbose]
So here are some invocations of that program. If we run it with no
options, verbose defaults to false. We can use either short or long
form options, with a double-dash for long options; either of those
turn on verbose mode.
The final one is nice: rather than just failing to parse, there is
enough information in that definition for optparse-applicative to
generate some basic help text: that's something we get from using
applicatives not monads for parsing (I think?)
optparse-applicative: help
data Config = Config { verbose :: Bool, count :: Int }
configOpts :: Parser Config
configOpts = Config
  <$> switch ( long "verbose" <> short 'v'
            <> help "Enable verbose output")
  <*> option auto ( long "count" <> help "How many?"
                 <> metavar "PIES")

main = do
  config <- execParser (info (configOpts <**> helper)
                             (header "Opts example"))
  runWith config
Providing decent help is one of the things you'll probably neglect
if you're casually writing your own commandline option handling. We've
already seen that optparse-applicative does more than you would probably
do. But there's more.
I've added in a second option, a count of how many pies you want, to
make it a bit more interesting. We use what someone's referred to as the
goatse operator (<*> here) to attach the parser for this second config value,
an integer - with only a long form, plus two bits of descriptive text: the
help string and the PIES metavar.
In main, I've bolted on a second parser supplied by the library, helper -
that one provides a --help option - and added a header so the program has a
title in the help text...
and on the next slide we can see what that help text looks like. It
looks like a real command line tool help!
optparse-applicative: help usage
$ prog --help
Opts example
Usage: prog [-v|--verbose] --count PIES
Available options:
-v,--verbose Enable verbose output
--count PIES How many?
-h,--help Show this help text
Here's the description. Here's our new count of pies - it isn't optional
because there's no default value. Here are the descriptions for each option
that we specified.
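The earlier notes mention git-style subcommands ("git commit -a -m hello");
optparse-applicative handles those too, via subparser and command. A minimal
sketch (not from the slides):
module Main where
import Options.Applicative
data Cmd = Add FilePath | Remove FilePath
cmdParser :: Parser Cmd
cmdParser = subparser
  (  command "add"    (info (Add    <$> argument str (metavar "FILE"))
                            (progDesc "Add a file"))
  <> command "remove" (info (Remove <$> argument str (metavar "FILE"))
                            (progDesc "Remove a file")) )
main :: IO ()
main = do
  cmd <- execParser (info (cmdParser <**> helper) (header "Subcommands example"))
  case cmd of
    Add f    -> putStrLn ("adding " ++ f)
    Remove f -> putStrLn ("removing " ++ f)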
tab completion
$ ls
ChangeLog.md README.md app package.yaml stack.yaml
LICENSE Setup.hs optparse.cabal src test
$ cat opt <TAB>parse.cabal
$ git log -r <TAB><TAB>
HEAD master
$ git log -r mas <TAB>ster
This is my favourite bit. Maybe with something interactive.
Lots of unix shells have tab completion: originally for filenames, it
became customisable: for example, with git, tab completion will
complete branch names, not filenames, at appropriate places.
optparse-applicative can integrate with this too...
$ source <(prog --bash-completion-script `which prog`)
$ prog <TAB><TAB>-
--count --help --verbose -h -v
$ prog --ver <TAB>bose
We don't need to write any more code. Instead we can put this command in
our shell initialisation: it asks our program to print a completion script,
and sourcing that script tells bash how to complete the command line based
on what the parser knows - which is that it could be completed with any of
these options... such as --verbose.
Custom tab completion
data Config = Config { verbose :: Bool, count :: Int }
configOpts :: Parser Config
configOpts = Config
  <$> switch ( long "verbose" <> short 'v'
            <> help "Enable verbose output")
  <*> option auto ( long "count" <> help "How many?"
                 <> metavar "PIES"
                 <> completer myCompleter)

myCompleter :: Completer
myCompleter = mkCompleter $ \p -> do
  hPutStr stderr "[IN COMPLETER]"
  return $ filter (p `isPrefixOf`) ["1", "8", "256"]
My favourite bit of this is that we can write arbitrary Haskell
code to perform the completions - optparse-applicative can interface
arbitrary application-specific code to tab completion.
This code adds a custom completer to the count option: it's going
to suggest values of 1, 8 and 256. The completer is given the partially
typed value so far, so that it can filter out non-matching completions
itself. And it can run arbitrary IO actions: I print a trace message to
stderr here (not stdout).
This is how, for example, you might poke in your application's
environment or even make a network request to find out suitable
values.
Maybe demo this?
Summary
Text streams: stdin, stdout, stderr
Exit codes: reporting success or failure
The environment
Running other processes
Console colours and ANSI fun
Command Line Options