abstract
Ben will talk about building unix command line tools in Haskell, covering
some of the standards and traditions that commandline tools should
follow, and a handful of Haskell libraries that help make that happen - including
command line option parsing, pretty colours, and interacting nicely
with other tools in a build chain.
Commandline Tools in Haskell
Ben Clifford
benc@hawaga.org.uk
module Main where
main :: IO ()
main = putStrLn "hello"
$ main
hello
I'd like to talk about building command line tools in Haskell. Almost any
program that you compile with GHC turns into something you can run from
the command line...
Here's a simple program being run from the command line.
You can also launch applications like firefox,
chrome, libreoffice, HMRC Payroll Tools from the command line. But I
don't mean them.
I mean something a bit more specific.
The unix philosophy
Doug McIlroy (as summarised by Peter H. Salus):
Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface.
From the 1970s comes the unix philosophy, summarised here.
As Haskellers, we like the first two points (composition) and hate the third (stringly typed)
That's part of what a modern command line tool should probably do or be - not everything,
but it's a good start. I'll introduce some other ideas as we go.
* I'm going to talk about libraries to do these things, and a little bit
of the unix philosophy that goes with each concept as we go along.
Handling text streams
module Main where
main :: IO ()
main = do
  t <- getContents
  putStr t
  putStr t
  putStrLn $ "There were "
          ++ (show . length) t
          ++ " characters"
$ duplicate
hello
hello
world
world
CTRL-D
hello
world
There were 12 characters
Let's start with the stringly typed awfulness that is handling text streams.
global implicit input stream called `stdin` - getContents reads the whole of
that into t
global implicit output stream called `stdout` - putStr and putStrLn
send stuff to that stream
There are a couple of awkward things here.
==== OLDER NOTES FOR THIS SECTION ====
TODO: getContents is something simple.
probably want to talk about forms of IO that are a bit more nuanced in their
laziness? Give an example of some laziness problem with getContents?
* stdin, stdout and stderr - especially stdout vs stderr
- "output" goes on stdout, "errors" go on stderr - some decent rule
of thumb? like if you were piping it into another program,
what data would you want to go into that other program, and
what data would you like to go to a human reading the console/
logs.
- how do I output a password prompt? (for example...) is that stderr or do I wire into something else?
- can i change behaviour based on if stdin is a tty? (that's a convention, but I'm not sure where its documented and if I can easily do that in Haskell)
- pipes: stdin/stdout don't just go to/from the terminal.
pseudo-Haskell view: stream -> stream composition of parallel
processes, with a stderr side stream, and exit/error handling
that aborts the lot.
though a program can also do "anything" / "mutate the world",
so the emphasis is not on stopping "mutate the world" but on
interacting with the world as others expect.
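A minimal sketch (not from the talk) answering those last two questions: only prompt
when stdin is actually a terminal, send the prompt to stderr, and switch off echo
while the password is typed. hIsTerminalDevice, hGetEcho and hSetEcho all live in
System.IO in base.
module Main where
import System.IO (hFlush, hGetEcho, hIsTerminalDevice, hPutStr, hPutStrLn,
                  hSetEcho, stderr, stdin)
main :: IO ()
main = do
  interactive <- hIsTerminalDevice stdin
  password <-
    if interactive
      then do
        hPutStr stderr "Password: "   -- prompt goes to stderr, not stdout
        hFlush stderr
        oldEcho <- hGetEcho stdin
        hSetEcho stdin False          -- don't echo the password back
        p <- getLine
        hSetEcho stdin oldEcho
        hPutStrLn stderr ""           -- the user's Enter key wasn't echoed
        return p
      else getLine                    -- piped input: just read it quietly
  putStrLn ("read " ++ show (length password) ++ " characters")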
stderr (1)
$ duplicate > out.txt
hello
world
CTRL-D
The first awkward thing happens when we try to do something more interesting
with the output stream. Here I redirect the output into a file called
out.txt using this arrow symbol.
It might be that we wanted the user to see that final line of summary
information, rather than it being directed into the output file.
Often that is useful for error messages, so we get a second output stream,
called `stderr` which doesn't get redirected.
stderr (2)
module Main where
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  t <- getContents
  putStr t
  putStr t
  hPutStrLn stderr $ "There were "
                  ++ (show . length) t
                  ++ " characters"
$ duplicate > out.txt
hello
world
CTRL-D
There were 12 characters
We can use hPutStrLn to write to a specific file handle, the stderr
one, which is open by default at the start of execution.
Now our duplicated content, on stdout, goes to out.txt as directed,
and our summary information goes to stderr which hasn't been redirected.
There's a rough rule of thumb: information for the human watching a
bunch of processes work together should go to stderr, and data intended
for other processes should go to stdout.
Streaming laziness
module Main where
main :: IO ()
main = do
  t <- getContents
  putStr t
  putStr t
  putStrLn $ "There were "
          ++ (show . length) t
          ++ " characters"
$ duplicate
hello
hello
world
world
CTRL-D
hello
world
There were 12 characters
Let's go back to the original duplication program and look at the output.
After I type in a line by hand, it gets output to stdout. But then I can
type in more input, and that gets output too. So if I'm at that point,
where in the main program am I actually? getContents doesn't actually get
the whole contents of stdin. Instead it returns a lazy structure that
gets read in as needed - which in this program happens at the first putStr.
(It would also be forced to be read in entirely by the second putStr, or
by `length`.)
This laziness can be pretty awkward and hard to reason about: for
example, if there is an error when reading the input (say, a unicode
decoding error), that error won't actually surface until later in the
program (at the putStrLn that forces it).
So for anything more serious, something that involves character-by-character
or line-by-line strict IO can work better: for example, using getLine,
and/or streaming libraries like `pipes` which are designed to handle this
situation.
(I don't have an example of that in the talk itself, but a rough sketch follows.)
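Here's a minimal sketch (not in the original talk) of a line-by-line version of
duplicate, using isEOF and getLine so each line is read strictly: any read error
surfaces at the getLine that hits it, rather than at some later consumer of a lazy
string. Note the behaviour differs slightly: each line is echoed twice as it
arrives, instead of the whole stream being replayed at the end.
module Main where
import System.IO (isEOF)
main :: IO ()
main = go 0
  where
    go :: Int -> IO ()
    go chars = do
      done <- isEOF
      if done
        then putStrLn ("There were " ++ show chars ++ " characters")
        else do
          line <- getLine
          putStrLn line
          putStrLn line
          go (chars + length line + 1)  -- + 1 for the newline getLine stripped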
Exit codes
data ExitCode
= ExitSuccess
| ExitFailure Int
exitWith :: ExitCode -> IO a
Let's have a look at the command line equivalent of exception handling.
When a tool runs, it can indicate to the OS whether it ran successfully
or whether it failed in some way.
One common place where that is used is in a build/CI system: it is how most
build systems know that a compiler or a test suite has failed.
In Haskell, if you reach the end of `main`, that is implicitly a
successful exit; and if some Haskell-level error such as an exception
or explicit error call happens, that's a failure.
But you can also use `exitWith` to explicitly exit the process at an
arbitrary point in an IO action.
This is one thing that I have frustratingly found several times in
new-ish tooling: when a process fails, it outputs a nice error message
for the human, but doesn't exit with a suitable exit code - so when that
tool is used in a build/CI pipeline, that larger system doesn't know there
is a problem.
===== PREVIOUS NOTES ====
* exit codes
When a process exits, it returns a one byte exit code:
- how a tool signals an error/exception to the OS:
0 means success
other values mean failure
- the shitty way of `error "FOO"` still gets you an acceptable exit code. Be careful,
if you exit for other reasons, to exit with failure (something I've seen in a bunch of immature tooling -
docker and the purescript compiler, for example, in years past): make sure that in addition to printing your error message you also exit with a failure code
- importance of exit codes for build pipelines (for example)
- eg `Make` or travis CI
give example of something (eg travis) integrated with github - that red cross comes from the exit code - and if your compiler or test suite doesn't
exit that way, then the tests will come up green even though they've
failed.
- kinda feels like a reverse-Maybe: you can return an opaque success,
or many different exit values - generally, you would use those different
values to represent different kinds of error that an automated caller
might like to distinguish between. or just exit with `1` if you don't
have anything more interesting.
There's a posixy description of what some of the higher exit codes
conventionally mean - BSD's sysexits.h lists codes from 64 upwards, though few tools follow it.
Haskell modules:
System.Exit
exitWith ExitSuccess :: IO _
exitWith (ExitFailure 1) :: IO _
Exit codes (bad)
module Main where
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  t <- getContents
  if length t == 0
    then hPutStrLn stderr "ERROR: no lines provided"
    else process t

process _ = pure ()
ERROR: no lines provided
$ echo $?
0 # success!
Here's an example of failing to exit with the right exit code.
Exit codes (good)
module Main where
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  t <- getContents
  if length t == 0
    then error "ERROR: no lines provided"
    else process t

process _ = pure ()
exitbad-exe: ERROR: no lines provided
CallStack (from HasCallStack):
error, called at app/Main.hs:9:8 in main:Main
$ echo $?
1
If we take the lazier approach of just calling `error`, we get the supplied
message printed to stderr and a failure exit code - the right behaviour,
but there's a bit of extra mess in there which might or might not be
desired...
Exit codes (good, 2)
module Main where
import System.IO (hPutStrLn, stderr)
import System.Exit (exitWith, ExitCode (ExitFailure))
main :: IO ()
main = do
  t <- getContents
  if length t == 0
    then do hPutStrLn stderr "ERROR: no lines provided"
            exitWith (ExitFailure 3)
    else process t

process _ = pure ()
ERROR: no lines provided
$ echo $?
3
... or we can use exitWith explicitly for more control of the output -
for example, we can choose the exit code number to give some more
nuanced numeric description to why we have failed.
The environment
TERM=xterm
HOME=/home/benc
XDG_RUNTIME_DIR=/run/user/1000
LOGNAME=benc
Unix processes also carry a set of key/value pairs that are dynamically
scoped: when a process calls another process, that new process inherits
all the key/value pairs of the parent process.
Dynamic scoping is something that modern programming languages have mostly
decided is a bad thing - but it can be useful in some cases in unix.
Here are some examples that are set in a shell on my laptop by default:
giving the type of terminal I'm using, the path to my home directory
(for example, if a program wants to store stuff there), and
a runtime directory where programs can store temporary files for me.
* the environment
- dynamic scope. string->string mapping. imagine it as a Reader.
* a classic one is $TMPDIR - where do we store temporary files? I want
to set it for a work session, for example, and have every program in that
session use it, no matter how deep in the process call stack.
c.f. implicit arguments in Haskell, where the value of an argument passed
to a function propagates down to any called function which also has that
implicit argument.
import System.Environment
getEnvironment :: IO [(String, String)]
Reading the environment
import System.Environment
getEnvironment :: IO [(String, String)]
getEnv :: String -> IO String
In Haskell, the environment is exposed by various functions in
System.Environment - getEnvironment returns them all as key/values
in a list; getEnv returns the value of a single named variable (and
throws if that variable isn't set).
For example, a tool might read LOGNAME to default some username setting,
the way irssi does - a sketch of that follows.
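A minimal sketch (not from the talk) of that: default a username from LOGNAME.
It uses lookupEnv, also from System.Environment, which returns a Maybe rather
than throwing when the variable is unset.
module Main where
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)
main :: IO ()
main = do
  logname <- lookupEnv "LOGNAME"
  -- fall back to a placeholder default if LOGNAME isn't set
  let username = fromMaybe "anonymous" logname
  putStrLn ("Connecting as " ++ username)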
Running other processes
data CreateProcess = CreateProcess {
cmdspec :: CmdSpec, cwd :: Maybe FilePath,
env :: Maybe [(String,String)],
std_in :: StdStream,
std_out :: StdStream, std_err :: StdStream,
close_fds :: Bool,
create_group :: Bool, delegate_ctlc :: Bool,
detach_console :: Bool, create_new_console :: Bool,
new_session :: Bool,
child_group :: Maybe GroupID, child_user :: Maybe UserID,
use_process_jobs :: Bool } deriving (Show, Eq)
createProcess :: CreateProcess -> IO (Maybe Handle,
Maybe Handle, Maybe Handle, ProcessHandle)
Sometimes a program will want to run other programs to do its work -
for example, `git` can spawn the `ssh` command to pull/push commits
to/from remote systems, or your usual text editor for editing commit
messages.
There are a few libraries around to do this, and I don't really have
a favourite - they all seem a bit more awkward than I really want.
I'll talk about the process package.
Generally you want to run another command - another unix process -
and either have its input/output go to the user (for example, if you're
spawning a text editor) or have the input/output connected back to
the calling process - if the calling program wants to interact with the
newly launched program.
At its core is this big data type - CreateProcess - which describes
quite a few different ways in which a process launch could be parameterised -
and an IO action which launches the process, potentially giving us access to
the process' std streams and process ID.
That's pretty horrible - so the process package provides a pile of
wrappers for common use cases.
* running other processes
- many ways / libraries
- esp interesting is the capture of stdout/stderr in various ways vs passing it through (a readProcessWithExitCode sketch appears after the bc example below)
import System.Process
a variety of different things, from the simple:
callCommand :: String -> IO ()
to
createProcess :: CreateProcess -> IO (Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle)
That more elaborate CreateProcess structure lets you specify
a lot of things, such as a different environment (which would otherwise
be inherited), and what to do with the std streams - a sketch follows.
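A rough sketch (not from the slides) of that lower-level interface: run `sort`
with pipes attached to its stdin and stdout and with a replaced environment,
feed it a few lines, and read the sorted result back.
module Main where
import System.IO (hClose, hGetContents, hPutStr)
import System.Process
main :: IO ()
main = do
  (Just hin, Just hout, _, ph) <-
    createProcess (proc "sort" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
        -- env replaces the whole inherited environment, so keep PATH around
      , env     = Just [("LC_ALL", "C"), ("PATH", "/usr/bin:/bin")]
      }
  hPutStr hin "banana\napple\ncherry\n"
  hClose hin                        -- sort reads until it sees EOF
  sorted <- hGetContents hout
  putStr sorted                     -- forces the (lazy) output
  _ <- waitForProcess ph
  return ()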
module Main where
import System.Process
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  let filename = "/tmp/somefile"
  hPutStrLn stderr "Launching editor"
  callProcess "nano" [filename]
  t <- readFile filename
  hPutStrLn stderr ("Editor returned, with contents: " ++ t)
demo?
Here's something using the callProcess wrapper - none of that big
scary CreateProcess data structure. It's going to launch the nano
text editor with a filename, and then when the editor returns,
the program will continue running and we'll get a printout of
whatever we saved into that file with the editor.
That's basically the model of git calling an editor for you to
type in a commit message.
module Main where
import System.Process
import System.IO (hPutStrLn, stderr)
main = do
  let a = 5
  let b = 3
  let input = show a ++ " + " ++ show b ++ "\n"
  output <- readProcess "bc" [] input
  let r = read output
  putStrLn $ "Result is " ++ show r
  if a + b == r then putStrLn "correct" else error "wrong"
$ adder
Result is 8
correct
Here's a different example: calling out to "bc" to add two numbers, and comparing
the result with Haskell's built-in + operator. We feed a string into bc's
stdin, read its stdout back, and convert strings to/from real Haskell
data types. nb. stringly typed stuff here, as per that controversial point
earlier.
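The notes above mention capturing stdout and stderr rather than passing them
through; readProcessWithExitCode does exactly that, also returning the exit
code. A minimal sketch (not from the slides):
module Main where
import System.Exit (ExitCode (..))
import System.IO (hPutStr, stderr)
import System.Process (readProcessWithExitCode)
main :: IO ()
main = do
  (code, out, err) <- readProcessWithExitCode "bc" [] "2 ^ 100\n"
  case code of
    ExitSuccess   -> putStr out
    ExitFailure n ->
      hPutStr stderr ("bc failed with code " ++ show n ++ ":\n" ++ err)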
Console colours
$ ls
ChangeLog.md README.md app package.yaml stack.yaml
LICENSE Setup.hs optparse.cabal src test
/home/benc/app/Main.hs:31:22: error:
Variable not in scope: isPrefixOf :: String -> [Char] -> Bool
|
31 | return $ filter (p `isPrefixOf` ) ["1", "8", "256"]
| ^^^^^^^^^^^^
We can make the output of a program appear in different colours. That happens
in real life for example in `ls` - different types of entry have different
colours - for example, directories are blue. Or in GHC output, where
syntax errors are highlighted in red.
Generally in a terminal these use inline escape codes, called ANSI
escape codes, that date back to
around 1978 - just a bit older than me
If you were a DOS user in the 1980s and loaded ANSI.SYS - that is the same codes.
Or if you were a BBS user in the 1990s and that BBS did colour, that was probably
also the same ANSI escape codes.
ANSI escape codes let you do things like colours - they also let you do
other things like move the cursor around, or setting the window title.
* console colours - this is the future, after all: using the same control codes as used in 1980s/1990s era BBSes, so your programs can look a bit like a 1990s BBS too!
- explain control codes basically (rather than some out-of-band signalling
like more modern graphics)
- more elaborate, but I'm not going to go into this, is how you'd do a "full screen" or rather full-window application.
- screenshot: diff from `git`
import System.Console.ANSI
setSGR [Reset, SetColor Foreground Dull Yellow]
putStrLn "hello"
setSGR [Reset]
^ screenshot of this
hSupportsANSI IO.stdout :: IO Bool
- ^ we want to be able to ask this because if we're feeding into a pipe,
it is conventional to not send colour codes: colours are for the terminal,
not for the next program in the pipe to consume.
Can also use ANSI codes for cursor positioning and to request
things like the console size - maybe you want to truncate lines rather than
have them wrap, or configure your pretty printer based on that width
(a sketch follows these notes).
- these are the same code sequences used in travis CI. and BBSes. and MS-DOS ANSI.SYS
and vt100 terminals
- maybe for fun include a BBS screen shot
- or a pic of a vt100
- these functions just output specific magic byte sequences to the console - they are "in-band" in that
way, and the library is just emitting those sequences for you
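A minimal sketch of the console-size point (not from the slides, and assuming a
version of ansi-terminal recent enough to export getTerminalSize): ask how wide
the terminal is, and truncate a long line to fit rather than letting it wrap.
module Main where
import System.Console.ANSI (getTerminalSize)
main :: IO ()
main = do
  size <- getTerminalSize             -- Nothing if we can't tell, e.g. in a pipe
  let width    = maybe 80 snd size    -- size is (rows, columns); default to 80
      longLine = concat (replicate 20 "lorem ipsum ")
  putStrLn (take width longLine)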
module Main where
import System.Console.ANSI -- from ansi-terminal package
import System.IO (stdout)
import Control.Monad (when)
main = do
  c <- hSupportsANSI stdout
  when c $ setSGR [Reset, SetColor Foreground Dull Yellow]
  putStr "hello "
  when c $ setSGR [Reset, SetColor Background Dull Cyan,
                   SetColor Foreground Dull Black]
  putStr "world"
  when c $ setSGR [Reset]
  putStrLn "."
$ ansi
hello world .
$ ansi
hello world .
$ ansi | less
hello world.
$ bad-ansi | less
ESC[0;33mhello ESC[0;46;30mworldESC[0m.
Command Line Options
$ ls
ChangeLog.md README.md app package.yaml stack.yaml
LICENSE Setup.hs optparse.cabal src test
$ ls --sort=size --format=long
total 40
drwxr-xr-x 2 benc benc 4096 May 29 07:04 app
drwxr-xr-x 2 benc benc 4096 May 29 07:04 src
drwxr-xr-x 2 benc benc 4096 May 29 07:04 test
-rw-r--r-- 1 benc benc 2134 May 29 07:04 stack.yaml
-rw-r--r-- 1 benc benc 1529 May 29 07:04 LICENSE
-rw-r--r-- 1 benc benc 1500 May 29 07:04 optparse.cabal
-rw-r--r-- 1 benc benc 1177 May 29 07:04 package.yaml
-rw-r--r-- 1 benc benc 48 May 29 07:04 ChangeLog.md
-rw-r--r-- 1 benc benc 46 May 29 07:04 Setup.hs
-rw-r--r-- 1 benc benc 11 May 29 07:04 README.md
One of the big things about command line tools is that you control
how they work by putting options on the command line - for example,
ls lists files in a directory, but I can configure more specifically
how it does that listing with options.
=====
- this is the big one
* commandline parsing
- the shit way - getArgs. gives a list of strings. OK if you want to
really pass in just one or two mandatory arguments.
... but that's not how many tools' interfaces should work.
... give examples of "ls -t ~" or "git commit -a -m hello" with a
subcommand structure (a subcommand sketch appears after the help example, further below)
- I'm a big fan of writing parsers in Haskell. optparse-applicative
is a parser library specialised for command line options, so it has a
different feel: the raw tokens are individual strings, as they come from
getArgs, rather than characters, and there are some features which capture
common patterns in command line parsing that a more general parser
might not have.
import System.Environment
getArgs :: IO [String]
["--sort=size", "--format=long"]
The basic built in way of getting commandline options is getArgs. That
returns each argument as an element of a string list - approximately,
the commandline is split into tokens by spaces, and each token is an
element of that list.
That's convenient to use if you want something incredibly simple, like
passing in a single filename with no options (a tiny sketch of that style follows).
But for anything more complicated, it is easy to fall into a rathole
of writing your own parser. Let someone else handle that complexity...
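A tiny sketch (not from the slides) of that getArgs-only style: one mandatory
filename argument and nothing else - and note the failure exit code, as per the
earlier section. ("mycat" is just a hypothetical tool name.)
module Main where
import System.Environment (getArgs)
import System.Exit (exitWith, ExitCode (ExitFailure))
import System.IO (hPutStrLn, stderr)
main :: IO ()
main = do
  args <- getArgs
  case args of
    [filename] -> readFile filename >>= putStr
    _          -> do hPutStrLn stderr "usage: mycat FILENAME"
                     exitWith (ExitFailure 2)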
optparse-applicative: definition
module Main where
import Options.Applicative
data Config = Config { verbose :: Bool }
configOpts :: Parser Config
configOpts = Config <$> switch (long "verbose" <> short 'v')
main = do config <- execParser (info configOpts mempty)
          runWith config

runWith :: Config -> IO ()
runWith c = putStrLn $ if verbose c then "Verbose output"
                                    else "ssssh"
So parsers were my original introduction to Haskell, and my first talk
here was about parsec. Here's another parser, optparse-applicative, that
is specialised for handling command-line parameters, rather than arbitrary
strings. It has a lot of functionality, a lot of which I'm going to ignore
here.
Here's the basic structure: we're going to parse our command line options
into a data structure which I've called Config. The only configuration at
the moment is a boolean called "verbose", which determines if our program
will be noisy or not.
Down here at the bottom, I can run the program with a config - if it's verbose,
it prints one thing; if it's not verbose, it prints "ssssh".
And in the middle we're going to use optparse-applicative to generate a
Config object from the commandline:
Basically we can use applicative syntax (actually just functor syntax)
to say the first (and only) field of Config is a switch which can be
specified in two ways: either a long verbose name or a short v name.
I'll show what those invocations look like on the next slide.
If you've used applicative parsers before, this might be a familiar
way of constructing objects: you use applicative notation to put parsers
in for each value.
Then in main, there's some boilerplate that runs that parser and returns the
config.
optparse-applicative: use
$ prog
ssssh
$ prog -v
Verbose output
$ prog --verbose
Verbose output
$ prog --debug
Invalid option `--debug'
Usage: prog [-v|--verbose]
So here are some invocations of that program. If we run it with no
options, verbose defaults to false. We can use either short or long
form options, with a double-dash for long options; either of those
turn on verbose mode.
The final one is nice: rather than just failing to parse, there is
enough information in that definition for optparse-applicative to
generate some basic help text: that's something we get from using
applicatives not monads for parsing (I think?)
optparse-applicative: help
data Config = Config { verbose :: Bool, count :: Int }
configOpts :: Parser Config
configOpts = Config
  <$> switch ( long "verbose" <> short 'v'
            <> help "Enable verbose output")
  <*> option auto ( long "count" <> help "How many?"
                 <> metavar "PIES")

main = do
  config <- execParser (info (configOpts <**> helper)
                             (header "Opts example"))
  runWith config
Providing decent help is one of the things you'll probably neglect
if you're casually writing your own commandline option handling. We've
already seen that optparse-applicative does more than you would probably
do. But there's more.
I've added in a second option, a count of how many pies you want, to
make it a bit more interesting. We use what someone's referred to as the
goatse operator (<*> here) to attach the parser for this second config value,
an integer - with only a long form, plus two bits of descriptive text: the
help string and the PIES metavar.
In main, I've bolted on a second parser supplied by the library, helper -
that one provides a --help option - and added a header so the program has a
title in the help text...
and on the next slide we can see what that help text looks like. It
looks like a real command line tool help!
optparse-applicative: help usage
$ prog --help
Opts example
Usage: prog [-v|--verbose] --count PIES
Available options:
-v,--verbose Enable verbose output
--count PIES How many?
-h,--help Show this help text
Here's the description. Here's our new count of pies - it isn't optional
because there's no default value. Here are the descriptions for each option
that we specified.
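The earlier notes mention git-style subcommands ("git commit -a -m hello");
optparse-applicative handles those too, via subparser and command. A minimal
sketch (not from the slides):
module Main where
import Options.Applicative
data Cmd = Add FilePath | Remove FilePath
cmdParser :: Parser Cmd
cmdParser = subparser
  (  command "add"    (info (Add    <$> argument str (metavar "FILE"))
                            (progDesc "Add a file"))
  <> command "remove" (info (Remove <$> argument str (metavar "FILE"))
                            (progDesc "Remove a file")) )
main :: IO ()
main = do
  cmd <- execParser (info (cmdParser <**> helper) (header "Subcommands example"))
  case cmd of
    Add f    -> putStrLn ("adding " ++ f)
    Remove f -> putStrLn ("removing " ++ f)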
tab completion
$ ls
ChangeLog.md README.md app package.yaml stack.yaml
LICENSE Setup.hs optparse.cabal src test
$ cat opt <TAB>parse.cabal
$ git log -r <TAB><TAB>
HEAD master
$ git log -r mas <TAB>ster
This is my favourite bit. Maybe with something interactive.
Lots of unix shells have tab completion: originally for filenames, it
became customisable: for example, with git, tab completion will
complete branch names, not filenames, at appropriate places.
optparse-applicative can integrate with this too...
$ source <(prog --bash-completion-script `which prog`)
$ prog <TAB><TAB>-
--count --help --verbose -h -v
$ prog --ver <TAB>bose
We don't need to write any more code. Instead we can put this command in
our shell initialisation: it asks our program to print a completion script,
and sourcing that script tells bash how to complete the command line based
on what the parser knows - which is that it could be completed with any of
these options... such as --verbose.
Custom tab completion
data Config = Config { verbose :: Bool, count :: Int }
configOpts :: Parser Config
configOpts = Config
  <$> switch ( long "verbose" <> short 'v'
            <> help "Enable verbose output")
  <*> option auto ( long "count" <> help "How many?"
                 <> metavar "PIES"
                 <> completer myCompleter)

myCompleter :: Completer
myCompleter = mkCompleter $ \p -> do
  hPutStr stderr "[IN COMPLETER]"
  return $ filter (p `isPrefixOf`) ["1", "8", "256"]
My favourite bit of this is that we can write arbitrary Haskell
code to perform the completions - optparse-applicative can interface
arbitrary application-specific code to tab completion.
This code adds a custom completer to the count option: it's going
to suggest values of 1, 8 and 256. The completer is given the partially
typed value so far, so that it can filter out non-matching completions
itself. And it can run arbitrary IO actions: I print a trace message to
stderr here (not stdout).
This is how, for example, you might poke in your application's
environment or even make a network request to find out suitable
values.
Maybe demo this?
Summary
Text streams: stdin, stdout, stderr
Exit codes: reporting success or failure
The environment
Running other processes
Console colours and ANSI fun
Command Line Options