Python Gradual Typing: The Good, The Bad and The Ugly
Ben Clifford benc@hawaga.org.uk
BOB 2022
Draft 2022-02-21
intro
I'm going to talk about my experience adding type annotations to the Parsl project codebase.
This has been going on for about four years, first as a side project to explore type checking
in Parsl, and then, as type checking has become more mainstream, pushing on the code a bit harder.
development context
I'll set the context for this work - here's a picture of code development, starting on a developer's laptop,
and as we head to the right, the code goes through pull requests, the main branch, and releases. This isn't meant
to be a definitive flow, more an approximation.
Also as we head to the right, bugs become more expensive: at the left, hacking on my laptop, I'm in a very tight loop
of maybe a few seconds for running code and editing that same code.
At the right, if there is a user using only versioned releases, and they are running production code that uses thousands of CPU
hours - then a bug i) wastes a lot of CPU time and ii) can't be fixed until the next release happens and the user upgrades to it.
So if I'm going to find a bug, I'd prefer to find it as far to the left as possible - as soon as possible, and affecting as few
people as possible.
(dynamic) types in Python
Values have types. Variables do not.
x = 3
type(x) # ==> <class 'int'>
x = {}
type(x) # ==> <class 'dict'>
* What does the type system look like in normal Python? (show x = 3; type(x); x = {}; type(x)) - x has no type, 3 has a type, {} has a type
* "Values have types, variables do not" - that's a nice dividing line between dynamic Python and statically typed Python, where we *do* start making assertions like "this x will be an integer, even though I don't know what the value will be"
Type syntax
# untyped
def square(y):
    return y*y
x = square(1.41)
# typed
def square(y: float) -> float:
    return y*y
x: float = square(1.41)
Type annotations have no (immediate) effect!
def square(y: float) -> float:
    return y*y
x: float = square([])
Traceback (most recent call last):
File "<stdin>", line 1, in
File "<stdin>", line 2, in square
TypeError: can't multiply sequence by non-int of type 'list'
Runtime checking
@typeguard.typechecked
def square(y: float) -> float:
    return y*y
x: float = square([])
Traceback (most recent call last):
...
TypeError: type of argument "y" must be either float or int;
got list instead
Static checking
def square(y: float) -> float:
    return y*y
x: float = square([])
$ mypy source.py
source.py:4: error: Argument 1 to "square"
has incompatible type "List[<nothing>]";
expected "float"
Type Hierarchy
def f(x: object):
    print(x)
x: float = 1.23
f(x) # typechecks ok, because float <= object
Gradual typing
def f(x: float):
    print(x + 1)
y: Any = []
f(y) # typechecks, because
     # List ~ Any ~ float (!)
but at runtime...
TypeError: can only concatenate list (not "int") to list
If we annotate everything with an Any type, pretty much everything will typecheck
in mypy (and then things might break at runtime, if they would have broken anyway).
mypy can always infer Any as a type, even if it can't figure out anything tighter.
Maybe think of it a bit like the use of "unsafe" as a label in Haskell and a keyword in Rust.
Antipattern vs Gradual Typing
a = planet
a = a.pickCountry()
a = a.pickCity()
a = a.pickCoordinates()
Rewrite this in a more amenable style...
or
a: Any
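One possible rewrite, as a sketch with hypothetical class names - one distinct, typeable name per step instead of reusing `a` with a different type on every line:

```python
# Hypothetical classes, invented for illustration of the rewrite.
class Coordinates:
    pass

class City:
    def pickCoordinates(self) -> Coordinates:
        return Coordinates()

class Country:
    def pickCity(self) -> City:
        return City()

class Planet:
    def pickCountry(self) -> Country:
        return Country()

# Each variable now has a single, stable type that mypy can check.
planet = Planet()
country: Country = planet.pickCountry()
city: City = country.pickCity()
coords: Coordinates = city.pickCoordinates()
```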
Union types
def f(x: Union[float, str]):
    if isinstance(x, float):
        print(x*2)
    else:
        print("not a float")
y: float = 1.23
f(y) ==> 2.46
# float <= Union[float, str]
# str <= Union[float, str]
- This is not a sum type: Union is "overlapping" - Union[int, int] is the same as int.
- `if`-based type refinement on a union type gives x a tighter type.
- There is special handling of this `if isinstance` idiom...
- Type refinement can also work on subclasses...
- Intro to next section on hasattr and duck typing.
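As a sketch of the subclass-refinement point above (the class names here are made up for illustration):

```python
# isinstance-based refinement works with subclasses too: in the guarded
# branch, mypy narrows e's type from Executor to WorkerPoolExecutor.
class Executor:
    pass

class WorkerPoolExecutor(Executor):
    def connected_workers(self) -> int:
        return 5

def describe(e: Executor) -> str:
    if isinstance(e, WorkerPoolExecutor):
        # typechecks: e is narrowed to WorkerPoolExecutor here
        return f"{e.connected_workers()} workers"
    return "no countable workers"
```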
Optional
Optional[X]
is equivalent to
Union[X, None]
Nothing super serious to say here - just point out that's how we get optionals.
Mention that NoneType is the unit type in Python, with a single value called None.
Because we don't have sum types, we have to have an actual value here...
and we can't express Optional[None] - it collapses to just None - which you might expect
to work given a Haskell "Maybe" or Rust Option type.
X here is an introduction to type variables.
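A minimal sketch of Optional plus narrowing - the `is None` check refines the type in the same way that isinstance does:

```python
from typing import Optional

# Optional[str] is exactly Union[str, None]; mypy narrows `name`
# to str on the branch where the None case has been excluded.
def greet(name: Optional[str]) -> str:
    if name is None:
        return "hello, whoever you are"
    return "hello, " + name  # name narrowed to str here
```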
Duck typing, statically
"If it walks like a duck and it quacks like a duck, then it must be a duck"
def print_len(x):
print(len(x))
print_len([]) => 0 # empty list
print_len({}) => 0 # empty dict
print_len("hello") => 5 # str
print_len(1.23) => TypeError: object of type 'float' has no len()
* hasattr and isinstance - or perhaps here talk more about duck typing and Protocols?
- hasattr fits the "duck typing" model - we can at least do a runtime check for duck types. Compare to Protocols, where we're checking statically that the attribute definitely does exist.
- isinstance - an awkwardness of nominal typing: it doesn't work with duck/protocol typing...
^ these aren't things that definitely need addressing in this talk, but they're tangentially relevant
- Protocols - extend ABCs which let you declare @abstractmethods to be checked at runtime, and this adds on static checking
- A class/type doesn't strictly define what members an object has - it can be quite ad hoc. For example, some instances of our executor class have "connected_workers" - some do not. Duck typing of "is this the kind of executor that uses countable workers?" is the motivation for use of hasattr. Maybe that's OK in the duck typing world? But it interacts awkwardly with static typing. For example,
if hasattr(x, 'connected_workers'):
    print(x.connected_workers)
doesn't type-check, because the static type of x is the executor base class, which doesn't have a connected_workers attribute.
cf. refinement of x's type if we perform an isinstance `if` statement.
- mention Protocols (and Protocols are a specific instance of something else, I think, but I can't remember what)
- Protocols give you supertypes that aren't in the class hierarchy, which is also what Union types are doing...
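One typed alternative to the hasattr idiom is a runtime-checkable Protocol - a sketch, with the executor classes invented for illustration (only the connected_workers attribute name comes from the note above):

```python
from typing import Protocol, runtime_checkable

# A protocol capturing "has a connected_workers attribute";
# @runtime_checkable lets us use it with isinstance too.
@runtime_checkable
class HasConnectedWorkers(Protocol):
    connected_workers: int

class PoolExecutor:          # hypothetical executor with workers
    def __init__(self) -> None:
        self.connected_workers = 3

class ThreadExecutor:        # hypothetical executor without workers
    pass

def report(x: object) -> int:
    if isinstance(x, HasConnectedWorkers):
        # unlike hasattr, this isinstance check narrows x's static
        # type, so the attribute access typechecks
        return x.connected_workers
    return 0
```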
Duck typing, statically
class Sized(Protocol): # (based on real Python impl)
    def __len__(self) -> int:
        pass
def print_len(x: Sized):
    print(len(x))
print_len([]) => 0
print_len({}) => 0
print_len("hello") => 5
isinstance({}, Sized) ==> True
print_len(100)
s.py:13: error: Argument 1 to "print_len" has incompatible type "int";
expected "Sized"
In Haskell, this might be a use case for typeclasses; in Rust, a trait. In statically typed Python,
the closest analogue is protocols - similar, but not the same.
Duck typing, statically
class A():
    def __len__(self):
        return 128
a = A()
print_len(a) ==> 128
isinstance(a, Sized) ==> True
When I describe class A, there is no mention of Sized at all - the "duck typing" bit of this is
that A is a Sized *because* it has the __len__ method, not because I've said it is a subclass of
Sized. If this were a Haskell typeclass or a Rust trait, there would be some instance declaration
declaring this connection.
Dynamic arguments
def f(*args, **kwargs):
    print(f"There are {len(args)} regular args")
    print(f"There are {len(kwargs)} keyword args")
f() =>
There are 0 regular args
There are 0 keyword args
f(1,2,3) =>
There are 3 regular args
There are 0 keyword args
f(8, greeting="hello") =>
There are 1 regular args
There are 1 keyword args
* c.f. that old chestnut of "typed printf"
This feels like the sort of thing to avoid as much as possible if trying to do
static typing.
I'm not going to make any commentary about whether "typed printf" is a good thing or not.
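For what it's worth, *args and **kwargs can themselves be annotated - but the annotation applies to each individual argument, not to the argument list as a whole, so this doesn't recover typed-printf-style precision. A sketch:

```python
# Each positional arg must be a float, and each keyword arg must be a
# float - but nothing about how many of them there are, or their order.
def total(*args: float, **kwargs: float) -> float:
    return sum(args) + sum(kwargs.values())

result = total(1.0, 2.0, bonus=3.0)
```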
Decorators
# typeguard
# typeguard
@typeguard.typechecked
def square(y: float) -> float:
    return y*y
# parsl
@parsl.bash_app
def hostname():
    return "/bin/hostname"
# flask (quickstart)
@app.route('/post/<int:post_id>')
def show_post(post_id):
    return 'Post %d' % post_id
# appears as URL: http://localhost/post/53
# appears as URL: http://localhost/post/53
* note on decorators:
- explain decorators in one slide: @dec def f(x): return x desugars to defining the function and then rebinding f = dec(f).
+ what comes out doesn't even need to be a function!
+ dec can do *anything* (side-effect-wise and return-value-wise).
+ they're really nice. but really hard to type.
+ (eg: typeguard is a decorator we saw earlier. here's another one, which turns a function into an HTTP endpoint)
- this is an example of a Python idiom that works nicely in dynamic Python land
but is hard (impossible?) to type:
- some decorators are dual-use: invoked with parameter syntax they return a decorator (a decorator factory), and used bare they are the decorator itself - so the type signature would need to cover both "decorator factory" and "decorator". We can tell which case is which as humans - when there is no supplied function, it should make a decorator. The way to type that is probably dependent types, which is more advanced...
- typing of function signatures is really awkward (function signatures are *complex* in Python...), especially if we want to do things like add in a new parameter (eg. parsl's stdout handling: the result has the signature of the original function *plus* a new named parameter)
- link to the mypy issue about supporting decorators, with an interesting adjective in the title.
- note on parameterised decorators where a decorator is also a decorator-factory depending on parameters
- PEP 612 in Python 3.10-dev ...
- PEP612 in python 3.10-dev ...
```
The first is the parameter specification variable. They are used to forward the parameter types of one callable to another callable – a pattern commonly found in higher order functions and decorators. Examples of usage can be found in typing.ParamSpec. Previously, there was no easy way to type annotate dependency of parameter types in such a precise manner.
The second option is the new Concatenate operator. It’s used in conjunction with parameter specification variables to type annotate a higher order callable which adds or removes parameters of another callable. Examples of usage can be found in typing.Concatenate.
```
I haven't tried these features out at all. https://docs.python.org/3.10/whatsnew/3.10.html
But the fact that they exist as new features describes something that was missing in earlier Python,
and also demonstrates that this type system is still evolving, even for doing "normal" things. And because of gradual typing, we could get away with that gap for longer.
Decorators
@mydecorator
def f(x):
    return x+1
desugars to (approx):
def internal_f(x):
    return x+1
f = mydecorator(internal_f)
Decorator typing
@mydecorator
def f(x: int) -> int:
    return x+1
# aka:
def internal_f(x: int) -> int:
    return x+1
f = mydecorator(internal_f)
def mydecorator(function: ??) -> ??
    ...
Decorator typing
Sig = TypeVar('Sig')
def mydecorator(func: Sig) -> Sig:
    return func
def internal_f(x: int) -> int:
    return x+1
f = mydecorator(internal_f)
Decorator typing
@parsl.python_app
def f(x: int) -> str:
    return str(x)
# should have type
# f(x: int) -> Future[str]
but
Sig = TypeVar('Sig')
def mydecorator(func: Sig) -> Sig:
    ...
is not expressive enough (in Python <=3.9)
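The closest sketch I can see in Python <=3.9 uses Callable[..., R] - it gets the Future[R] return type right but throws away the parameter types entirely. This is a toy stand-in, not parsl's real implementation:

```python
from concurrent.futures import Future
from typing import Any, Callable, TypeVar

R = TypeVar("R")

# Hypothetical sketch: wraps the result in a Future. The return-type
# transformation int -> Future[str] is captured, but Callable[..., R]
# erases the parameter types, so f("oops") would still typecheck.
def future_app(func: Callable[..., R]) -> Callable[..., "Future[R]"]:
    def wrapper(*args: Any, **kwargs: Any) -> "Future[R]":
        fut: "Future[R]" = Future()
        fut.set_result(func(*args, **kwargs))  # runs eagerly, for the sketch
        return fut
    return wrapper

@future_app
def f(x: int) -> str:
    return str(x)
```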
Co-/contra-variance
class Animal:
    pass
class Dog(Animal):
    pass
# Dog <= Animal <= object
animals: List[Animal] = []
def add_dog(l: List[Dog]):
    my_dog: Dog = ...
    l.append(my_dog)
add_dog(animals) # valid?
Contravariance and covariance exist in other languages, but I've never run into problems with them
before. I think I've run into them here because I'm working with:
- more class hierarchy
- mutable structures
* subtyping: (more subclasses in our project than I'm used to) - should introduce subclasses in the "what are the typing rules" section earlier, but put the complicated stuff in this section
- subclasses come up more than I've encountered in Haskell... where it would be sub-typeclasses? Maybe this is a thing due to lots more mutable structures, and loose reasoning about what can go in a List in our codebase?
- the awkwardness of List being invariant, not co-/contravariant, with a demo of why
- but Sequence[] is often to the rescue - and Sequence works for Tuple too, which is something
it turns out I wanted because something was surprise-changing concrete type in parsl
(is Sequence something we can always iterate over, or what?)
- typing of super(): super() method calls don't actually go to one particular superclass automatically... notion of developing classes specifically to be inherited from (or multiply inherited from). is there a URL for that?
Co-/contra-variance
class Animal():
    pass
class Dog(Animal):
    pass
class Cat(Animal):
    pass
class Cow(Animal):
    pass
# Dog <= Animal <= object
animals: List[Animal] = [Cat(), Dog(), Dog(), Cow()]
def count_dogs(l: List[Dog]):
    print(f"There are {len(l)} dogs")
count_dogs(animals) # valid?
Co-variance
Dog <= Animal
implies
Sequence[Dog] <= Sequence[Animal]
(Sequence[X] is a read only List/tuple/...)
If every dog is an animal, then every sequence of dogs is a sequence of animals.
Contra-variance
Dog <= Animal
implies
Callable[[Animal], str] <= Callable[[Dog], str]
Co-/contra-variance
* Co-variance eg. (read only) Sequence
or
* Contra-variance eg. function args
otherwise:
* invariant eg. List
In practice, hit problems with List
often. eg replace with Sequence
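The List-to-Sequence fix as a sketch - Sequence is covariant because it is read-only, so a List[Dog] is accepted where a Sequence[Animal] is expected (the List[Animal]-taking version that appends would be rejected by mypy):

```python
from typing import List, Sequence

class Animal:
    pass

class Dog(Animal):
    pass

# Sequence[Animal] promises only to read, so covariance is safe:
# List[Dog] <= Sequence[Dog] <= Sequence[Animal]
def count_animals(l: Sequence[Animal]) -> int:
    return len(l)

dogs: List[Dog] = [Dog(), Dog()]
n = count_animals(dogs)  # typechecks under covariance
```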
Parsl development considerations
* Easy stuff
- Can go into master
- type annotations with none of the nonsense that I've
just talked about; gradual typing/Any elsewhere
- easy for everyone to understand simple typing (c.f. Haskell98 crowd)
- high payoff in poorly-tested code like error handling
- typeguard at user boundaries (runtime checking)
- mypy within Parsl codebase (static checking)
* Hard stuff
- separate branch for my exploration
- discover bugs to fix on master
without necessarily adding types to master
- avoid forcing complication onto other dynamic Python programmers
- free to use latest python / mypy / type checker plugins
developer considerations:
* other developers are not experienced in types
* for simple type checking (eg. transcription from what they would already correctly write in a docstring) ... easy
* for complex typing: docstrings can kinda cough and get away with "white lies", but type annotations not so much
* complicated types are hard to read and understand, and it's not necessarily fair to force that on people who aren't either fully convinced, or fully in a position to spend time understanding what is going on in some of the more complicated situations I've previously outlined. - this needs to not be "my crazy pet project I'm forcing on others to the overall detriment of the project"
* describe the gradual typing process as part of describing and consolidating what the existing code does - a legitimate outcome of this work is also docstrings that capture non-obvious requirements that aren't necessarily typeable.
* I found typeguard an easier sell than mypy
* static type checking in our project has found a lot of bugs, especially in our error handling - because those are paths that are not exercised very much by our regular integration test suite - we're mostly checking that things *work*
conclusion:
* a few bullet points
* net positive
* don't expect to end up with something like a haskell codebase
* other-developer considerations
* compile time checking vs logging exceptions - it has often been the case in parsl that we discard exceptions silently or to an obscure/inaccessible log file. Some of those exceptions could have been statically detected - eg PR #1081 - especially in a heavily distributed environment where we can't always pass stuff around. This maybe belongs in the dev process/justification section, as an example of why "well, we logged the exception" isn't good enough.