Python Gradual Typing: The Good, The Bad and The Ugly
Ben Clifford benc@hawaga.org.uk
BOB 2022
Introduction
Parsl project - parallel scripting on super-computers
Prototype started in 2016; now at v1.2
Mostly Python ...
... but I'm a Haskell enthusiast
Explore typing in Python / improve quality of code
I'm going to talk about my experience adding type annotations to the Parsl project codebase.
This has been going on for about four years: first as a side project to explore type checking
in Parsl, and then, as typing has become more mainstream, pushing on the code a bit harder.
(dynamic) types in Python
Values have types. Variables do not.
x = 3
type(x) # ==> <class 'int'>
x = {}
type(x) # ==> <class 'dict'>
* What does the type system look like in normal Python? (show x = 3; type(x); x = {}; type(x)) - x has no type, 3 has a type, {} has a type
* "Values have types, variables do not" - that's a nice dividing line between dynamic Python and statically typed Python, where we *do* start making assertions like "this x will be an integer, even though I don't know what the value will be".
Type syntax
# untyped
def square(y):
    return y*y

x = square(1.41)

# typed
def square(y: float) -> float:
    return y*y

x: float = square(1.41)
Type annotations have no (immediate) effect!
def square(y: float) -> float:
    return y*y

x: float = square([])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in square
TypeError: can't multiply sequence by non-int of type 'list'
Runtime checking
@typeguard.typechecked
def square(y: float) -> float:
    return y*y

x: float = square([])
Traceback (most recent call last):
...
TypeError: type of argument "y" must be either float or int;
got list instead
Static checking
def square(y: float) -> float:
    return y*y

x: float = square([])
$ mypy source.py
source.py:4: error: Argument 1 to "square"
has incompatible type "List[<nothing>]";
expected "float"
Type Hierarchy
def f(x: object):
    print(x)
x: float = 1.23
f(x) # typechecks ok, because float <= object
Gradual typing
def f(x: float):
    print(x + 1)
y: Any = []
f(y) # typechecks, because
# List ~ Any ~ float (!)
but at runtime...
TypeError: can only concatenate list (not "int") to list
Not the same as object
if we annotate everything with an Any type, pretty much everything will typecheck
in mypy (and then things might break, if they would break anyway)
can always infer Any as a type, even if we can't figure out anything tighter
maybe think of it a bit like use of "unsafe" as a label in haskell and a keyword in rust.
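A minimal sketch of that contrast, assuming mypy as the checker: the same body is accepted with an Any parameter but rejected with an object parameter.

```python
from typing import Any

def f_any(x: Any) -> None:
    print(x + 1)    # mypy accepts: every operation is allowed on Any
                    # (it may still blow up at runtime, as above)

def f_object(x: object) -> None:
    print(x + 1)    # mypy rejects: object does not support +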
Antipattern vs Gradual Typing
a = planet
a = a.pickCountry()
a = a.pickCity()
a = a.pickCoordinates()
Rewrite this in more amenable style...
or
a: Any
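One possible shape of the "more amenable style", as a sketch: give each intermediate value its own name, so each variable keeps a single, annotatable type. The classes here (Planet, Country, City, Coordinates) are illustrative stand-ins, not a real API.

```python
class Coordinates: ...

class City:
    def pickCoordinates(self) -> Coordinates: return Coordinates()

class Country:
    def pickCity(self) -> City: return City()

class Planet:
    def pickCountry(self) -> Country: return Country()

planet = Planet()
country: Country = planet.pickCountry()
city: City = country.pickCity()
coords: Coordinates = city.pickCoordinates()
```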
Union types
def f(x: Union[float, str]):
    if isinstance(x, float):
        print(x*2)
    else:
        print("not a float")

y: float = 1.23
f(y)  # ==> 2.46

# float <= Union[float, str]
# str <= Union[float, str]
- This is not a sum type: Union is "overlapping", not a traditional sum type. Union[int, int] is the same as int.
- An `if` doing type refinement on a union type gives x a tighter type inside each branch.
- mypy has special handling of this `if isinstance(...)` idiom...
- Type refinement can also work on subclasses (see the sketch below)...
- Intro to next section on hasattr and duck typing.
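A sketch of refinement on subclasses, assuming mypy. Animal and Dog here are illustrative classes (they reappear in the variance section later):

```python
class Animal:
    pass

class Dog(Animal):
    def bark(self) -> str:
        return "woof"

def greet(a: Animal) -> None:
    if isinstance(a, Dog):
        print(a.bark())       # inside this branch, mypy narrows a to Dog
    else:
        print("just an animal")
```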
Optional
Optional[X]
is equivalent to
Union[X, None]
Nothing super serious to say here - just point out that's how we get optionals.
Mention that None is the unit type in Python annotations (really NoneType), with a single value also called None.
Because we don't have sum types, we have to have an actual value here...
and we can't express Optional[None] as a distinct type, which you might expect coming from a Haskell "Maybe"
or Rust Option type: Union[None, None] collapses to plain None.
The X here is a first taste of type variables (see Generics, next).
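A small sketch of Optional in use, and of the collapse that makes Optional[None] inexpressible. dict.get is real Python; the lookup function is illustrative.

```python
from typing import Dict, Optional

def lookup(table: Dict[str, str], key: str) -> Optional[str]:
    return table.get(key)   # the value, or None if the key is absent

# Because Union[None, None] collapses to None, Optional[None] is not a
# distinct type: a "present, but None" result cannot be told apart from
# "absent" - unlike Maybe (Maybe a) in Haskell or Option<Option<T>> in Rust.
```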
Generics
x: List # aka List[Any]
x: List[str]
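A sketch of a user-defined generic container, assuming typing.TypeVar and typing.Generic (the Stack class is illustrative, not from Parsl):

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

s: Stack[str] = Stack()
s.push("hello")
# s.push(3)   # mypy rejects this: expected "str"
```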
Summary
Type annotations + pluggable enforcement
Runtime: typeguard, Static: mypy
Generics. Unions.
Gradual typing: Any - distinct from object
Duck typing, statically
"If it walks like a duck and it quacks like a duck, then it must be a duck"
def print_len(x):
    print(len(x))
print_len([]) # => 0 empty List
print_len({}) # => 0 empty Dict
print_len("hello") # => 5 str
print_len(1.23)
# => TypeError: object of type 'float' has no len()
* hasattr and isinstance - or perhaps here talk more about duck typing and Protocols?
- hasattr fits the "duck typing" model - we can at least do a runtime check for duck types. Compare to Protocols, where we're checking statically that the attribute definitely exists.
- isinstance - an awkwardness of nominal typing: by default it doesn't work with duck/protocol typing...
^ these aren't things that definitely need addressing in this talk, but they're tangentially relevant
- Protocols - extend ABCs, which let you declare @abstractmethods to be checked at runtime; Protocols add static checking on top.
- A class/type doesn't strictly define what members an object has - it can be quite ad hoc. For example, some instances of our executor class have "connected_workers" - some do not. Duck typing of "is this the kind of executor that uses countable workers?" is the motivation for using hasattr. Maybe that's OK in the duck typing world? But it interacts awkwardly with Python type checking. For example,
if hasattr(x, 'connected_workers'):
    print(x.connected_workers)
doesn't type-check, because the type of x is the executor base class, which doesn't declare a connected_workers attribute.
cf. refinement of x's type if we perform an isinstance `if` statement.
- Mention Protocols (a specific instance of structural subtyping).
- Protocols give you supertypes that aren't in the class hierarchy, which is also what Union types are doing...
Duck typing, statically
from typing import Protocol, runtime_checkable

@runtime_checkable             # needed for the isinstance() check below
class Sized(Protocol):         # (based on real Python impl)
    def __len__(self) -> int:
        pass

def print_len(x: Sized):
    print(len(x))

print_len([])       # => 0
print_len({})       # => 0
print_len("hello")  # => 5

isinstance({}, Sized)  # => True
print_len(100)
s.py:13: error: Argument 1 to "print_len" has incompatible type "int";
expected "Sized"
In Haskell, this might be a use case for typeclasses; in Rust, a trait. In statically typed Python,
the closest analogue is Protocols. Similar, but not the same.
Duck typing, statically
class A():
    def __len__(self):
        return 128

a = A()

print_len(a)          # => 128
isinstance(a, Sized)  # => True
When I describe class A, there is no mention of Sized at all - the "duck typing" bit of this is
that A is a Sized *because* it has the __len__ method, not because I've said it is a subclass of
Sized. If this were a Haskell typeclass or a Rust trait, there would be some instance declaration
declaring that connection.
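A hedged sketch of how the "countable workers" duck type from the notes above could be expressed. HasConnectedWorkers and report are illustrative names, not Parsl's API, and this assumes typing.runtime_checkable so that the isinstance check also refines the type for mypy:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasConnectedWorkers(Protocol):
    connected_workers: int

def report(executor: object) -> None:
    if isinstance(executor, HasConnectedWorkers):
        # mypy narrows executor here, unlike the hasattr() version
        print(executor.connected_workers)
```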
Dynamic arguments
def f(*args, **kwargs):
    print(f"There are {len(args)} regular args")
    print(f"There are {len(kwargs)} keyword args")
f()
# There are 0 regular args
# There are 0 keyword args
f(1,"two",3)
# There are 3 regular args
# There are 0 keyword args
f(8, greeting="hello")
# There are 1 regular args
# There are 1 keyword args
* c.f. that old chestnut of "typed printf"
This feels like the sort of thing to avoid as much as possible if trying to do
static typing.
I'm not going to make any commentary about whether "typed printf" is a good thing or not.
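For completeness, a sketch of the limited typing that *is* possible here: annotating *args and **kwargs types each element rather than the whole tuple/dict, so every positional argument has to share one type.

```python
def g(*args: int, **kwargs: str) -> None:
    # args is a Tuple[int, ...]; kwargs is a Dict[str, str]
    print(f"There are {len(args)} regular args")
    print(f"There are {len(kwargs)} keyword args")

g(1, 2, 3)                # ok
g(8, greeting="hello")    # ok
# g(1, "two", 3)          # mypy rejects this: "two" is not an int
```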
Decorators
# typeguard
@typeguard.typechecked
def square(y: float) -> float:
    return y*y

# parsl
@parsl.bash_app
def hostname():
    return "/bin/hostname"

# flask
@app.route('/post/<int:post_id>')
def show_post(post_id):
    return 'Post %d' % post_id
# appears as URL: http://localhost/post/53
* note on decorators:
- explain decorators in one slide: `@dec` above `def f(x): return x` means roughly the same as defining the undecorated function and then writing `f = dec(internal_f)`.
+ what comes out doesn't even need to be a function!
+ dec can do *anything* (sideeffectful, and return value wise).
+ they're really nice. but really hard to type.
+ (eg: typeguard is a decorator we should have seen earlier. here's another one, which turns a function into an HTTP endpoint)
- this is an example of a python idiom that works nicely in python dynamic type land
but is hard (?impossible) to type:
- what comes out of a parameterised decorator expression is either a decorator (when the name is invoked with parameter syntax, i.e. it is a decorator factory), or the name is a decorator itself - so the type signature would need to cover both the "decorator factory" and "decorator" cases. We can tell which case is which as humans - when there is no supplied function, it should make a decorator. And the way to type that precisely is probably dependent types, which is more advanced...
- typing of function signatures is really awkward (function signatures are *complex* in python...) especially if we want to do things like add in a new parameter (eg. parsl's stdout handling: it's the signature of the original function *plus* a new named parameter)
- link to mypy issue about supporting decorator with interesting adjective in the title.
- note on parameterised decorators where a decorator is also a decorator-factory depending on parameters
- PEP612 in python 3.10-dev ...
```
The first is the parameter specification variable. They are used to forward the parameter types of one callable to another callable – a pattern commonly found in higher order functions and decorators. Examples of usage can be found in typing.ParamSpec. Previously, there was no easy way to type annotate dependency of parameter types in such a precise manner.
The second option is the new Concatenate operator. It’s used in conjunction with parameter specification variables to type annotate a higher order callable which adds or removes parameters of another callable. Examples of usage can be found in typing.Concatenate.
```
I haven't tried these features out at all. https://docs.python.org/3.10/whatsnew/3.10.html
but the fact that they exist as new features can be a description of something that was missing in earlier python
and also a demonstration that this type system is still evolving, even for doing "normal" things. And because of gradual typing, we could get away without these features for longer.
Decorators
@mydecorator
def f(x):
    return x+1

desugars to (approx):

def internal_f(x):
    return x+1

f = mydecorator(internal_f)
Decorator typing
@mydecorator
def f(x: int) -> int:
    return x+1

# aka:
def internal_f(x: int) -> int:
    return x+1

f = mydecorator(internal_f)

def mydecorator(function: ??) -> ??
    ...
Decorator typing
Sig = TypeVar('Sig')

def mydecorator(func: Sig) -> Sig:
    return func

def internal_f(x: int) -> int:
    return x+1

f = mydecorator(internal_f)
Decorator typing
@parsl.python_app
def f(x: int) -> str:
    return str(x)

# should have type
# f(x: int) -> Future[str]

but

Sig = TypeVar('Sig')

def mydecorator(func: Sig) -> Sig:
    ...

is not expressive enough (in Python <= 3.9)
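With Python 3.10's ParamSpec (PEP 612, mentioned earlier), something closer to the desired type becomes expressible. A hedged sketch, not Parsl's real implementation - this python_app just runs the function inline so the example is self-contained:

```python
from concurrent.futures import Future
from typing import Callable, TypeVar
from typing import ParamSpec  # Python 3.10+

P = ParamSpec('P')
R = TypeVar('R')

def python_app(func: Callable[P, R]) -> Callable[P, Future[R]]:
    # keep the original parameter signature P, but wrap the result in a Future
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> Future[R]:
        fut: Future[R] = Future()
        fut.set_result(func(*args, **kwargs))  # inline, rather than on a worker
        return fut
    return wrapper

@python_app
def f(x: int) -> str:
    return str(x)

fut = f(3)       # mypy infers Future[str]
# f("three")     # mypy rejects this: expected "int"
```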
Co-/contra-variance
class Animal:
    pass

class Dog(Animal):  # Dog <= Animal <= object
    pass

animals: List[Animal] = []

def add_dog(l: List[Dog]):
    my_dog: Dog = ...
    l.append(my_dog)

add_dog(animals)  # valid?
Contravariance and covariance exist in other languages, but I've never run into problems with them
before. I think I've run into them here because I'm working with:
- more class hierarchy
- mutable structures
* subtyping: (more subclasses in our project than I'm used to) - should introduce subclasses in the "what are the typing rules" part earlier, but put the complicated stuff in this section
- subclasses appear more than I've encountered in Haskell... where it would be sub-typeclasses? Maybe this is due to lots more mutable structures, and loose reasoning about what can go in a List in our codebase.
- the awkwardness of List being invariant, not co-/contravariant, with a demo of why
- but Sequence[] often comes to the rescue - and Sequence works for Tuple too, which is something
it turns out I wanted because something was surprise-changing its concrete type in Parsl
(Sequence is the read-only abstract type we can always iterate over and index: list, tuple and str are all Sequences)
- typing of super(): super() methods don't actually go to one of the superclasses automatically... notion of developing classes specifically to be inherited from (or multiply inherited from). Is there a URL for that?
Co-/contra-variance
class Animal():
    pass

class Dog(Animal):  # Dog <= Animal <= object
    pass

animals: List[Animal] = [Cat(), Dog(), Dog(), Cow()]

def count_dogs(l: List[Dog]):
    print(f"There are {len(l)} dogs")

count_dogs(animals)  # valid?
Co-variance
Dog <= Animal
implies
Sequence[Dog] <= Sequence[Animal]
(Sequence[X] is a read only List/tuple/...)
If every dog is an animal, then every sequence of dogs is a sequence of animals.
Contra-variance
Dog <= Animal
implies
Callable[[Animal], str] <= Callable[[Dog], str]
Co-/contra-variance
Co-variance eg. (read only) Sequence
Contra-variance eg. function args
Otherwise: invariant eg. List
In practice, hit problems with List, eg replace with Sequence
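A sketch of the practical upshot, assuming mypy: take Sequence for read-only parameters and covariance does what you want; keep List only for parameters you actually mutate.

```python
from typing import List, Sequence

class Animal:
    pass

class Dog(Animal):
    pass

def count_animals(l: Sequence[Animal]) -> int:
    return len(l)                 # read-only use

def add_dog(l: List[Dog]) -> None:
    l.append(Dog())               # mutating use

dogs: List[Dog] = [Dog(), Dog()]
animals: List[Animal] = [Animal()]

count_animals(dogs)   # ok: Sequence is covariant, Sequence[Dog] <= Sequence[Animal]
add_dog(dogs)         # ok
# add_dog(animals)    # mypy rejects this: List is invariant,
                      # List[Animal] is not a List[Dog]
```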
Parsl development considerations
Parsl development considerations
This started as an exploration for myself ...
... but now we want it in the Parsl production codebase
In master: The Easy stuff
Over time (years), introduce more type annotations
Only simple typing - eg first part of this talk
Must be understandable by other Python developers
If anything gets complicated - Any
typeguard at API boundary to users, runtime
mypy within the codebase, CI time
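A minimal sketch of that split (the function names are illustrative, not Parsl's API): runtime enforcement with typeguard on the user-facing boundary, purely static checking for internal helpers.

```python
import typeguard

@typeguard.typechecked
def submit_app(label: str, retries: int = 0) -> None:
    """User-facing entry point: annotations checked at call time."""
    _schedule(label, retries)

def _schedule(label: str, retries: int) -> None:
    """Internal helper: only mypy checks this, at CI time."""
    print(f"scheduling {label} with {retries} retries")
```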
Payoffs: The Easy Stuff
Static typing coverage of non-integration-tested code:
Exception/error handling
Plugins untested in our CI
Downsides
Nowhere near:
"it typechecks so it must be correct"
Still extremely dynamic
The hard stuff
eg. 2nd part of this talk
Separate branch for my exploration
Port discovered bugfixes to master, but not necessarily type annotations
Free to use confusing types
Free to use latest Python without compatibility concerns
No worries about confusing other people
Developer considerations:
* Other developers are not experienced in types.
* For simple type checking (e.g. transcription from what they would already correctly write in a docstring)... easy.
* For complex typing: docstrings can kind of cough and get away with "white lies", but type annotations not so much.
* Complicated types are hard to read and understand, and it's not necessarily fair to force that on people who aren't either fully convinced, or fully in a position to spend time understanding what is going on in some of the more complicated situations I've previously outlined. This needs to not be "my crazy pet project I'm forcing on others to the overall detriment of the project".
* Describe the gradual typing process as part of describing and consolidating what the existing code does - a legitimate outcome of this work is also docstrings that capture non-obvious requirements that aren't necessarily typeable.
* I found typeguard an easier sell than mypy.
* Static typechecking in our project has found a lot of bugs in our error handling - because those are paths that are not exercised very much by our regular integration test suite; we're mostly checking that things *work*.
Conclusion:
* A few bullet points.
* Net positive.
* Don't expect to end up with something like a Haskell codebase.
* Other-developer considerations.
* Compile-time checking vs logging exceptions: it has often been the case in Parsl that we discard exceptions silently or to an obscure/inaccessible log file. Some of those errors could have been detected statically - e.g. PR #1081 - especially in a heavily distributed environment where we can't always pass stuff around. This maybe belongs in the dev process/justification section, as an example of why "well, we logged the exception" isn't good enough.
Conclusion
Easy stuff is easy
Porting existing codebase can get hard fast when the style is wrong
Worth it without complete coverage
Fresh code easier to write in a checkable style
You should use at least the easy stuff in all your code