Ownership in the type system
Ben Clifford
London Haskell, June 2018
benc@hawaga.org.uk
Press 's' for speaker notes
0. Rust can provide memory safety by statically tracking who owns
each piece of allocated memory. This needs more elaborate type
signatures, and plenty of wrestling to make it all compile -
something any Haskeller will be familiar with. I'll talk a bit
about the theory, a bit about the concrete language features
that make this more usable, and a bit about some vaguely related
Haskell.
What is Rust?
closures
traits / typeclasses
parametric polymorphism
concurrency
pattern matching
single-assignment variables
nice macros
1. I'm going to talk about resource management in a language called
Rust, which I expect a lot of people have heard about, even if
they haven't used it.
code examples: 3 languages:
Haskell, C, Rust.
Briefly, Rust is a "safe, concurrent, practical language".
care about safety but also care about c-like performance.
parts of firefox in rust.
List of features on slide.
void f() {
r = malloc(10);
mutate(r);
free(r);
}
do
r <- hOpenFile "foo" WriteMode
hPutStrLn r "hello world"
hClose r
I'll review some resource management techniques -
resource = memory my main example; open files.
"management" =
keeping track of how that resource is allocated, used and released.
Explicit, C-like, style.
We allocate some memory (for example, with malloc). We use it.
We free it. r = a "reference" or a "pointer" without getting too technical.
File example.
In these two examples, all in a row in the source code,
so easy to reason about.
In reality it is likely a reference gets passed around
stored and used in several different places,
on its eventual journey to being released.
correctness proof "in our head (or test suites etc)",
not in the compiler.
use-after-free
void f() {
r = malloc(10);
free(r);
mutate(r);
}
do
r <- hOpenFile "foo" WriteMode
hClose r
hPutStrLn r "hello world"
What might go wrong?
We might free the resource before we have finished using it. Then what
happens? Memory corruption or a segfault or other failures.
[use-after-free]
resource leak
void f() {
r = malloc(10);
mutate(r);
}
do
r <- hOpenFile "foo" WriteMode
hPutStrLn r "hello world"
We might forget to free the memory, and then we have a memory leak:
that memory remains allocated "forever" (for some notion of forever)
let r = FooConstructor
in process r
Another resource management: garbage collection.
Pretty much the standard approach for memory
management in high level
languages - eg Haskell does things - but not other
stuff like file handle management.
We allocate some memory. We use it. We just stop talking about it
and it goes away. Eventually.
If 'r' is in scope, memory is valid.
But garbage collectors run at runtime - this adds on
a bunch of overhead. Namecheck pusher/will.
Another disadvantage is that sometimes we do want to be a
bit more explicit about when
a resource is released - eg file close. but hard with GC.
withFile "myfile.txt" WriteMode $
\r -> hPutStrLn r "hello world"
void f()
{
int x;
mutate(&x)
}
A third way is scope based, local variables: built into C, and
provided by bracketing style of helper function for files in
Haskell.
The resource is released when the scope ends: in the C example,
the space usually happens to be on the stack and disappears when
we return from the function; in the Haskell example, we have this
helper which opens a file, passes the handle, and then closes
the file at the end - so the handle is valid within this
scope defined by the lambda expression.
do
r' <- withFile "myfile.txt" WriteMode $
\r -> return r
int * f()
{
int x;
return &x;
}
We don't write a free call, and x will always be released, but we
need to be careful to not pass a reference to x outside of the
scope (for example, if mutate stores it in a global struct for
later use) - otherwise we get "use-after-free" bugs like in the
earlier examples.
int f(int *r)
{
return 7;
}
f :: (Handle, Handle) -> IO ()
f (_,b) = hClose b
Things we do every day with references that can cause trouble.
with reasoning. can do to any value in most
languages, but the problems specifically arise when those values
are references, that is names to some other resource (a pointer to
member, a file handle).
First, we're allowed to forget references. We might just stop
talking about a variable; or we might only pattern match
part of a tuple.
When we do this, we hope that someone else (either another part
of our code, or a garbage collector) will release the resources.
void f() {
r = malloc(10);
mutate(r);
free(r);
}
do
r <- hOpenFile "foo" WriteMode
hPutStrLn r "hello world"
hClose r
That's usually combined with being able to duplicate references. We've
passed a reference into mutate or putStrLn
and those have done some stuff and then forgetten the reference,
but that's fine, because we've also kept a copy for ourselves to
use to free/close later.
Those two everyday operations - forgetting and duplicating - are part
of what makes it quite hard to reason about whats going on.
Controlling these inherently dangerous operations
is a big part of Rust's resource management story.
fn main()
{
let r = vec![1, 2, 3];
process(&r);
}
Every value in rust is owned by exactly one variable. When that
variable goes out of scope, that value is freed.
To begin with, that looks like the scope based model I talked about
previously.
Rust puts in a static implicit free when r goes out of scope.
To be clear - it isn't garbage collected.
fn main()
{
let r = vec![1, 2, 3];
process(&r);
drop(r);
}
It's basically doing this, which is more explicit and also valid rust.
fn main()
{
let r = vec![1, 2, 3];
process(&r);
drop(r);
process(&r);
}
5 | drop(r);
| - value moved here
6 | process(&r);
| ^ value used here after move
OK, so here's some buggy code. use-after-free bug.
Rust rejects this!
What has happened here is that I've transferred ownership of that
vector - instead of being owned by r in this current
scope, it has been handed over to be owned by drop, and so we can't
also have it owned by r any more. So we have this novelty that
variables can disappear effectively
out of scope by using them in certain ways.
Then it turns out that the implementation of drop just releases
all the resources and can safely forget about te value.
fn main()
{
let r = vec![1, 2, 3];
process(&r);
process(&r);
}
fn process(s : &Vec<i32>) {
println!("vector size: {}", (*s).len());
}
$ ./a
vector size: 3
vector size: 3
So why does that happen for drop and not for process? Because we're
passing in a reference to that value, rather than passing the
value itself.
References have slightly different rules applied to them, but there
is still novel checking going on.
Here's more code, with two function calls - it'll output the
vector size twice.
fn main()
{
let r = vec![1, 2, 3];
process(&r);
process(&r);
}
fn process(s : &Vec<i32>) {
println!("vector size: {}", s.len());
drop(*s)
}
error[E0507]: cannot move out of borrowed content
--> a.rs:10:8
|
10 | drop(*s)
| ^^ cannot move out of borrowed content
What happens if I try to release the vector inside the process
function - trying to introduce a different use-after-free
bug.
As we might hope, we get a compiler error - "cannot move out of
borrowed context". What does that mean? When we call drop, we
transfer ownership of the value to drop. But, inside "process"
we don't actually own s, so it isn't ours to give away.
We've just "borrowed" it from the caller, and part of that calling
contract is that at the end, we have to give it back.
We can't release it;
we can't transfer the ownership to someone else.
We can however allow another function call to borrow it from us, deeper
and deeper: we know statically
that next level of function will give it back,
because it's only borrowing, and so we know that at the end of
process, we'll be able to give it back.
fn main()
{
let r = vec![1, 2, 3];
process(r);
process(r);
}
fn process(s : Vec<i32>) {
println!("vector size: {}", s.len());
} // release happens here
error[E0382]: use of moved value: `r`
--> a.rs:5:11
|
4 | process(r);
| - value moved here
5 | process(r);
| ^ value used here after move
|
= note: move occurs because `r` has type `std::vec::Vec<i32>`,
which does not implement the `Copy` trait
Here's a different implementaiton moving ownership rather than
borrowing - allocate in main, release in process. now main doesn't
own it for second call
Fighting the borrow checker!
See error talks about copy trait - some types can be copied safely
like u32, so can be more relaxed.
fn main()
{
let r = 10;
process(r);
process(r);
}
fn process(s : i32) {
println!("integer is: {}", s);
}
$ ./b
integer is: 10
integer is: 10
so here's the same code, but instead of using a vector, it uses
a single integer. This compiles and run fine because i32 has
the Copy trait. So all this borrow checking and ownership
stuff doesn't apply to "simple" values, thankfully. Just to more
complicated values.
fn main()
{
let r = vec![1, 2, 3];
let q = vec![1, 2, 3, 4];
let l = longest(&q, &r);
process(l);
}
fn longest(s : &Vec<i32>, t : &Vec<i32>) -> &Vec<i32> {
if (*s).len() > (*t).len() {
return s;
} else {
return t;
}
}
error[E0106]: missing lifetime specifier
--> c.rs:9:45
|
9 | fn longest(s : &Vec<i32>, t : &Vec<i32>) -> &Vec<i32> {
| expected lifetime parameter ^
|
= help: this function's return type contains
a borrowed value, but the signature does not say
whether it is borrowed from `s` or `t`
Reference lifetimes a bit. here's some code
that given two vectors, returns the longest of the two.
Doesn't compile. Input references - we know lifetimes - borrowed.
but output - reference to what? what does the output have a lifetime of?
how long can we use it?
In this code, the l reference is going to be valid as long
as r and q are both alive; but if either of them has been dropped,
then potentially there's a problem and we can't be allowed to
use that reference any more.
fn main()
{
let r = vec![1, 2, 3];
let q = vec![1, 2, 3, 4];
let l = longest(&q, &r);
process(&l);
}
fn longest <'a> (s : &'a Vec<i32>, t : &'a Vec<i32>)
-> &'a Vec<i32> {
if (*s).len() > (*t).len() {
return s;
} else {
return t;
}
}
we can get round this by using lifetime annotations, like this.
compile time variables that refer to reference lifetimes.
a is a lifetime that is at least as long as both s,t. so
return reference also has a lifetime a.
In this example, in main, we call longest with an implicit compile
time parameter that refers basically to the scope of main: r and q
are valid in that scope, and so the return value l is valid in that
scope too.
If we called longest somewhere else, there might be a different
implicit scope for that invocation.
use std::rc::Rc;
fn main()
{
let r = Rc::new(vec![1, 2, 3]);
let q = r.clone();
process(&r);
process(&q);
drop(r);
process(&q);
drop(q);
}
fn process(s : &Vec<i32>) {
println!("vector size: {}", (*s).len());
}
Finally, smart pointers. These are pointers
or references that know when they are dropped, - they can do something
when you forget them. cf. a finaliser in garbage collection,
but safer and more useful.
Basic example: reference counted smart pointer.
Value can have multiple owners - stays allocated as long
as any owner exists - runtime.
We can make a vector value and make a reference counted smart pointer
to it - then we can clone it - and use r and q as if we owned the
value. We can drop one, but the other stays alive. The final drop
is what will actually drop the vector.
use std::sync::Mutex;
fn main()
{
let r = Mutex::new(vec![1, 2, 3]);
{
let r1 = &r.lock().unwrap();
process(r1);
}
{ let r2 = &r.lock().unwrap();
process(r2);
}
}
A more interesting example, I think, is a mutex,
which gives a taste of some of rust's concurrency libraries.
We can lock the value protected by the mutex and simultaneously
acquire a smart pointer to that value - the lock remains until
that smart pointer goes out of scope.
What does Haskell offer in this space?
linear types proposal, including implementation that
I've run on my laptop to.
differet goals to rust, and has a
very different feel,
but it has a
lot of similarities in so much as it cares
about how and how many times you reference a value,
so that more interesting behaviour can be built on that knowledge.
There's a paper you can read. Some of the things they claim to do
are on this slide.
accessing serialised data without copying it
sockets with type level state
pure bindings to impure APIs