Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reads clauses on methods #6

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
239 changes: 239 additions & 0 deletions text/0000-reads-clauses-on-methods.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,239 @@
- Feature Name: reads_clauses_on_methods
- Start Date: 2021-09-17
- RFC PR: [dafny-lang/rfcs#0000](https://github.com/dafny-lang/rfcs/pull/0000)
- Dafny Issue: [dafny-lang/dafny#0000](https://github.com/dafny-lang/dafny/issues/0000)

# Summary
[summary]: #summary

This proposal simply adds support for optional `reads` clauses on methods, just as they are already supported on functions. While the default for functions is disallowing all reads (i.e. `reads {}`) the default for methods would be allowing all reads (i.e. `reads *`), allowing full backwards-compatibility.

# Motivation
[motivation]: #motivation

Dafny currently has no direct support for concurrency. The soundness of method verification depends on the assumption that no modification to mutable state occurs other than what is present in the method body. Consider this example of a Dafny class with internal state:

```dafny
class Counter {
var x: int
constructor() {
x := 0;
}
function method Get()
reads this
ensures Get() == x
{
x
}
method Increment()
modifies this
ensures x == old(x) + 1
{
x := x + 1;
}
}
```

If this class is compiled to C#, for example, and its instances are shared between multiple threads, its verified properties are no longer true. If two threads both invoke `Increment` on a single counter, the effect could easily be to only increment `x` once, since reading and writing `x` does not occur atomically. Even the post-condition on `Get` is no longer valid, because the value of `x` could change in-between reading its value and returning from the compiled body.

This proposal does not aim to add concurrency primitives to Dafny, or even propose any particular choice of paradigm for asynchronous or concurrent execution. Instead, it supports verifying that a particular method is safe to use in a concurrent environment, by verifying that it does not operate on any shared mutable state.
Copy link
Member

@keyboardDrummer keyboardDrummer Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by verifying that it does not operate on any shared mutable state.

Can't you also achieve that by controlling which arguments are passed to the method? If the (static) method doesn't have any pointers to traverse from, then it can't read from or write to any mutable state, can it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work too! But there are a few downsides to that approach:

  1. The analysis to traverse all paths from a value to find any potential mutable state doesn't exist as any part of Dafny's implementation already, and would have to deal with some tricky cases where it would have to be overly conservative (e.g. if you have a generic method that takes a T value, you'd probably have to be safe and reject it just in case it ends up making mutable state reachable). The framing approach is quite a bit cheaper to implement since modifies already exists and reads can heavily reuse its verification machinery.
  2. That analysis would not be compatible with modular verification, since you'd need to look inside dependent modules in case there is any mutable state lurking there. Whereas reads and modifies frames will be/are part of the specification of module contents.
  3. Rejecting any input connected to mutable state is more restrictive than necessary, since there's no danger if you don't actually read or write that state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks !


# Guide-level explanation
Copy link
Member

@keyboardDrummer keyboardDrummer Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section doesn't describe any use-cases that require something other than reads {}, so why not introduce a more restricted construct like threadsafe ? That gives less freedom to the programmer, so less chance of confusion, and you could relax the associated checks in the future, for example to allow reading passed in arguments as long as they are immutable or owned by the callee.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reads frames are necessary on internal methods (i.e. not exposed to external code) in the control flow though. For example, FibonacciMemoizer.Get has to declare reads this, because otherwise the default spec of reads * would violate the tighter reads {} frame of SafeFirstNFibonaccis.

I would actually argue that reusing an existing concept in a very similar additional context is a lower cognitive burden on programmers than a new concept like threadsafe (and again I'm trying not to introduce the concept of a "thread" unless we're 100% sure that's the right concurrency model for Dafny).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually argue that reusing an existing concept in a very similar additional context is a lower cognitive burden on programmers than a new concept

I guess that's a difficult thing to determine, but I was thinking that if there is a specific pattern (reads {}) used for a specific use-case (calling externally exposed methods concurrently), it does help to name that use-case and remove the need for the pattern.

Other reads frames are necessary on internal methods (i.e. not exposed to external code) in the control flow though. For example, FibonacciMemoizer.Get has to declare reads this, because otherwise the default spec of reads * would violate the tighter reads {} frame of SafeFirstNFibonaccis.

I see. I'm a little confused by the file text/guide-level-explanation-example-1.dfy which has similar but different content that's what's inline in text/0000-reads-clauses-on-methods.md. The former has reads this but the latter one doesn't.

Given that FibonacciMemoizer.Get doesn't take any reference type parameters, isn't reads this the upper bound of what it can read so reads this can be implicit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's a difficult thing to determine, but I was thinking that if there is a specific pattern (reads {}) used for a specific use-case (calling externally exposed methods concurrently), it does help to name that use-case and remove the need for the pattern.

That's true. I could imagine something like {:threadsafeExtern}, or some way of parameterizing {:extern}. It wouldn't eliminate the need for reads clauses in general in order to verify this property, though. You'd have to either have that annotation imply reads {} or require that such methods also declare reads {} explicitly, and I'd likely prefer the latter.

Given {:extern} is currently very weakly specified and already has several other sharp edges (https://github.com/dafny-lang/dafny/labels/foreign-function-interface), it's my preference to leave this out of scope for now, just as we also want to leave any concurrency semantics for Dafny out of scope if we can.

I see. I'm a little confused by the file text/guide-level-explanation-example-1.dfy which has similar but different content that's what's inline in text/0000-reads-clauses-on-methods.md. The former has reads this but the latter one doesn't.

Ah that's my fault for letting the two versions get out of sync. I was hoping to find a way to include the source files into the markdown. I've deleted them and put the missing reads this clause into the example.

Some classes will only have const fields, so some class methods can declare reads {} rather than reads this. I'm still thinking making the default reads * everywhere would be better for simplicity.

Copy link
Member

@keyboardDrummer keyboardDrummer Sep 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I could imagine something like {:threadsafeExtern}, or some way of parameterizing {:extern}. It wouldn't eliminate the need for reads clauses in general in order to verify this property, though. You'd have to either have that annotation imply reads {}

Yes, that was what I was thinking of

or require that such methods also declare reads {} explicitly, and I'd likely prefer the latter.

I think I agree that reads clauses on methods are the more lightweight feature to add since it doesn't introduce any new concept.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're okay with a keyword like threadsafe, then the language could say the following:

  • A method that is not declared to be threadsafe is not allowed to have a reads clause. Such a method is allowed to read anything (just like today).
  • A method declared with threadsafe method is allowed to have a reads clause. We now even have the freedom to say that the default reads clause for a threadsafe method is the empty set. This would make it consistent with both reads and modifies clauses today. Furthermore, this would completely address my concern about beginning users getting the wrong idea that they must declare the reads set for method -- if they ever see, or try to declare, a method with a reads clause, then that method is declared as threadsafe method, which the user can identify.

[guide-level-explanation]: #guide-level-explanation

In most Dafny code, you will not have to specify a reads clause on any methods, as they are permitted to read any mutable state by default. In some cases the specification for a method needs to be more precise, however. In particular, invoking a compiled method from external, native code in a concurrent environment is much more likely to be safe if the method does not read or write any shared mutable state. It will now be possible to specify this requirement by using `reads` clauses on methods as well as functions.

Let's consider a case where we want to provide some Dafny-implemented functionality to external code. Let's use our favorite example, the Fibonacci sequence. Assume that this needs to work in at least one language with threads and shared memory, such as C# or Java. Knowing that is a requirement, one way to meet it is to ensure that the Dafny code does not read or write any shared mutable state:

```dafny
// Here {:extern} is used to allow external code to invoke this method,
// rather than indicate it will be externally implemented.
method {:extern} Fibonacci(n: nat) returns (r: nat)
reads {}
// modifies {} is the default so it can be omitted
{
...
}
```

Suppose you want to add some caching as we implement this, since the results of one call can be reused by others. We need to have clients pass in the cache since Dafny offers no way to store it as global state (which is frankly a good thing). Our first attempt leads to a verification error, however, which saves us from accidentally introducing a race condition!

```dafny

class FibonacciMemoizer {
var cache: map<nat, nat>
constructor() reads {} {
cache := map[];
}
method Get(n: nat) returns (r: nat)
reads this
modifies this
{
if n < 2 {
return 1;
}
if n in cache {
r := cache[n];
} else {
var oneBefore := Get(n - 1);
var twoBefore := Get(n - 2);
r := oneBefore + twoBefore;
// This line introduces a race condition in multi-threaded target languages.
// Updating two different entries concurrently can lead to one of them being dropped!
cache := cache[n := r];
}
}
}

method {:extern} Fibonacci(n: nat, memoizer: FibonacciMemoizer) returns (r: nat)
reads {}
{
r := memoizer.Get(n); // Error: insufficient reads clause to invoke method
// Error: call may violate context's modifies clause
}
```

Note that this technique is most effective when the person providing the specification is different from the person implementing it. It ensures that the implementation cannot interact with shared state by accident, especially deep within shared libraries that are perfectly safe to use in a single-threaded context. The wiki page on [Modelling External State Correctly](https://github.com/dafny-lang/dafny/wiki/Modeling-External-State-Correctly) would be updated to recommend that methods exposed to concurrent native code should include `reads {}` and omit any `modifies` clauses.

The nature of frame specifications in Dafny makes `reads {}` less restrictive than it might appear. First of all, it only specifies that a method reads no *mutable* fields. `const` declarations on classes are still fair game. It also only forbids reading *existing* mutable state. It is legal to allocate new mutable objects and modify their fields, either inline or by calling other methods. This means that creating and sharing a cache between multiple calculations within a single externally-exposed method is permitted:

```dafny
method {:extern} FirstNFibonaccis(n: nat) returns (rs: seq<nat>)
reads {}
{
var memoizer := new FibonacciMemoizer();
rs := [];
for i := 0 to n {
var r := memoizer.Get(i);
rs := rs + [r];
}
}
```

When sharing state externally really is necessary, it must be implemented externally in native code, where native facilities for synchronizing access to mutable state can be used. Therefore the solution is to push the caching logic into the target language instead:

```dafny
//
// Wrapper around a concurrent data structure, such as
// ConcurrentHashMap<K, V> in Java.
//
// Note the lack of a ghost variable to track the state of the map,
// which intentionally puts the mutable state outside of the Dafny-verified model.
// These methods instead appear to have non-deterministic behavior to Dafny, which accurately
// represents the fact that it cannot assume anything about state that might be changed by
// other threads.
//
class {:extern} ExternalMutableMap<K, V> {
constructor {:extern} () reads {}
method {:extern} Put(k: K, v: V) reads {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd expect a nonempty modifies clause for Put. Add modifies this.

This also means that FibonacciMemoizer.Get on L139 needs a modifies clause. (Without the usual Repr/Valid() abstraction, this can be done with modifies cache. With this change, it would then be fair to clients to add the postcondition ensures fresh(cache) to the FibonacciMemoizer constructor.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Believe it or not, no! The entire point here is to NOT model the external state in Dafny frames at all. To Dafny, an ExternalMutableMap is a completely opaque object, and its methods have non-deterministic behavior. This means this has no mutable state and adding modifies this will just add confusion and probably more verification difficulties.

method {:extern} Get(k: V) returns (v: Option<V>) reads {}
}

// Necessary to support the concurrency-safe signature of Get above,
// since Dafny can no longer track whether the map contains a key or not,
// so Get must support the case where the key is not in the map.
datatype Option<T> = Some(value: T) | None

class FibonacciMemoizer {
const cache: ExternalMutableMap<nat, nat>
constructor() reads {} {
cache := new ExternalMutableMap();
}
method Get(n: nat) returns (r: nat)
reads {}
{
if n < 2 {
return 1;
}
var cached := cache.Get(n);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache is implicit for this.cache, so it reads this, right? Then the call in SafeFibonacci would not compile.

Or are you saying the reads this is not necessary because cache is const? But I would think we don't know whether ExternalMutableMap is mutable or not, so even if the field is const, that doesn't mean reading it is thread-safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, reads and modifies only care about mutable fields, so vars are in scope but not consts. I agree that's a bit surprising but already how reads works on functions.

Sure, we don't "know" if ExternalMutableMap is mutable since the verifier has no visibility into the external code. But there's no way around that, and modelling the assumptions we might make about external code using ghost state as in https://github.com/dafny-lang/dafny/wiki/Modeling-External-State-Correctly breaks down if that external state can change without Dafny being aware of it. The idea here is to verify that a method does not interact with any mutable state on the Dafny heap, but may still interact with "the outside world", and it's up to the external code to be thread-safe when necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if ExternalMutableMap was not implemented externally, how would you know that it's immutable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this offline to understand the question better, and I can clarify how this all works with a few more details.

Note that clauses such as reads and modifies are not recursive. Take these classes as an example:

class A {
  constructor() {}
  var foo: int
  var b: B
}
class B {
  var bar: int
}

If you had an object a: A, the clause reads {a} would only allow you to reference a.foo, but NOT a.b.foo. For the latter to be allowed, you would need reads {a, a.b}. This is why the Valid() / Repr idiom described in Chapters 8 and 9 of Using Dafny, an Automatic Program Verifier is useful for tracking what areas of the heap your code needs to read and write, without depending on the internals of your components.

That means Dafny doesn't allow the cache.Get call because it believes ExternalMutableMap is immutable, it's allowed because the cache.Get method has reads {}, which means the method does not read any state in the Dafny heap.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I presume that reads clauses are subject to themselves. That is, a reads clause has to be "closed", in the same way that they are for functions. Their well-formedness should then also be allowed to assume the method precondition, just as for functions. (The Dafny verifier has an encoding for this kind of thing. Look for the parameter called delayedReadsChecks or something like that.)

if cached.Some? {
r := cached.value;
} else {
var oneBefore := Get(n - 1);
var twoBefore := Get(n - 2);
r := oneBefore + twoBefore;
cache.Put(n, r);
}
}
}

method {:extern} Fibonacci(n: nat, memoizer: FibonacciMemoizer) returns (r: nat)
reads {}
{
r := memoizer.Get(n); // This version verifies
}
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Implementing the translation of `reads` clauses on methods to Boogie should be relatively straightforward, and reuse most of the translation of `reads` on functions and `modifies` on methods. The main change will be to split the `$_Frame` Boogie variable used by the `ExpressionTranslator` into two different variables named `$_ReadsFrame` and `$_ModifiesFrame`, so that they can both be defined and referenced for a single method body at the same time. This will have the side-effect of slightly increasing the readability of the translation of frame verification.

We also need to ensure that each Dafny compiler produces "thread-safe" code when given code that is verified to not depend on shared mutable state, whenever that is a meaningful requirement for the target language. This is technically independent from the feature proposed here, but is necessary for users to infer that it can be used into order to produce thread-safe code, and is not an explicitly documented property of the compilers and runtimes nor are there any mechanisms in the regression test suite to ensure it. The main likely source of race conditions is in optimizations of data types in the runtimes that are semantically immutable but contain mutable internal state, such as the lazy concatenated sequence resolution in the [C#](https://github.com/dafny-lang/dafny/blob/90eb1084d9862384accd03fb5a1c8af500212a05/Source/DafnyRuntime/DafnyRuntime.cs#L1111-L1120) and [Java](https://github.com/dafny-lang/dafny/blob/90eb1084d9862384accd03fb5a1c8af500212a05/Source/DafnyRuntime/DafnyRuntimeJava/src/main/java/dafny/DafnySequence.java#L756-L761) runtimes. Avoiding such bugs will involve applying industry-standard tooling for analyzing and testing for race conditions, such as [Coyote](https://microsoft.github.io/coyote/) for C# code.

One final important case is functions-by-method: currently it is harmless for the `by method` body to read state that the
`function` body does not, because as long as the `by method` body is proven equivalent to the `function` body,
using the `by method` body instead of the `function` body at runtime is sound.
This is no longer true if the `by method` body reads mutable state that the `function` body does not,
so it is necessary to check the `by method` body against the `reads` frame of the `function` body.
Consider the following contrived example, in which the function specification implicitly includes `reads {}`
and hence should be safe to execute concurrently,
but the `by method` body is no longer guaranteed to produce the same value as the `function` body.

```dafny
function Always42(b: Box): int {
42
} by method {
var result := 42;
result := result + b.x; // Error: insufficient reads clause to invoke method
result := result - b.x; // Error: insufficient reads clause to invoke method
return result;
}
```

# Drawbacks
[drawbacks]: #drawbacks

The biggest potential drawback is that inexperienced Dafny programmers might have the incorrect impression that defining a tight `reads` clause for methods is necessary, and hence expend unnecessary effort for no benefit. The verifier would never prompt users to add `reads` clauses unless they already appear elsewhere in a codebase, so the issue is more that they could see them used in documentation or examples and not realize they are largely optional.

Reusable Dafny code such as the modules in https://github.com/dafny-lang/libraries will need to specify `reads` clauses in order to support consumers that need to specify them. This does not appear to be a significant burden: at the time of writing this the `libraries` repo only contained one `method`. Also note that adding a `reads` clause to a method that does not already have one will never break existing callers of that method. Again, the issue is more that users could imitate standard library `method` definitions and include superfluous `reads` clauses. The answer is likely to document explicitly why the standard library uses this feature and to emphasize that it should not be copied blindly. There is already a similar issue with the standard library's use of `{:trigger}` attributes, which are not encouraged in most Dafny codebases.

There is a chance that if and when explicit concurrency support is added to Dafny, the much more limited solution of verifying a method body `reads` and `modifies` nothing may become unused. This would make supporting `reads` clauses on methods harmless but also a maintenance burden. On the other hand, the feature is a generic and independent concept with relatively low implementation complexity, and the extra information provided by `reads` clauses is potentially useful for verifier and compiler optimizations in the future. There is a reasonable chance that `reads` clauses will be part of the future solution for concurrency anyway.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

This proposal is intended to be a low-cost solution to thread-safety for Dafny. The main alternatives are as follows.

## Accept and document that compiled Dafny is not thread-safe

We could make the intentional decision that Dafny currently does not support this use case. This would mean users that wish to use Dafny in possibly-concurrent environments have three options:

1. Document that their own cross-compiled Dafny code is not thread-safe, and rely on downstream clients to avoid sharing objects.
2. Wrap all invocations of compiled Dafny code with a common lock, in the style of a Global Interpreter Lock. This would be safer, but would also penalize single-threaded use cases with an unnecessary synchronization overhead.
3. Ensure their compiled code is thread-safe by manual review and testing. This would be error-prone and especially unsatisfying given the static guarantees Dafny otherwise provides. It would also force end users to validate that the Dafny runtime implementations are thread-safe, which is not currently guaranteed, and even if currently true could regress in future versions of Dafny.

## Add direct support for concurrency to Dafny

If there is intentional use of shared mutable state in Dafny code that needs to be externally thread-safe, currently the only solution is to move this logic into external code and make use of synchronization features in the target language(s). This is unfortunate when a single `synchronized` block would hypothetically address a tiny race condition.

Besides the fact that adding full concurrent support to Dafny will be a large and expensive undertaking, we need to be very careful about what paradigm we choose. Implementing `synchronized` blocks would imply that Dafny uses a shared memory model, but it's very possible something like message passing would make specifications easier to write and less costly to verify. Selecting or designing a paradigm for Dafny is particularly challenging given it has to be compilable to multiple target languages: a shared memory model compiles naturally to C# or Java, but much less so to Go.

# Prior art
[prior-art]: #prior-art

The most obvious prior art for this is the existing support in Dafny itself for `reads` on function definitions and `modifies` on methods.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Parsing is simplest if `reads` clauses are allowed on all `method`-like definitions, including the various flavours of lemmas. Allowing `reads` clauses on lemmas seems harmless, but are there downsides?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser already disallows modifies clauses on lemma declarations. The parser could do something similar for reads.

- Are there other potential use cases for this feature worth mentioning, aside from external thread-safety?
- Is there a good term for a `method` that does not read or modify any shared state? "Pure" is not a good choice in this context, since it usually implies a procedure that always produces the same value when run on the same input (i.e. a `function` in Dafny) and that has no side effects (which a `method` could still have through calls to external code).

# Future possibilities
[future-possibilities]: #future-possibilities

It could be useful to have a utility to infer and add `reads` clauses to all the methods declared in a Dafny codebase, to reduce the burden of ensuring existing code is externally thread-safe.

Ideas for how Dafny should tackle asynchronous or concurrent execution in general are out of scope for this particular RFC, but still worth discussing in case they influence any immediate decisions in the name of forwards-compatibility.