diff --git a/files/0011-expressive-diagnostics-cli.png b/files/0011-expressive-diagnostics-cli.png new file mode 100644 index 0000000..b5b4dba Binary files /dev/null and b/files/0011-expressive-diagnostics-cli.png differ diff --git a/files/0011-expressive-diagnostics-vscode-full-diagnostic.png b/files/0011-expressive-diagnostics-vscode-full-diagnostic.png new file mode 100644 index 0000000..67d4c65 Binary files /dev/null and b/files/0011-expressive-diagnostics-vscode-full-diagnostic.png differ diff --git a/files/0011-expressive-diagnostics-vscode-hints.png b/files/0011-expressive-diagnostics-vscode-hints.png new file mode 100644 index 0000000..e8e3042 Binary files /dev/null and b/files/0011-expressive-diagnostics-vscode-hints.png differ diff --git a/files/0011-expressive-diagnostics-vscode-problems.png b/files/0011-expressive-diagnostics-vscode-problems.png new file mode 100644 index 0000000..15a53c2 Binary files /dev/null and b/files/0011-expressive-diagnostics-vscode-problems.png differ diff --git a/rfcs/0006-configuration-constants.md b/rfcs/0006-configuration-constants.md new file mode 100644 index 0000000..432c410 --- /dev/null +++ b/rfcs/0006-configuration-constants.md @@ -0,0 +1,149 @@ +- Feature Name: Configuration Time Constants, `config_time_constants` +- Start Date: 2022-10-01 +- RFC PR: [FuelLabs/sway-rfcs#0006](https://github.com/FuelLabs/sway-rfcs/pull/19) +- Sway Issue: [FueLabs/sway#1498](https://github.com/FuelLabs/sway/issues/1498) + +# Summary + +[summary]: #summary + +Configuration time constants can be conceptualized as traditional environment variables. Some bytecode has been compiled by the Sway compiler, and the SDK would like to configure some behavior of that bytecode with additional inputs that won't trigger a recompile. + +There is a similar feature that was implemented [here][pr_2549], but that was a mistaken interpretation of the requirements. In [#2549][pr_2549], a recompile still occurs and the new values are injected via `Forc.toml`, when we actually want these values to be injectable by the SDK. + +[pr_2549]: https://github.com/FuelLabs/sway/pull/2549 + +# Motivation + +[motivation]: #motivation + +The primary and only motivation that requires this feature is contract factories. To construct a set of trusted contracts from some template, we need to have a mapping of which parts of the bytecode are just configuration variables and which parts are application logic. By comparing the version with zeroed-out constants we can assert that two contracts are the same, modulo their configuration time constants. + +# Guide-level explanation + +[guide-level-explanation]: #guide-level-explanation + +When a Sway program is compiled, a set of artifacts are produced. Included in these artifacts are a [JSON ABI descriptor](https://fuellabs.github.io/fuel-specs/master/protocol/abi/json_abi_format.html#json-abi-spec) containing a configuration-time-constant descriptor, and the actual bytecode. The object we are discussing here is the configuration-time-constant descriptor, which describes the specific offset within the bytecode, in terms of bytes, to the data section entry for a particular configuration variable. + +The configuration variables for a program are declared in a `configurable` block similarly to `storage` variables. Each configurable variable requires a type annotation as well as an initializer. The value of each configurable variable can be overriden at configuration time by the SDK, `forc`, or some other consumer of the configuration-time-constant descriptor. As an example: + +```rust +// main.sw +script; + +configurable { + CONTRACT_ADDRESS: b256 = 0x0000000000000000000000000000000000000000000000000000000000000000, + PRICE_RATIO: u64 = 0, +} + +fn main() -> (u64, b256) { + (PRICE_RATIO, CONTRACT_ADDRESS) +} +``` +```json +// project-abi.json +{ + ... + "configurables": [ + { + "configurableType": { + "name": "", + "type": 3, + "typeArguments": null + }, + "name": "CONTRACT_ADDRESS", + "offset": 3260 + }, + ... + ], + ... +``` + +As `configurable` values will affect a contract's ID, `forc` users must be able to specify these values for each of their contract dependencies. To enable this, `forc` will allow users to specify a `config/.sw` file for each contract dependency that contains configurable values. This Sway file must export the required constants as specified by the contract dependency's `configurable` block. These files will be compiled independently for each contract dependency, and the resulting `configurable` bytecode section for each dependency will be used to replace its default `configurable` bytecode section. + +# Reference-level explanation + +[reference-level-explanation]: #reference-level-explanation + +The `type` field of the JSON file should describe the type of the configurable in the same manner as types are described in the JSON ABI file. + +A single `configurable` block is allowed in a program and no two configurable variables in a program should have the same name to prevent collisions. + +The range from the offset to the offset plus the size of the type will represent bytes within the bytecode that will be overwritten by the SDK (or `forc`), which will write values according to the ABI encoder's memory layout, which is consistent with Sway's memory layout. + +The change is not breaking as it is entirely new functionality. + +Configuration time constants may not be defined in any library code and must be defined at the top level. Should a library need to reference a configurable value, it should utilize design patterns such as the [builder pattern](https://en.wikipedia.org/wiki/Builder_pattern) to ingest the constants from the top level. + +## Failure to provide constants + +If the constants are not provided, the SDK and/or `forc` should throw an error. Both tools have knowledge of the required constants due to the descriptor file, so they may check for the constants' presence and report a descriptive error if any are missing. + +## Interactions with optimization passes + +Configuration time constants are not allowed to be optimized away or constant folded or really touched in any way (for example, a config struct cannot be broken into into its individual elements). They always need an actual spot in the bytecode. So, the optimizor has take that into account. This can be accomplished with a "configurable" section that is entirely different from the data section and that can be left untouched by the rest of the compilation process. That being said, configurable variables that have no uses in a Sway program should be omitted from both the bytecode and the JSON ABI descriptor. + +## Allowed Types + +Any type that can be handled by the ABI encoder and decoder can be passed along as a config-time constant. + +## Memory layout + +The memory layout of configurable variables is identical to the memory layout of `const` variables which also matches the layout described in the [Argument Encoding](https://fuellabs.github.io/fuel-specs/master/protocol/abi/argument_encoding.html) section of the ABI spec. + +## Specifying configuration-time constants for contract-dependencies in `Forc.toml` + +With the recent introduction of [`[contract-dependencies]`](https://github.com/FuelLabs/sway/pull/3016), it would be necessary to allow `forc` projects to specify the configuration constants declared within each of their contract-dependencies. This is important in allowing the user to refer to a specific contract, and to make sure we are constructing the correct contract ID constants during build. + +With configuration time constants, declaring `[contract-dependencies]` in `Forc.toml` might look something like: + +```toml +[contract-dependencies] +foo = { git = "https://domain.com/owner/repo", config = "config/foo.sw" } +``` + +where `config/foo.sw` has a single `configurable` block with each constant value specified. + +Alternatively, we could remove the need to specify the `config` file location manually by making the `config/` directory a standard location. We could require that, for each contract-dependency that has a `configurable` block, a file exists in the `config/` directory that matches the name of the dependency with `.sw` appended (e.g. `config/foo.sw`). In the case that no `config/foo.sw` file is present, we could either: + +1. produce an error, or +2. fallback to using the contract dependency's default values, indicating that we are falling back with a `info!` log or similar. + +# Drawbacks + +[drawbacks]: #drawbacks + +One major drawback is the additional cognitive complexity this adds to the compilation process. If any bug is introduced or the overwriting of the bytecode goes awry in any way, it will result in confusing and inconsistent undefined behavior. We will need to introduce sufficient checks to ensure that an end user would never encounter this situation. An example of testing this behavior would be integration tests that pass in the majority of the ABI-encodable types, fuzzed, as config time constants, and assessing behavior. + +This increases our reliance on the correctness of the ABI encoder in the SDK, and any version mismatches or bugs in the encoder will result in similarly undefined and unpredictable behavior. + +This change also further couples the language to the SDK, further minimizing the use case of writing Sway without the SDK. This can be alleviated by introducing mechanisms to `forc` for defining these values manually, as mentioned in the guide-level explanation. + +# Rationale and alternatives + +[rationale-and-alternatives]: #rationale-and-alternatives + +One alternative is to introduce an environment variables/configuration-time constants section to the transaction format. This could be cleaner, but the impact on Sway would be minimal. Sway would still output a descriptor file, introduce the `configurable` keyword syntax, etc., and the majority of this RFC would still apply. The benefit of adding this to the transaction format would be potential simplification of the process. A couple negatives would be additional churn on the client, VM, and SDK; and further coupling of specifically Sway to the FuelVM. + +If we do not implement configuration-time constants, the workflow of deploying contracts and then calling them from scripts (i.e. the main use case of Sway) would still require manually updating contract addresses in the source code. Without an enshrined tool for configuring values like this, it is likely that some third party `sed`-like workaround would develop in the community to the detriment of the Fuel stack developer experience. + + +# Prior art + +[prior-art]: #prior-art + +[Immutables in Solidity](https://docs.soliditylang.org/en/v0.6.5/contracts.html) are the closest prior art, effectively performing the exact same functionality. Environment variables in traditional software development are also similar in paradigm and utility. + +# Unresolved questions + +[unresolved-questions]: #unresolved-questions + +This change has been discussed in depth among SDK, Sway, and client developers and there is little ambiguity. There could be further iterations on minor details like the specific syntax in the language and the format of the descriptor file. + +# Future possibilities + +[future-possibilities]: #future-possibilities + +Someday, if possible without the SDK importing the compiler, perhaps some constant evaluation of more sophisticated expressions could be done in the SDK, although that's a minor use case. + +It is also possible that configuration time constants are added to the transaction format someday. If that happens, the compiler will not need to change dramatically but the client and SDK will have work to do. diff --git a/rfcs/0009-storage-handler.md b/rfcs/0009-storage-handler.md new file mode 100644 index 0000000..2bfd8d4 --- /dev/null +++ b/rfcs/0009-storage-handler.md @@ -0,0 +1,401 @@ +- Feature Name: Storage Keys +- Start Date: 2023-02-10) +- RFC PR: [FuelLabs/sway-rfcs#0023](https://github.com/FuelLabs/sway-rfcs/pull/023) +- Sway Issue: [FueLabs/sway#3043](https://github.com/FuelLabs/sway/issues/3043) + +# Summary + +[summary]: #summary + +This RFC introduces the concept of a `StorageKey` which describes a particular storage slot via its key. The RFC also reworks how storage accesses are represented in the language and how `StorageKey` can be used to build dynamic storage types, such as `StorageMap` and `StorageVec`, more robustly. + +# Motivation + +[motivation]: #motivation + +The current approach for reading and writing storage slots/variables has multiple flaws, despite being quite ergonomic: + +1. There is currently no way to reason about the location of a particular storage variable, which makes passing "references" to storage variables impossible. One workaround is manipulating and reading storage slots manually via `std::storage::store` and `std::storage::get`, but this approach is quite unsafe. Alternatively, one could declare a `trait` in a library and implement it in a contract and have its methods access storage variables. Those methods are then used in the library to access storage indirectly. This does not solve the full problem and feels more like a workaround than a real solution. +1. Dynamic storage types, such as `StorageMap` and `StorageVec`, currently require a "hacky" compiler intrinsic called `__get_storage_key` which is not very well defined and is hard to use. Its implementation also requires that all methods using it are inlined, which is not something that can be guaranteed, even if `#[inline(always)]` is used. Introducing the concept of a `StorageKey` will help us completely remove this intrinsic. +1. The current implementation of dynamic storage types prevents them from being used as struct fields or as type parameters of other storage types (e.g. `StorageVec>`). + +# Guide-level explanation + +[guide-level-explanation]: #guide-level-explanation + +A `StrorageKey` is a thin wrapper type, in the standard library, around a `b256` key that describes a storage slot and a `u64` offset, in 64 bit words, from the beginning of that slot. + +```rust +pub struct StorageKey { + key: b256, + offset: u64, +} +``` + +The type `T` describes the type of the data that the `StorageKey` points to. Since each storage slot is 64 bytes long, some types cannot fit in a single storage slot, particuarly if `offset` is non-zero. In that case, the data is stored across multiple consecutive storage slots starting at `key + offset`. More on this in [Storage Slot Assignment](#storage-slot-assignment). + +> **Note** +> We use the `+` sign in `key + offset` throughout this document to refer to the location in storage that `StorageKey { key, offset}` points to. For example, if `key = 0x0...00` and `offset = 10`, then `key + offset` points to the second word in storage slot `0x0...02` because `offset` is in words and each storage slot contains 4 words. + +Reading and writing the data stored at `key + offset` can be accomplished using the `read`, `try_read`, and `write` methods below: + +```rust +impl StorageKey { + #[storage(read)] + pub fn read(self) -> T { + std::storage::read::(self.key, self.offset).unwrap() + } + + #[storage(read)] + pub fn try_read(self) -> Option { + std::storage::read::(self.key, self.offset) + } + + #[storage(write)] + pub fn write(self, other: T) { + std::storage::write(self.key, self.offset, value); + } +} +``` + +This assumes the existence of functions `std::storage::read` and `std::storage::write` that read and write the appropriate storage slots given a key and an offset and handle all the appropriate storage slot assignment and data conversion. + +Notice that `StorageKey::try_read` returns an `Option` because it is possible that a particular storage slot is not valid, i.e. has not been written before, in which case `None` is returned. If the type `T` spans multiple storage slots, then the method `StorageKey::try_read` returns `None` if at least one the storage slots that `T` spans is invalid. The method `StorageKey::read` is similar to `StorageKey::read` except that it unwraps the returned `Option` internally. + +With the `StorageKey` type available, this RFC proposes a redefinition of the meaning of the expression `storage.....` to return a `StorageKey` "pointing" to the actual data instead of returning the data itself. The RFC also proposes removing the "reassignment" statement `storage.... = ..` from the language, at least temporarily. + +Below is an example showing the behavior of `StorageKey` and the new behavior of storage access expressions: + +```rust +struct MyStruct { + x: u64, + y: b256, +} + +storage { + s: MyStruct = MyStruct { x: 0, y: ZERO_B256 }, +} + +fn foo() { + // Get the storage key to the struct field `x` directly and then use it to store `42` in `x`. + let x_key: StorageKey = storage.s.x; + x_key.write(42); + + // Call `write` and `try_read` directly on `storage.s.y` + storage.s.y.write(ZERO_B256); + let y: Option = storage.s.y.try_read(); + + // Get the storage key to the struct `s` and then use it to store `MyStruct { x: 42, y: ZERO_B256 }` in `s`. + let s_key: StorageKey = storage.s; + s_key.write(MyStruct { x: 42, y: ZERO_B256 } ); + + // Read the struct `s` + let s: MyStruct = s_key.read(); +} +``` + +Below is how the "counter" example now looks like: + +```rust +contract; + +abi TestContract { + #[storage(write)] + fn initialize_counter(value: u64) -> u64; + + #[storage(read, write)] + fn increment_counter(amount: u64) -> u64; +} + +storage { + counter: u64 = 0, +} + +impl TestContract for Contract { + #[storage(write)] + fn initialize_counter(value: u64) -> u64 { + // Previoulsy: `storage.counter = value` + storage.counter.write(value); + + value + } + + #[storage(read, write)] + fn increment_counter(amount: u64) -> u64 { + // Previously: `let incremented = storage.counter + amount` + let incremented = storage.counter.read() + amount; + + // Previously: `storage.counter = incremented` + storage.counter.write(incremented); + + incremented + } +} +``` + +> **Note** +> Because storage variables defined in a `storage` block have to be initialized, it is generally often to call `StorageKey::read` to get back the data directly instead of having to handle the `Option` returned from `try_read`. This, of course, becomes unsafe if `asm` blocks that clear storage slots are used. Note that the standard library currently has a public method `clear` that we also propose that it should be made private to avoid potential foot guns. + +# Reference-level explanation + +[reference-level-explanation]: #reference-level-explanation + +The compiler will handle storage slots assignment for each storage variable. Each storage access expression will simply return a `StorageKey` containing the key and the offset chosen by the compiler. + +## Storage Slot Assignment + +Each storage variable in a `storage` block is assigned an index corresponding to its location in the list of variables: + +```rust +struct MyInnerStruct { + c: u64, + d: b256, +} + +struct MyStruct { + a: u64, + b: MyInnerStruct, +} + +storage { + x: u64 = 0, // Assigned index `0` + y: b256 = ZERO_B256, // Assigned index `1` + str: str[99], // Assigned index '2' + s: MyStruct = MyStruct { a: 0, b: MyInnerStruct { c: 0, d: ZERO_B256 } }, // Assigned index `3` +} +``` + +The "index" assigned to each variable is used to generated a key as follows. + +### "Small" Types + +Small types are types that fit in a single storage slot. Storage variables that have small types are packed and assigned a single key that is equal to `sha256("storage_")` where `` is the index assigned to the storage variable. In the example above, the key chosen for `x` is: + +```rust +sha256("storage_0") = 0xf383b0ce51358be57daa3b725fe44acdb2d880604e367199080b4379c41bb6ed +``` + +and the key chosen for `y` is: + +```rust +sha256("storage_1") = 0xde9090cb50e71c2588c773487d1da7066d0c719849a7e58dc8b6397a25c567c0 +``` + +### "Large" Types + +Large types span multiple storage slots and are packed and laid out sequentially in storage starting at key `sha256("storage_")` where `` is the index assigned to each storage variable. The number of storage slots required can be computed as `(__size_of::() + 31) >> 5` (ceiling of the size of the data type divided by `32`). + +For example, the storage variable `str` above is a string containing `99` characters and its size is `99` bytes which span 2 storage slots. It is also assigned index `2`. Therefore, `str` spans slots with keys `sha256("storage_2")` and `sha256("storage_2") + 1`. + +Variable `s` is a nested struct and is assigned index `3`. Therefore, its fields and subfields are stored as follows: + +- `s.a` is stored as the first word in storage slot with key `sha256("storage_3")`. +- `s.b.c` is stored as the second word in storage slot with key `sha256("storage_3")` (same slot as `s.a`). +- `s.b.d` is the third and fourth words in storage slot with key `sha256("storage_3")` as well as the first and second words in storage slot with key `sha256("storage_3") + 1`. + +### Empty Types + +There are two possible empty types: structs with no fields and enums with no variants. Empty structs are useful for implementing dynamic storage types such as `StorageMap` and `StorageVec`. Storage variables that have empty types do not consume any storage slots. + +### Behavior of `StorageKey` + +When a storage variable is accessed, a `StorageKey` is returned of the appropriate type. The `StorageKey` object contains the `b256` key and the `u64` offset required to access the underlying data: + +- `storage.x` returns `StorageKey { key: sha256("storage_0"), offset: 0 }`. +- `storage.y` returns `StorageKey { key: sha256("storage_1"), offset: 0 }`. +- `storage.str` returns `StorageKey { key: sha256("storage_2"), offset: 0 }`. +- `storage.s` returns `StorageKey { key: sha256("storage_3"), offset: 0 }`. +- `storage.s.a` returns `StorageKey { key: sha256("storage_3"), offset: 0 }`. +- `storage.s.b` returns `StorageKey { key: sha256("storage_3"), offset: 1 }`. +- `storage.s.b.c` returns `StorageKey { key: sha256("storage_3"), offset: 1 }`. +- `storage.s.b.d` returns `StorageKey { key: sha256("storage_3"), offset: 2 }`. + +## Implementation Detail + +The typed expression `TyStorageAccess` in the compiler should now return `std::storage::StorageKey` which should store the key and offset chosen by the compiler according to the rules described in [Storage Slots Assignment](#storage-slot-assignment). The rest of the flow will be handled automatically after removing all the unnecessary logic in the compiler for reading and writing storage slots. + +## Implementing Dynamic Storage Types using `StorageKey`. + +Dynamic storage types such `StorageMap` and `StorageVec` currently use the intrinsic `__get_storage_key` which is not very well defined and has a fragile implementation. With `StorageKey` available and returned directly from a storage access expression (as in `storage.my_map`), we can re-implement methods like `StorageMap::insert`, `StorageMap::get`, and `StorageMap::remove` in a safer and cleaner way as follows: + +```rust +/// A persistent key-value pair mapping struct. +pub struct StorageMap {} + +impl StorageKey> { + #[storage(read, write)] + pub fn insert(self, key: K, value: V) { + let key = sha256((key, self.key)); + write::(key, 0, value); + } + + #[storage(read)] + pub fn get(self, key: K) -> StorageKey { + StorageKey { + key: sha256((key, self.key)), + offset: 0, + } + } + + #[storage(write)] + pub fn remove(self, key: K) -> bool { + let key = sha256((key, self.key)); + clear::(key) + } +} +``` + +With the above new implementation, we should be able to continue to use `StorageMap` as before: + +```rust +storage { + my_map: StorageMap = StorageMap {} +} + +storage.my_map.insert(0, 0); +let x = storage.my_map.get(0).read(); +storage.my_map.remove(0); +``` + +> **Note** +> The method `get` now returns a `StorageKey` instead of the actual data because this simplifies nesting storage maps and other dynamic storage types. + +A similar approach can be followed to re-implement `StorageVec` and other dynamic storage types. + +## Dynamic Storage Types in Structs + +The code below will now be valid after implementing all of the above: + +```rust +MyStruct { + map1: StorageMap, + map2: StorageMap, +} + +storage { + s: Mystruct = MyStruct { + map1: StorageMap { }, + map2: StorageMap { }, + } +} + +storage.s.map1.insert(0, 0); +let x = storage.s.map2.get(0).read(); +``` + +It will also be possible to pass a `StorageMap` to a function by passing a `StorageKey` to it as follows: + +```rust +storage { + my_map: StorageMap = StorageMap {} +} + +fn foo(map: StorageKey>) { + map.insert(0, 0); +} + +fn bar() { + foo(storage.my_map); +} +``` + +## Nested Dynamic Storage Types + +Nesting dynamic storage type is now trivial given what we have so far: + +```rust +storage { + nested_map_1: StorageMap>> = StorageMap {}, +} + +storage.nested_map_1.get(0).get(0).insert(0, 1); +storage.nested_map_1.get(1).get(1).insert(1, 8); +assert(storage.nested_map_1.get(0).get(0).get(0).read() == 1); +assert(storage.nested_map_1.get(1).get(1).get(1).read() == 8); +``` + +# Drawbacks + +[drawbacks]: #drawbacks + +There are two main drawbacks for the approach proposed above: + +1. Reading and writing a storage variable is now a bit more complicated and less ergonomic than before because we now have to call `read` and `write` manually. We should consider making this more ergonomic by introducing a simpler syntax that de-sugars to the API calls above. For example, we could de-sugar something like `storage.x = 42` to `storage.x.write(42)`. + +2. The second drawback is that, because sturcts are packed in storage, then we will often need to read a storage slot before writing to it because we may need to preserve parts of it if the value we're writing is smaller than the slot itself. This means that almost each storage write now also requires a storage read. This is the case in Solidity as well. The upside here is that a smaller number of storage slots will be needed overall. + +# Rationale and alternatives + +[rationale-and-alternatives]: #rationale-and-alternatives + +One alternative is to continue with the current use model which has many flaws so that's not ideal. Another alternative is to introduce an `AsStorageKey` trait that allows calling `storage.x.get_storage_key()` to get the storage key manually, but that approach has its limitations when implementing `StorageMap` and `StorageVec` and requires defining the behavior of `get_storage_key` when called on a regular stack variable. + +One more alternative is to completely remove the concept of a "storage variable" and switch over to thinking about storage like file I/O or databases. However, I think the approach described in this RFC provides a good middle ground where methods like `write` and `read` somewhat behave like I/O methods but also keeps the abstraction of "storage variables" to help user write readable code. + +# Prior art + +[prior-art]: #prior-art + +Solidity's approach to storage is completely different from the above and has its own problems, as storage variable are hard to identify and storage accesses are not explicit enough. Sway's current approach is also flawed as explained in the [Motivation](#motivation) section. + +In Rust, the closest thing to storage accesses is file I/O where a handler is used to read and write to a file similarly to `write` and `read` above. + +# Unresolved questions + +[unresolved-questions]: #unresolved-questions + +- How do we make storage accesses more ergonomic? Or at least as ergonomic as they currently are? + +# Future possibilities + +[future-possibilities]: #future-possibilities + +## Unpacked storage structs. + +Unpacking structs in storage could be useful in certain situations. The advantage of unpacking sturcts is that each field and subfield gets its own storage slot which often makes reading and writing those fields and subfields cheaper. We may want to consider an annotation in the future that requests that a given struct is unpacked in storage. + +## `Index` and `IndexAssign` traits + +We can introduce the traits `Index` and `IndexAssign` which would allow implementing `index()` and `index_assign()` for dynamic storage types such as `StorageMap` and `StorageVec` as follows: + +```rust +impl Index for StorageHandle> + fn index(self, key: K) -> StorageKey { + StorageKey { + key: sha256((key, self.key)), + offset: 0, + } + } +} + +impl IndexAssign for StorageHandle> + #[storage(read, write)] + pub fn index_assign(self, key: K, value: V) { + let key = sha256((key, self.key)); + write::(key, 0, value); + } +} +``` + +Note that this deviates from Rust which has `Index` and `IndexMut`, but `IndexMut` likely requires mutable references in the language which we don't have. However, there has been [attempts](https://github.com/rust-lang/rfcs/pull/1129) to introduce `IndexAssign` in Rust but without any success due to reasons that I don't think apply to Sway. + +With the above, the compiler can then de-sugar expressions like `a[i]` to `a.index(i)` and `a[i] = b` to `a.index_assign(i, b)`. For a nested `StorageMap`, the resulting user code would look like: + +```rust +struct M { + u: b256, + v: u64, +} + +storage { + nested_map_2: StorageMap<(u64, u64), StorageMap>> = StorageMap {}, +} + +let m1 = M { + u: 0x1111111111111111111111111111111111111111111111111111111111111111, + v: 1, +}; + +storage.nested_map_2[(1, 0)]["0000"][0] = m1; +assert(storage.nested_map_2[(1, 0)]["0000"][0].read() == m1); +``` diff --git a/rfcs/0010-references.md b/rfcs/0010-references.md new file mode 100644 index 0000000..657758c --- /dev/null +++ b/rfcs/0010-references.md @@ -0,0 +1,417 @@ +- Feature Name: `references` +- Start Date: 2023-08-17 +- RFC PR: [FuelLabs/sway-rfcs#28](https://github.com/FuelLabs/sway-rfcs/issues/28) +- Sway Issue: [FueLabs/sway#5063](https://github.com/FuelLabs/sway/issues/5063) + +# Summary + +[summary]: #summary + +This RFC aims to make the paradigm for handling reference types clear, fully typed and to +eliminate redundant concepts. + +# Motivation + +[motivation]: #motivation + +There currently exists a kludge of features on top of pointers that have +various levels of type correctness, offer various paradigms and have less than +coherent syntax. + +We want to clarify the rules for passing dynamic data to and from functions, how +they are expressed in the type system and through syntax, and eliminate untyped +values such as `raw_ptr` and `raw_slice`. + +# Guide-level explanation + +[guide-level-explanation]: #guide-level-explanation + +Unlike Rust, Sway does not manage lifetimes. Due to the nature of smart contract +execution, dynamic memory usage only grows and allocated memory is allocated for +the entire lifetime of the execution. In Rust terms, every reference value has a +lifetime of `'static`. + +This allows us to do away with some complexity and handle values more like they +would be in Java or ML. +In most cases, you shouldn't need to use explicit pointer types. + +## Data types + +In Sway there are two types of data types. Value types and Reference types. + +Value types live for the lifetime of the function they are declared in. They are +passed and returned by value, which means their value is directly copied. Value +types include `u8`, `u16`, `u32`, `u64`, tuples such as `()`, `(u8, u64)` and +pointers such as `*mut u8`. + +Reference types are represented by a pointer to a memory region and can have a +dynamic size. They are passed and returned by reference, which means the pointer +is copied and still points to the same data. + +There are two kinds of references. Slim and fat. + +Slim references are implemented using a single address. Slim references +include structs, enums and arrays, such as `String`, `Option`, `[u8; 42]`. + +Fat references are implemented using an address and a small amount of additional +data, either a length or a pointer to a more complex table. Fat references +include slices, such as `[u8]`, `str`. + +## Box + +It is sometimes desirable to have a reference to a value type, such as for +passing a mutable reference to it to a function. This can be achieved with the +`Box` struct, which is so defined: + +```sway +pub struct Box { + val: T +} +``` + +and produces the desired memory representation. + +## Pointers + +There are cases where one may want to directly manipulate memory addresses with pointer arithmetic, +to allow for this we have a pointer type that represents a single address: `*mut T` where T is the type being pointed to. + +Pointers can be obtained by using the `__addr_of` intrinsic on a reference and dereferenced using the `__deref` intrinsic, like so: + +```sway +let val: u64 = 1; +let ptr: *mut u64 = __addr_of(Box::new(val)); +let ptr_val: u64 = __deref(ptr); +assert_eq(val, ptr_val); + +let val: Box = Box::new(1); +let ptr: *mut Box = __addr_of(Box::new(val)); +let ptr_val: Box = __deref(ptr); +assert_eq(val, ptr_val); + +let val: Box = Box::new(1); +let ptr: *mut u64 = __addr_of(val); +let ptr_val: u64 = __deref(ptr); +assert_eq(val, Box::new(ptr_val)); +``` + +Dereferencing an invalid pointer is Undefined Behavior. + +## Slices + +Slices represent contiguous areas of dynamic memory. +They can be used to represent dynamically sized data. +Slices are represented on the by a pair containing a pointer to the data and a length. + +String slices, of type `str` represent a series of bytes encoding a valid UTF-8 string. +This is the type returned by string literals. + +```sway +let _: str = "Lorem Ipsum"; +```` + +Typed slices, of type `[T]`, represent contiguous series of elements of a type `T`. + +```sway +let _: [u64] = [1, 2, 3].as_slice(); +```` + +Slices can be obtained from arrays and other slices by using the `__slice` +intrinsic. This will produce a smaller slice of the argument type at the +specified indices. This slicing is bounds checked and will produce a revert if +slicing out of bounds. + + +```sway +let array: [u64; 4] = [1, 2, 3, 4]; + +// will produce 1, 2, 3, 4 +let slice: [u64] = __slice(array, 0, 4); + +// will produce 2, 3 +let slice: [u64] = __slice(array, 1, 3); + +// will revert +// let slice: [u64] = __slice(array, 0, 5); +```` + +Elements of slices can be obtained using the `__slice_elem` intrinsic. +This will return the element at the given index. Invalid indices will produce a revert. + +```sway +let slice: [u64] = [1, 2, 3, 4].as_slice(); + +let elem = __slice_elem(slice, 2); +assert_eq(elem, 3); + +// will revert +// let elem = __slice_elem(slice, 4); +``` + +## Passing values + +When values are passed as arguments to a function they are either mutable or +immutable which is denoted by the `mut` prefix in the argument type. + +Mutable value type arguments can be reassigned. + +Mutable reference arguments can both be reassigned and have their fields +assigned to. + +```sway +fn foo( + mut arg1: Box, + arg2: Box, + mut arg3: u32, + arg4: u32 +) { + // arg1 is mutable so this is legal + arg1.val = 0; + + // arg1 is mutable so this is legal, + // but it won't affect the original value the reference pointed to + arg1 = Box { val: 1 }; + + // arg2 is immutable so both or these are illegal + // arg2.val = 0; + // arg2 = Box { val: 1 }; + + // arg3 is mutable so this is legal + arg3 = 0; + + // arg4 is immutable so this is illegal + // arg4 = 0; +} +``` + +## Returning values + +Values returned by functions are mutable by default. + +You can prepend the return type with `const` to make a reference return type +immutable and guarantee that the data pointed to by a returned reference type +will not be mutated. + +```sway +fn foo() -> Box { + Box { val: 0 } +} + +fn bar() -> const Box { + Box { val: 0 } +} + + +pub fn main() { + // both of these are legal because foo returns a mutable value + let _ = foo(); + let mut _ = foo(); + + + // only the first is legal because bar returns an immutable value + let _ = bar(); + // let mut _ = bar(); +} + +``` + +Using `const` on value types is not allowed: + +```sway +// this is illegal +// fn fun() -> const u32 { 0 } +``` + +## Methods + +Methods definitions use `self` to refer to the value the method is being called +on, this follows the same rules defined for functions, except of course `self` +can't be reassigned. + +```sway +struct S { + a: u32, +} + +impl S { + pub fn foo(self) { + // illegal since self is immutable + // self.a = 0; + } + pub fn foo(mut self) { + // legal since self is mutable + self.a = 0; + } +} + +``` + +## Migration + +Existing codebases can adapt to this change by changing every occurrence of `ref +mut` to `mut` and every use of `raw_slice` and `raw_ptr` to uses of `Box`, typed +pointer and slice types. + +In general, switching untyped pointers to a properly typed `Box`, even a generic +`Box`, is preferred. Pointers for whom the pointee isn't a value that can be +named may need to use `Box<()>` or `*mut ()`, the former is preferred. + +# Reference-level explanation + +[reference-level-explanation]: #reference-level-explanation + +## Slices + +Implementing the new slice types should require: + +* parser changes to introduce the new syntax +* adding the new types to the type system (this is already partially done) +* implementing the `__slice` and `__slice_elem` intrinsics + +The slicing intrinsics should be thoroughly tested (specifically on the bounds checking). + +There are considerations to be given to the element sizes. We'll want to finish +current changes to the memory representation of small values so that `[u8]` +represents a continuous series of bytes without padding or provide this capability in other ways. + +One thing to keep in mind is that slices' dynamic size makes them difficult to +deal with for the current ABI and they can't be returned as element of other +dynamic types. This is to be addressed by [a later RFC](https://github.com/FuelLabs/sway-rfcs/issues/29). + +String literals are meant to return `str` and string array types are meant to +be totally replaced by `str`, however due to this ABI issue we'll need to keep +them around until further changes so string data can still be passed through +the ABI. We may want to consider temporarily introducing an intrinsic to convert +a string slice to a string array of a given size (with or without some bounds +checking). + +`raw_slice` backs a lot of our dynamic types but we should be able to replace +most of its uses with a properly represented `[u8]`. + +## References + +Clarifying the reference syntax for function calls should only require a bunch +of small changes to the parser, except for the introduction of `const` return +values. + +The way mutability is currently checked should already be compatible with +returning such immutable values and checking them against the mutability of +their local environment (even for a `let`), however we will need to introduce a +check to make sure that `const` is only used with reference types. + +## Pointers + +`*mut` should provide a fully typed alternative to `raw_ptr`. + +`__addr_of` will have to be altered to return a `*mut` and we'll need to +introduce `__deref`. + +We will not check pointer mutability at this time, hence the `mut` in `*mut`, +however we'll probably want to introduce `*const` once typed pointers are +stabilized. + +Using this instead of `raw_ptr` will require heavy edits of `std` and `core`. + +We may need to support pointer casts without asm block hacks. Introducing a +`__ptr_cast` intrinsic is on the table if this becomes a requirement. + + +# Drawbacks + +[drawbacks]: #drawbacks + +Changing the base assumptions about what a type represents is a major change and +programmers coming from Rust may be surprised at the differences. However the +differences are easy enough to explain and being able to shed the complexity of +borrow semantics is worth the cost. We may want to make the particulars of how +references work in Sway explicit in documentation aimed at programmers coming +from Rust. + +The proposed formulation contains some ambiguity between mutable references +and references to mutable data. This should not be a major issue in practice as +the edge cases that require expressing the difference are rare, but those rare +cases should be expressible with `*mut` and `*const` pointers once we eventually +introduce the pair of them. + +Abstracting away memory management too much might be a drawback for Sway and its +positioning as a smart-contract language, however maintaining access to pointer +arithmetic should still allow for atypical usage patterns without compromising +the use of safe and zero cost abstractions for most cases. + +# Rationale and alternatives + +[rationale-and-alternatives]: #rationale-and-alternatives + +To make the use of references more explicit and maintain syntax legibility for +people coming from Rust we could have chosen to use `&` everywhere a reference +is used and/or implicitly added it for unqualified reference types. However the +reason for Rust's use of this syntax rather than ML's terseness is to support a +borrow checker and guarantees that we do not make. The extra syntax burden does +not seem worth it since we do not provide such guarantees, nor need to provide +them. It can even be misleading. + +Since we're leaning more towards ML, we may also have chosen to use `ref` +instead of `Box` to denote reference types. However we already majorly borrow +from Rust for our library abstractions, and a single field struct seems like the +simplest way of generating a reference type in Sway as it is. + +The ambiguity as to the mutability of references or their content leads to a +choice when picking function argument syntax. i.e.: we could have used `val: +mut T` or `mut val: T` or any combination of the two. Since we decided to +collapse the mutability into one to simplify things, the current syntax is used +since it is closer to the existing Sway behavior. + +The use of `const` to denote returning const references is paricularly +conspicuous. The issue is that in Rust and in current Sway, returing an +unqualified reference type means returning a mutable type. Breaking this +assumption would not just be a big breaking change but may not even be a good +idea since mutable values are what people want to return most of the time. + +Adding a qualifier seems like the least painful way to add the ability to return +immutable references (which is an absolute requirement). We may want to later +consider forcing people to add `const` or `mut` to their return values at a +later date, and perhaps even later make the `const` implicit. + +# Prior art + +[prior-art]: #prior-art + +A lot of the design choices of this RFC have to do with strinking the balance +between Rust and ML's levels of abstraction. Sway is not meant to be a general +purpose systems programming language, but it is not a functional memory managed +language either. We want to maintain access to memory primitives whilst making +most Sway code terse, legible and easily understandable. + +As prior art we considered Rust, ML and other managed GC languages such as Java +or C# that have similar simplified memory models. + + +# Unresolved questions + +[unresolved-questions]: #unresolved-questions + +The exact memory representation of values, by themselves or as part of a slice +is not fully addressed here. The ability to represent a series of bytes is a +hard requirement but how we do this and what specific types look like in memory +will have to be decided by the implementation. + +How dynamic types interact with the ABI will have to be resolved in a later RFC. + +# Future possibilities + +[future-possibilities]: #future-possibilities + +We could introduce a marker trait (like `Copy`) to make the distinction between value types +and references explicit through the type system. + +As previously discussed, we may consider eventually making reference returns +immutable by default, however this would have to go through multiple iterations +to make the qualifiers explicit and then reverse the implicit qualifification. + +We may want to introduce a `*` operator that works over `__deref` and/or +allowing `Box` to use that operator. + +We should consider introducing slicing and indexing operator such as `[0..n]` +and `[0]`. We needn't introduce the notion of a range yet to make those work. + +As discussed, a natural extension of `*mut` is `*const`. diff --git a/rfcs/0011-expressive-diagnostics.md b/rfcs/0011-expressive-diagnostics.md new file mode 100644 index 0000000..2781728 --- /dev/null +++ b/rfcs/0011-expressive-diagnostics.md @@ -0,0 +1,387 @@ +- Feature Name: `expressive_diagnostics` +- Start Date: 2023-08-07 +- RFC PR: [FuelLabs/sway-rfcs#30](https://github.com/FuelLabs/sway-rfcs/pull/30) +- Sway Issue: [FueLabs/sway#5079](https://github.com/FuelLabs/sway/issues/5079) + +# Summary + +[summary]: #summary + +Expressive diagnostics will provide detailed, helpful, and human friendly diagnostics (warnings and errors) to Sway programmers. The diagnostic messages will be focused on the code that the programmer wrote and will lead the programmer to a possible issue resolution. At the same time, the internal implementation of diagnostics will empower Sway language developers to easily define detailed and helpful diagnostics. The implementation, the Diagnostics API, will also enforce consistent warning and error reporting. + +# Motivation + +[motivation]: #motivation + +There is an overwhelming evidence, based on experience[1],[2],[3],[4] and research[5],[6], that shows that the quality of compiler diagnostics has a significant influence on: + +- the programmer's perception of the language +- steepness of the language learning curve +- and programmer's productivity. + +Experience from several communities, like Rust[2], Elm[3], and C++[1],[5],[7], to name a few, has shown that improving diagnostics had a great impact on the three points listed above. + +At the moment, Sway diagnostics vary significantly in level of details, wording, and helpfulness. We have messages that strive to be helpful and give hints to the programmer. E.g.: + +``` +Generic type \"{name}\" is not in scope. Perhaps you meant to specify type parameters in \ + the function signature? For example: \n`fn \ + {fn_name}<{comma_separated_generic_params}>({args}) -> ... ` +``` + +The way how we structure those messages is, however, not standardized. E.g., here is an another attempt to provide a detailed error explanation, formatted and conveyed in a completely different way: + +``` +expected: {expected} \n\ + found: {given} \n\ + help: The definition of this {decl_type} must \ + match the one in the {interface_name} declaration. +``` + +At the moment, although detailed and helpful, these messages are "packed" within a single piece of text pointing to a single place in code. This significantly limits amount of helpful information that can be provided to the programmer. It also limits the presentation of the context in which the diagnostic occurs. The overall context usually spans across several points in code. We want to be able to place the useful information on the exact places in code that are relevant to the diagnostic. + +Sway programmers are also confronted with short, sometimes cryptic error messages, containing jargon used by compiler developers. E.g.: + +``` +Invalid value \"{value}\" +``` + +``` +Constant requires expression. +``` + +Internally, expressive diagnostics will empower Sway language developers to easily define detailed and helpful diagnostics that will be focused on the exact code that the programmer wrote. Diagnostics API will also provide a unified way to define diagnostics. (At the moment we have three different approaches for defining diagnostics. `CompileError`, `CompileWarning`, and `ParseError` are representatives of those different approaches.) + +Externally, we expect to see the effects experienced in other communities: + +- better perception of Sway as a language +- easier language learning curve +- increased efficiency in resolving compiler errors and warnings. + +# Guide-level explanation + +[guide-level-explanation]: #guide-level-explanation + +Diagnostics will be displayed in different clients. E.g.: + +- `forc` CLI output. +- VS Code hints. +- VS Code compiler output. +- VS Code Problems. + +Each client can choose which part of diagnostic to show and how. + +When using `forc` CLI Sway programmers will get detailed diagnostics consisting of the following elements: + +- _Level_: Error or Warning. +- _Code_: Unique code of the diagnostic. +- _Reason_: Short description of the diagnostic, not related to a specific issue that the programmer did. The _reason_ answers the question "Why is this an error or warning?" E.g., Because - "Match pattern variable is already defined". It points out the general language rule that was violated. +- _Issue_: Description of the concrete issue caused by the programmer, placed in the source code. E.g., "Variable "x" is already defined in this match arm." +- _Hints_: Detailed descriptions of the diagnostic, placed in the source code. They point to other places in code that give additional contextual information about the issue. +- _Help_: Additional friendly information, that helps better understanding and solving the issue. Same as hints, help entries can be related to a place in code, or they can be placed in the footnotes. + +![oopsies image not showing](../files/0011-expressive-diagnostics-cli.png) + +When using IDEs like VS Code Sway programmers will have the experience similar to one offered by the [Rust analyzer](https://rust-analyzer.github.io/). + +Popup in the editor could provide the _reason_, _issue_, _hints_, as well as the _help_, similar to this Rust example: + +![oopsies image not showing](../files/0011-expressive-diagnostics-vscode-hints.png) + +Programmer will have the option to open the full compiler diagnostic, getting the same output like when using `forc` CLI, similar to this Rust example: + +![oopsies image not showing](../files/0011-expressive-diagnostics-vscode-full-diagnostic.png) + +Diagnostics will also be displayed in VS Code problems: + +![oopsies image not showing](../files/0011-expressive-diagnostics-vscode-problems.png) + +## Wording guidelines + +When defining diagnostics, it is important to have in mind all the clients listed above. A diagnostic must not be optimized for one output, e.g., CLI but be hardly understandable in e.g., VS Code hints. + +To ensure consistency and apply best practices[5] the diagnostics will follow these guidelines: + +- _Reason_ will be short and point out the language constraint which was violated. It will _not_ finish in full stop, to emphasize brevity. +- _Issue_ will be short and point out the specific situation encountered in code. It is written in plain english, using proper punctuation and grammar rules. +- _Reason_ and _issue_ are given in plain english language, free of, e.g., compiler jargon. +- _Hints_ and _help_ are as well written in plain english, using proper punctuation and grammar rules. +- _Hints_ and _help_ try to give as much of useful context as possible and to be as specific as possible. +- _Help_ in footnotes should be used rarely, only for general explanations and suggestions. Preferably, _help_ should be related to a place in code. +- Identifier and type names in messages are enclosed in "double quotes". E.g., "x" or "(u32, bool)". +- Code samples in messages are enclosed in \`grave accents\`. E.g., \`let x = 0\`. +- Articles "the" and "a/an" are not used at the beginning of a sentence. E.g., "Variable "X" is already..." instead of "The variable "X" is already...". They can be used in formulations like "This is _the_ original declaration...". + +To avoid unnecessary complexity that comes through high number of diagnostic reasons (both for Sway language developers and Sway programmers), we will introduce new error codes restrictively. Reusing existing error codes and reasons will be the preferable option. To communicate specific cases we will use _hints_ and _help_. + +Here is an example. The existing error message: + +``` +Name "MyName" is defined multiple times. +``` + +could, depending on the context, have the following form: + +``` +Reason: Item names must be unique +Issue: There is already an imported struct with the name "MyName" +Hints: [error] This is the enum "MyName" that has the same name as the imported struct. + [info] The struct "MyName" gets imported here. + [info] This is the original declaration of the imported struct "MyName". +Help: Items like structs, enums, traits, and ABIs must have a unique name in scope. + Consider renaming the enum "MyName" or using an alias for the imported struct. +``` + +## Diagnostic codes + +A diagnostic _code_ uniquely identifies a particular _reason_. Considering that we already have these diagnostic areas: + +- Lexical analysis +- Parsing +- Parse tree conversion +- Type checking +- Semantic analysis +- Warnings + +the pragmatic way to ensure uniqueness and assignment of a new code number is to have a code made of three parts: + +<letter E or W for error or warning respectively><one digit area code><three digit code unique for area> + +E.g, _E4001_ would be the first error in the semantic analysis area. + +Diagnostic codes will be used to: + +- point to detailed diagnostic explanation similar to [Rust online help for error messages](https://doc.rust-lang.org/error_codes/error-index.html). +- suppress warnings by using a Sway equivalent of Rust's [#[allow]](https://docs.rs/allow/latest/allow/). + +In case of warnings, in addition to the _code_ explained above, diagnostic will have a human-readable identifier which will allow us to have human-readable, self-documenting `#[allow]`s, similar to the [Clippy lint identifiers](https://rust-lang.github.io/rust-clippy/master/index.html#/Configuration). + +When [defining warnings via proc-macros](#reference-level-explanation), the human-readable identifier will be defined, next to the _code_: + +```Rust + ... + #[warning( + reason(1, name_is_not_idiomatic, "Name is not idiomatic"), + ... +``` + +This identifier can then be used in `#[allow]`s: + +```Rust +#[allow(name_is_not_idiomatic)] +type int8_t = i8; +``` + +In addition, we will add the option to Sway compiler to treat warnings as errors. + +# Reference-level explanation + +[reference-level-explanation]: #reference-level-explanation + +We already have the `Diagnostic` struct modeled after the definition given above. This allows us to enforce the usage of expressive diagnostics via API that models the explained approach. + +`forc` uses the `ToDiagnostic` trait, implemented at the moment only by `CompileError` and `CompileWarning`, to rended expressive diagnostics for error and warning messages that support them. + +Both implementations of `ToDiagnostic` provide a fallback that renders a diagnostic indistinguishable from the existing error and warning messages. This ensures backward compatibility and gradual switch to expressive diagnostics. + +At the moment, `Diagnostic` instances can be created only imperatively, by using the Diagnostics API and manually creating the whole `Diagnostic` structure, `Reason`, `Code`, `Issue`, `Hint`s, etc. This is cumbersome and requires writing boilerplate code. It can be compared to creating error messages without [thiserror](https://docs.rs/thiserror/latest/thiserror/) as we do in `CompileWarning`. + +The proposal is to have a proc-macro for declarative definition of an expressive diagnostic. + +E.g., the const shadowing diagnostic given above would be defined as follows: + +```Rust +#[derive(Diagnostic, Debug, Clone, PartialEq, Eq, Hash)] +#[diagnostic(area = DiagnosticArea::SemanticAnalysis)] +pub enum CompileError { + ... + #[error( + reason(1, "Constants cannot be shadowed"), + issue( + name.span(), + "{variable_or_constant} \"{name}\" shadows {}constant with the same name", + if constant_decl.is_some() { "imported " } else { "" } + ), + info( + constant_span.clone(), + "Constant \"{name}\" {} here{}.", + if constant_decl.is_some() { "gets imported" } else { "is declared" }, + if *is_alias { " as alias" } else { "" } + ), + info( + constant_decl.map(|x| x.clone()), + "This is the original declaration of the imported constant \"{name}\"." + ), + error( + name.span(), + "Shadowing via {} \"{name}\" happens here.", + if variable_or_constant == "Variable" { "variable" } else { "new constant" } + ), + help("Unlike variables, constants cannot be shadowed by other constants or variables."), + help( + match (variable_or_constant.as_str(), constant_decl.is_some()) { + ("Variable", false) => format!("Consider renaming either the variable \"{name}\" or the constant \"{name}\"."), + ("Constant", false) => "Consider renaming one of the constants.".to_string(), + (variable_or_constant, true) => format!( + "Consider renaming the {} \"{name}\" or using {} for the imported constant.", + variable_or_constant.to_lowercase(), + if *is_alias { "a different alias" } else { "an alias" } + ), + _ => unreachable!("We can have only the listed combinations: variable/constant shadows a non imported/imported constant.") + } + ) + )] + ConstantsCannotBeShadowed { + variable_or_constant: String, + name: Ident, + constant_span: Span, + constant_decl: Option, + is_alias: bool, + }, + ... +} +``` + +_Issue_ and _infos_ point to places in code, and are commonly named _labels_. The places in code they refer to are given by `Span`s. +`issue` and `info` elements of `#[error]` take `Span` or `Option` as their first argument, as seen above. If the passed `Span` parameter is `None` the _label_ is considered _not to be in source code_. + +For _hints_, not being in source code means that they are not rendered at all. Diagnostics API will ignore them and clients (CLI, LSP) will not be aware of them. + +For the _issue_, clients will always display it as a part of diagnostic description. If it is in code, clients can choose to additionally display it in code. +E.g., `forc` always shows the _issue_ text as a part of the diagnostic description, but if the _issue_ is in code and there is a _hint_ pointing to the same place in code, the _issue_ will not be rendered as a _label_ in code. This overloading of the _issue_ text by a _hint_ is visible on the above example and is enforced via Diagnostics API. It gives us the flexibility to phrase the diagnostic description and the hints in an independent way, while allowing backward compatibility with the current warnings and errors that have only a single message and span. + +Using `None` to denote non-existence of an element allows declarative definition of all the _hints_, knowing that some of them might not be shown, depending on the concrete `Span` value passed to the enum variant. + +Our `Diagnostic` derive and `thiserror`'s `Error` derive would coexist as long as the "old-style" diagnostic do not get fully replaced with expressive diagnostic. The existing fallback mechanism would still be generated by the `Diagnostic` derive. + +Treating warnings as errors should be straightforward. It requires an additional compilation flag whose check will be added at final receivers of the diagnostics, before the next step in the compilation pipeline. E.g., before we generate IR we treat warnings as errors and stop the IR generation. + +# Drawbacks + +[drawbacks]: #drawbacks + +The only drawback I can think of is the time and effort needed to fully roll out the expressive diagnostic. This means to replace _all_ of the existing diagnostics with their expressive counterparts. + +Out of the experience gained while working on [supporting multi-span errors and warnings](https://github.com/FuelLabs/sway/issues/21) I can say that crafting a good expressive diagnostic takes time, as well as adjusting existing e2e tests. Coding the diagnostics itself is a negligible effort. + +But this replacement can and should be done gradually, as a side task. It should be distributed among Sway language developers and other teams that want to contribute. The same approach was taken, e.g., by [Rust](https://github.com/rust-lang/rust/issues/35233) and [Elm](https://elm-lang.org/news/compiler-errors-for-humans) communities. + +# Rationale and alternatives + +[rationale-and-alternatives]: #rationale-and-alternatives + +Not implementing expressive diagnostics would mean staying behind the modern approach to compiler diagnostics. Expressive diagnostics are becoming de facto standard in compiler development (see the [Prior art](#prior-art) section below). To position Sway as a modern and developer friendly language, we have to follow this standard. I don't see how we can choose not to implement expressive diagnostics. + +To the proposed design, the approach and its alternatives were shortly discussed when implementing [support for multi-span errors and warnings](https://github.com/FuelLabs/sway/issues/21). + +## Alternative 1: Using generic diagnostic structure + +This approach would mean having a possibility do define an arbitrary diagnostic, similar to an arbitrary output that can be achieved by using `Snippet`, `Slice`, `Annotation`, and other abstractions provided by [annotate-snippets](https://github.com/rust-lang/annotate-snippets-rs). We have concluded that this additional freedom would not give us any more expressive power, but just create additional confusion when defining diagnostics. It would also very likely lead toward inconsistencies similar to those we have now (different wording, different ways to explain an issue, etc.). + +## Alternative 2: Evaluating and using `miette` crate + +[miette](https://github.com/zkat/miette) crate provides a holistic library for defining, reporting, and rendering diagnostics. We have concluded that using `miette` would mean changing our existing infrastructure that already works well plus losing the flexibility if we decide to do thing differently then envisioned by `miette`. The conclusion was to approach `miette` as an inspiration for our own API. + +# Prior art + +[prior-art]: #prior-art + +## Expressive diagnostics +Expressive diagnostics are gaining relevance, or are already significant part of compilers like: + +- Rust[2] +- GCC[5], [7] +- Clang[1] +- Elm[3] + +Microsoft puts an effort in improving diagnostics in their VC++ compiler[5]. + +Java error messages, traditionally knowing to be terse and difficult for novices and students, are enhanced with tools like Decaf, or Expresso. + +## Using macros for diagnostic definition + +`thiserror` and its macros are de facto standard for defining error messages in Rust. + +`miette` follows the same approach for defining rich diagnostics, with its `diagnostic` macro: + +```Rust +#[diagnostic( + code(oops::my::bad), + url(docsrs), + help("try doing it better next time?") +)] +``` + +## Using diagnostic codes + +Unique diagnostic codes can be find in many compilers including [Rust](https://doc.rust-lang.org/error_codes/error-index.html), [Clang](https://clang.llvm.org/docs/InternalsManual.html#the-diagnostics-subsystem), and [C#](https://github.com/thomaslevesque/GenerateCSharpErrors/blob/master/CSharpErrorsAndWarnings.md). They can be used to: + +- point to detailed diagnostic explanation +- suppress warnings + +Rust takes advantage of both possibilities by offering [online help for error messages](https://doc.rust-lang.org/error_codes/error-index.html) and a possibility to suppress warnings by using [#[allow]](https://docs.rs/allow/latest/allow/). + +# Unresolved questions + +[unresolved-questions]: #unresolved-questions + +The RFC has no unresolved questions. During the discussion, the following questions were resolved: + +- the structure of the Diagnostic API. +- the usage of symbolic warning identifiers like e.g., `name_is_not_idiomatic` in addition to warning codes like e.g., `W0001`. + +# Future possibilities + +[future-possibilities]: #future-possibilities + +Once we get online documentation for diagnostics (as proposed in [Sway 3512](https://github.com/FuelLabs/sway/issues/3512)) we can extend diagnostic messages with links to detailed explanation by adding an element like: + +``` +info: For more information see: https://.../E4001 +``` + +Similar to Rust, we can also add the _explain_ command to `forc`. E.g.: + +``` +forc explain E4001 +``` + +Once _The Sway Book_ and _The Sway Reference_ become stable we can also enhance `Diagnostic` by adding references to the documentation. E.g.: + +```Rust +#[error( + reason(1, "Constants cannot be shadowed"), + ... + book("Shadowing", "relative/path/to/chapter/shadowing"), + ref("Constants", "relative/path/to/chapter/constants"), + ... +)] +``` + +This would render additional _info_ lines. E.g.: + +``` +info: For more information, see: + - the chapter "Shadowing" in The Sway Book: https://.../shadowing + - the chapter "Constants" in The Sway Reference: https://.../constants +``` + +# References + +[references]: #references + +- \[1\] [Expressive Diagnostics](https://clang.llvm.org/diagnostics.html) +- \[2\] [Shape of errors to come](https://blog.rust-lang.org/2016/08/10/Shape-of-errors-to-come.html) +- \[3\] [Compiler Errors for Humans](https://elm-lang.org/news/compiler-errors-for-humans) +- \[4\] [Compilers as Assistants](https://elm-lang.org/news/compilers-as-assistants) +- \[5\] [Concepts Error Messages for Humans](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2429r0.pdf) +- \[6\] [Compiler Error Messages Considered Unhelpful: The Landscape of Text-Based Programming Error Message Research](https://web.eecs.umich.edu/~akamil/papers/iticse19.pdf) +- \[7\] [Usability improvements in GCC 8](https://developers.redhat.com/blog/2018/03/15/gcc-8-usability-improvements#) + +[1]: https://clang.llvm.org/diagnostics.html "Expressive Diagnostics" +[2]: https://blog.rust-lang.org/2016/08/10/Shape-of-errors-to-come.html "Shape of errors to come" +[3]: https://elm-lang.org/news/compiler-errors-for-humans "Compiler Errors for Humans" +[4]: https://elm-lang.org/news/compilers-as-assistants "Compilers as Assistants" +[5]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2429r0.pdf "Concepts Error Messages for Humans" +[6]: https://web.eecs.umich.edu/~akamil/papers/iticse19.pdf "Compiler Error Messages Considered Unhelpful: The Landscape of Text-Based Programming Error Message Research" +[7]: https://developers.redhat.com/blog/2018/03/15/gcc-8-usability-improvements "Usability improvements in GCC 8" \ No newline at end of file