Skip to content

Latest commit

 

History

History
190 lines (132 loc) · 10.8 KB

UIP-0107.md

File metadata and controls

190 lines (132 loc) · 10.8 KB
uip title description author status type category created
71
Auras Renovation
Address inconsistencies in the aura type system
~ponmep-litsem
Draft
Standards Track
Hoon
2023-06-30

Abstract

The aura type system provides a way to indicate a particular interperation of an atom in Hoon. An aura of an atom influences both how it is presented to the user, as well as specifies the parsing routine. Currently, auras are used in an inconsistent manner, thus preventing enforcement of atom sanity at the compiler level. In this proposal, we address these inconsistencies, together with related problems. With the implementation of this proposal, auras become a correct type system tool, dispensing with the need for manual atom sanity checks.

Motivation

There are several cases in which auras are used in an inconsistent manner.

  1. The aura type itself, defined as @ta, violates the constraint on knots, which do not permit uppercase letters.

  2. Use of a term, which is too restrictive, for an aura variable, which admits uppercase bit-width annotations. An example of this inconsistency is in the Hoon atom definition,

    +$  hoon
    ...
    [%rock p=term q=*]
    [%sand p=term q=*]
    

as well as in many functions throughout sys/hoon.hoon which accept term for an aura. This is seen in practice with tapes, defined as (list @tD). Another example is +fitz which accepts a term for a general aura.

  1. Further, there are two Hoon literals which have uppercase letters in their digit set: @uc and @uw. The scot atom printer returns a @ta, and the slaw family of atom parsers similarly violates this same constraint, accepting a @ta as their string argument.

  2. The +slaw family of atom parsers accepts a @tas for an aura, thus restricting valid auras with bit-width extensions.

  3. Clay persists invalid paths according to the path specification as a list of @ta. These can either be some valid @uc, @uw atoms, or, more generally, any path addressing a file whose name contains uppercase letters.

These violations of the aura type system make it impossible to enforce sanity of atoms at the compiler level, as originally envisioned with the +sane and +ruth arms. Fixing the inconsistencies will empower the compiler to check atom sanity at compile time, freeing the programmer from the need to perform manual atom sanity checks with +sane.

Apart from the inconsistent use of the auras, there are also several related problems.

  • (A) +sane, so far not used systematically, is under-specified and performs only some atom sanity checks
  • (B) Default hexadecimal printing of user-defined, unnested auras could become a source of unforseen bugs

Specification

  • The knot aura @ta denotes a string composed of allowed characters: [0-9], [a-z], [A-Z], cab, hep, sig, dot. It is considered URL-safe, and allows the encoding of all Hoon constants.
  • Aura syntax. A valid aura consists of three parts, prefixed by a pat: the major part, the bit-width (optional), and the minor part (optional). The major part is matched by [a-z]*. The bit-width is matched by \.?[A-Z]. The minor part is matched by -[a-z]+.
  • Aura nesting rules. Aura B nests within aura A if the following conditions are satisfied. (1) Nesting in the major part. B nests in A if the major part of B is an extension of the major part of A. (2) Nesting in the bit-width. If the dot prefix is absent, B nests in A if the bit-width of B is not greater than the bit-width of A. If the dot prefix is present, B nests in A if the bit-width of B is equal to the bit-width of A. Absence of bit-width indicates infinite size. (3) Nesting in the minor part. B nests in A if the minor part of B matches the minor part of A.
  • (To be determined) The @t aura denotes a UTF-8 string composed of such and such characters
  • (To be determined) The @c aura denotes a UTF-32 string composed of such and such characters
  • An atom violating its aura constraint, as defined by +sane, is considered mad
  • Production of mad atoms is guarded against by the Hoon compiler through +sane and +ruth
  • Auras are printed by the +scot arm. The pretty-printer renders user-defined auras according their nearest defined nest.

Rationale

@ta to allow uppercase letters

The exclusion of uppercase letters from the knot aura @ta specification seems unnecessary. Relaxation of this condition alone solves problems 1--5. There seems to be little justification for it at present, beyond enforcing strict URL safety. It is clear that such strictness can no longer be maintained.

Hoon literals can not be considered URL-safe unless uppercase letters are permitted. Overwhelming majority of websites is served by UNIX systems, which are naturally case sensitive. Clay too has to allow uppercase filenames.

We therefore propose to relax this restriction and allow uppercase letters in @ta, declaring it to be URL-safe.

New aura syntax and nesting rules

Major part nests only by extension.

The nesting rules for the major part are modified. Previously, the nesting rules functioned as a bidirectional compatibility check. The aura @tas nests in @ta, and @ta nests in @tas. The latter could easily result in production of mad atoms, since an aura extension could introduce a constraint not present in its parent.

Bit-width syntax with alternative nesting behaviour

Currently, only the last uppercase letter is treated as a bit-width extension by the aura compatibility check arm +fitz. This means that an aura like @uDE denotes an aura of type @uD with the bit-width extension of E -- 16 bits. We propose to simplify things by further restricting the bit-width syntax: only a single, trailing uppercase letter is allowed in the aura specification. Thus an aura @uDE is invalid, while @uD and @uE are allowed.

With the introduction of the optional dot prefix on the bit-width, we anticipate a need for numeric auras of fixed precision, whose arithmetics do not allow for nesting according to usual rules. There are currently four floating-point types of varying bit-width: @rh (16 bit), @rs (32 bit), @rd (64 bit), @rq (128 bit). These types can not be defined as, respectively, @rE, @rF, @rG, @rH, since this would imply nesting according to their size. On the other hand, introduction of unsigned integer numeric auras following the established convention is prohibited, as there already exists an @ud aura. We therefore propose to introduce an alternative nesting behaviour of the bit-widths based on exact match, which is enabled by the prefix dot.

The floating-point auras can now be specified as @r.E, @r.F, @r.G, @r.H. An example mnemonics would be E - easy, 16 bits, F - fine, 32 bits, G - good, 64 bits, and H - huge, 128 bits.

Minor part nests by exact match

The minor part of an aura allows us to introduce more descriptive aura types, to state the indented purpose of the aura, in addition to its syntax and semantics. For instance, @uxD-flag could be used when communicating with hardware devices, while @uv-blob could be used in place of the less readable @uvblob. The minor part nests based on exact match, as opposed to the major part, which nests by extension.

Example parser of the new syntax

An example implementation of the new +mota to be introduced by this proposal is presented below.

|=  a=tape
=<
(scan a mota)
|%
++  mota  ;~(pfix pat ;~(plug major (punt bit-width) (punt minor)))
++  major  (star low)
++  bit-width  ;~(plug ;~(pose (cold | dot) (easy &)) hig)
++  minor  ;~(plug hep (plus low))
--

Atom conversion is performed with the ;; rune

The old nesting behaviour lets us generate mad atoms at will,

> `@tas`'happy knot'
%happy knot

. With the preceding change, it is no longer allowed to cast to a previously compatible aura in general.

> ^-(@tas 'happy knot')
mint-nice
-need.@tas
-have.@ta
nest-fail

> `@if`0xcafe
mint-nice
-need.@if
-have.@
nest-fail

To cast to an aura we use the ;; rune

> ;;(@tas 'happy knot')
dojo: hoon expression failed

> ;;(@tas 'happy-term')
%happy-term

This extends to size-restricted atoms as well.

> ^-(@uE 0xbade)
mint-nice
-need.@uD
-have.@ux

In the last example, even though 0xbade is within the 16-bit size of @uE, and @ux nests within @u, the infinite bit-width of @ux prevents nesting. Instead, one has to cast using ;;

> ;;(@uE 0xbade)
47.838

The new nesting behavior introduces a clear semantic distinction between a cast to a nesting aura (the ^- rune), and a conversion to a given aura (the ';;' rune), which might fail. In this way, the nock-level semantics of the ^- rune remains unchanged.

Atom sanity to be enforced at compile-time

After restrictions placed upon each aura are examined and encoded in +sane, +ruth arm can be enabled. Henceforth, violations of aura constraints are going to be caught at compile-time. There are currently four families of constrained auras: @c, @i, @f, @t. In addition, any aura can be further size-restricted by a bit-width annotation. The size restriction on atoms is now enforced.

Sanity of @c atoms

@c needs to be a valid UTF-32 string. Precise constraints to be determined.

Sanity of @i atoms

The internet address atoms @i possess an implicit size restrictions. @if should not exceed 32 bits, and @is should not exceed 128 bits.

Sanity of @r atoms

The floating-point auras @rh, @rs, @rd, @rq possess implicit size restriction.

Sanity of @t atoms

@t family of auras specify progressively restricted string types. Precise constraints to be determined.

+scot needs to sanitize its input, as it effectively accepts an atom with an aura as its input. Currently,%dojo needs to perform a sanity check before printing text to the terminal.

Change in default printing of atoms

The default printing of user-defined, unnested auras as hexadecimal number without separators tempts the programmer to exploit this behaviour as a convenient way to print atoms. It is also inconsistent with the rendering of the empty aura @ as an unsigned decimal atom. Rather, user-defined, unnested auras should render in the same way as the @ aura, under which they nest.

Backward Compatibility

  • The stricter aura syntax, with only single trailing uppercase bit-width annotation allowed, is currently obeyed across all standard desks
  • The redefined @ta aura will cover a wider range of strings and is therefore backward-compatible
  • With the enabling of +ruth, code that could potentially violate atom sanity by casting to an extended aura will crash
  • With the change of default printing of unnested user-defined auras, any code relying on it will malfunction.

Acknowledgements

~rovnys-ricfer proposed extending the aura syntax with minor part.

Copyright

Copyright and related rights waived via CC0.