uip | title | description | author | status | type | category | created |
---|---|---|---|---|---|---|---|
71 |
Auras Renovation |
Address inconsistencies in the aura type system |
~ponmep-litsem |
Draft |
Standards Track |
Hoon |
2023-06-30 |
The aura type system provides a way to indicate a particular interperation of an atom in Hoon. An aura of an atom influences both how it is presented to the user, as well as specifies the parsing routine. Currently, auras are used in an inconsistent manner, thus preventing enforcement of atom sanity at the compiler level. In this proposal, we address these inconsistencies, together with related problems. With the implementation of this proposal, auras become a correct type system tool, dispensing with the need for manual atom sanity checks.
There are several cases in which auras are used in an inconsistent manner.
-
The
aura
type itself, defined as@ta
, violates the constraint on knots, which do not permit uppercase letters. -
Use of a
term
, which is too restrictive, for an aura variable, which admits uppercase bit-width annotations. An example of this inconsistency is in the Hoon atom definition,+$ hoon ... [%rock p=term q=*] [%sand p=term q=*]
as well as in many functions throughout sys/hoon.hoon
which accept term
for an aura
. This is seen in practice with tapes, defined as (list @tD)
. Another example is +fitz
which accepts a term
for a general aura.
-
Further, there are two Hoon literals which have uppercase letters in their digit set:
@uc
and@uw
. Thescot
atom printer returns a@ta
, and theslaw
family of atom parsers similarly violates this same constraint, accepting a@ta
as their string argument. -
The
+slaw
family of atom parsers accepts a@tas
for an aura, thus restricting valid auras with bit-width extensions. -
Clay persists invalid paths according to the path specification as a list of
@ta
. These can either be some valid@uc
,@uw
atoms, or, more generally, any path addressing a file whose name contains uppercase letters.
These violations of the aura type system make it impossible to enforce sanity of atoms at the compiler level, as originally envisioned with the +sane
and +ruth
arms. Fixing the inconsistencies will empower the compiler to check atom sanity at compile time, freeing the programmer from the need to perform manual atom sanity checks with +sane
.
Apart from the inconsistent use of the auras, there are also several related problems.
- (A)
+sane
, so far not used systematically, is under-specified and performs only some atom sanity checks - (B) Default hexadecimal printing of user-defined, unnested auras could become a source of unforseen bugs
- The knot aura
@ta
denotes a string composed of allowed characters:[0-9], [a-z], [A-Z], cab, hep, sig, dot
. It is considered URL-safe, and allows the encoding of all Hoon constants. - Aura syntax. A valid aura consists of three parts, prefixed by a
pat
: the major part, the bit-width (optional), and the minor part (optional). The major part is matched by[a-z]*
. The bit-width is matched by\.?[A-Z]
. The minor part is matched by-[a-z]+
. - Aura nesting rules. Aura B nests within aura A if the following conditions are satisfied. (1) Nesting in the major part. B nests in A if the major part of B is an extension of the major part of A. (2) Nesting in the bit-width. If the dot prefix is absent, B nests in A if the bit-width of B is not greater than the bit-width of A. If the dot prefix is present, B nests in A if the bit-width of B is equal to the bit-width of A. Absence of bit-width indicates infinite size. (3) Nesting in the minor part. B nests in A if the minor part of B matches the minor part of A.
- (To be determined) The
@t
aura denotes a UTF-8 string composed of such and such characters - (To be determined) The
@c
aura denotes a UTF-32 string composed of such and such characters - An atom violating its aura constraint, as defined by
+sane
, is considered mad - Production of mad atoms is guarded against by the Hoon compiler through
+sane
and+ruth
- Auras are printed by the
+scot
arm. The pretty-printer renders user-defined auras according their nearest defined nest.
The exclusion of uppercase letters from the knot aura @ta
specification seems unnecessary. Relaxation of this condition alone solves problems 1--5. There seems to be little justification for it at present, beyond enforcing strict URL safety. It is clear that such strictness can no longer be maintained.
Hoon literals can not be considered URL-safe unless uppercase letters are permitted. Overwhelming majority of websites is served by UNIX systems, which are naturally case sensitive. Clay too has to allow uppercase filenames.
We therefore propose to relax this restriction and allow uppercase letters in @ta
, declaring it to be URL-safe.
The nesting rules for the major part are modified. Previously, the nesting rules functioned as a bidirectional compatibility check. The aura @tas
nests in @ta
, and @ta
nests in @tas
. The latter could easily result in production of mad atoms, since an aura extension could introduce a constraint not present in its parent.
Currently, only the last uppercase letter is treated as a bit-width extension by the aura compatibility check arm +fitz
. This means that an aura like @uDE
denotes an aura of type @uD
with the bit-width extension of E
-- 16 bits. We propose to simplify things by further restricting the bit-width syntax: only a single, trailing uppercase letter is allowed in the aura specification. Thus an aura @uDE
is invalid, while @uD
and @uE
are allowed.
With the introduction of the optional dot prefix on the bit-width, we anticipate a need for
numeric auras of fixed precision, whose arithmetics do not allow for nesting according to usual rules.
There are currently four floating-point types of varying bit-width: @rh
(16 bit), @rs
(32 bit), @rd
(64 bit), @rq
(128 bit). These types can not be defined as, respectively, @rE
, @rF
, @rG
, @rH
, since this would imply nesting according to their size. On the other hand, introduction of unsigned integer numeric auras following the established convention is prohibited, as there already exists an @ud
aura. We therefore propose to introduce an alternative nesting behaviour of the bit-widths based on exact match, which is enabled by the prefix dot.
The floating-point auras can now be specified as @r.E
, @r.F
, @r.G
, @r.H
. An example mnemonics would be E - easy, 16 bits
, F - fine, 32 bits
, G - good, 64 bits
, and H - huge, 128 bits
.
The minor part of an aura allows us to introduce more descriptive aura types, to state the indented purpose of the aura, in addition to its syntax and semantics. For instance, @uxD-flag
could be used when communicating with hardware devices, while @uv-blob
could be used in place of the less readable @uvblob
. The minor part nests based on exact match, as opposed to the major part, which nests by extension.
An example implementation of the new +mota
to be introduced by this proposal is presented below.
|= a=tape
=<
(scan a mota)
|%
++ mota ;~(pfix pat ;~(plug major (punt bit-width) (punt minor)))
++ major (star low)
++ bit-width ;~(plug ;~(pose (cold | dot) (easy &)) hig)
++ minor ;~(plug hep (plus low))
--
The old nesting behaviour lets us generate mad atoms at will,
> `@tas`'happy knot'
%happy knot
. With the preceding change, it is no longer allowed to cast to a previously compatible aura in general.
> ^-(@tas 'happy knot')
mint-nice
-need.@tas
-have.@ta
nest-fail
> `@if`0xcafe
mint-nice
-need.@if
-have.@
nest-fail
To cast to an aura we use the ;;
rune
> ;;(@tas 'happy knot')
dojo: hoon expression failed
> ;;(@tas 'happy-term')
%happy-term
This extends to size-restricted atoms as well.
> ^-(@uE 0xbade)
mint-nice
-need.@uD
-have.@ux
In the last example, even though 0xbade
is within the 16-bit size of @uE
, and @ux
nests within @u
, the infinite bit-width of @ux
prevents nesting. Instead, one has to cast using ;;
> ;;(@uE 0xbade)
47.838
The new nesting behavior introduces a clear semantic distinction between a cast to a nesting aura (the ^-
rune), and a conversion to a given aura (the ';;' rune), which might fail. In this way, the nock-level semantics of the ^-
rune remains unchanged.
After restrictions placed upon each aura are examined and encoded in +sane
, +ruth
arm can be enabled. Henceforth, violations of aura constraints are going to be caught at compile-time. There are currently four families of constrained auras: @c, @i, @f, @t
. In addition, any aura can be further size-restricted by a bit-width annotation. The size restriction on atoms is now enforced.
@c
needs to be a valid UTF-32 string. Precise constraints to be determined.
The internet address atoms @i
possess an implicit size restrictions. @if
should not exceed 32 bits, and @is
should not exceed 128 bits.
The floating-point auras @rh
, @rs
, @rd
, @rq
possess implicit size restriction.
@t
family of auras specify progressively restricted string types. Precise constraints to be determined.
+scot
needs to sanitize its input, as it effectively accepts an atom with an aura as its input.
Currently,%dojo
needs to perform a sanity check before printing text to the terminal.
The default printing of user-defined, unnested auras as hexadecimal number without separators tempts the programmer to exploit this behaviour as a convenient way to print atoms. It is also inconsistent with the rendering of the empty aura @
as an unsigned decimal atom. Rather, user-defined, unnested auras should render in the same way as the @
aura, under which they nest.
- The stricter aura syntax, with only single trailing uppercase bit-width annotation allowed, is currently obeyed across all standard desks
- The redefined
@ta
aura will cover a wider range of strings and is therefore backward-compatible - With the enabling of
+ruth
, code that could potentially violate atom sanity by casting to an extended aura will crash - With the change of default printing of unnested user-defined auras, any code relying on it will malfunction.
~rovnys-ricfer proposed extending the aura syntax with minor part.
Copyright and related rights waived via CC0.