Specify concrete syntax for built-in lists, pairs, and data (PLT-378) (…

…#5692) * Specification of concrete syntax for list, pair, and data * Tidying up * Update syntax in examples * More accurate page numbers in symbol index * Fix formatting * Add note about discrepancy between syntax of data in specification and implementation
IntersectMBO · Jan 3, 2024 · c776fe7 · c776fe7 · github-actions · Jan 3, 2024
1 parent f734591
commit c776fe7
Show file tree

Hide file tree

Showing 9 changed files with 202 additions and 103 deletions.
diff --git a/doc/plutus-core-spec/Makefile b/doc/plutus-core-spec/Makefile
@@ -26,9 +26,10 @@ ${DOC}.pdf: ${SRC} ${BIB}
 	${BIBTEX} ${DOC}
 	${LATEX}  ${DOC}   # to make sure the (cross)references are correct
 	${MAKEINDEX} ${DOC}.nlo -s nomencl.ist -o ${DOC}.nls
-        # ^ Putting this later seems to get page references out of sync.
 	${LATEX}  ${DOC}
+	${MAKEINDEX} ${DOC}.nlo -s nomencl.ist -o ${DOC}.nls
 	${LATEX}  ${DOC}
+	${MAKEINDEX} ${DOC}.nlo -s nomencl.ist -o ${DOC}.nls
 	${LATEX}  ${DOC}
 
 figs:

diff --git a/doc/plutus-core-spec/cardano/builtins1.tex b/doc/plutus-core-spec/cardano/builtins1.tex
@@ -57,31 +57,25 @@ \subsubsection{Built-in types and type operators}
 Unicode replacement character \texttt{U+FFFD}.
 
 
-\paragraph{Concrete syntax for higher-order types.} Types such as $\listOf{\ty{integer}}$
-and $\pairOf{\ty{bool}}{\ty{string)}}$ are represented by application at the
-type level, thus: \texttt{[(con list) (con integer)]} and\texttt{[(con pair)
-    (con bool) (con string)]}.  Each higher-order type will need further syntax
-for representing constants of those types.  For example, we might use
-\texttt{[]} for list values and \texttt{(,)} for pairs, so the list $[11,22,33]$
-might be written as
-\begin{verbatim}
-   (con [(con list) (con integer)] 
-        [(con integer 11), (con integer 22), (con integer 33)]
-   )
-\end{verbatim}
-and the pair (True, "Plutus") as
+\paragraph{Concrete syntax for lists and pairs.}
+A list of type $\texttt{list}(t)$ is written as a syntactic list
+\texttt{[$c_1, \ldots, c_n$]} where each $c_i$ lies in $\bitc_t$; a pair of type
+$\texttt{pair}(t_1,t_2)$ is written as a syntactic pair $\texttt{(}c_1,c_2\texttt{)}$
+with $c_1 \in \bitc_{t_1}$ and $c_2 \in \bitc_{t_2}$.  Some valid constant expressions
+are thus
+
 \begin{verbatim}
-   (con [(con pair) (con bool) (con string)] 
-        ((con bool True), (con string "Plutus"))
-   ).
+   (con (list integer) [11, 22, 33])
+   (con (pair bool string) (True, "Plutus")).
+   (con (list (pair bool (list bytestring)))
+      [(True, []), (False, [#,#1F]), (True, [#123456, #AB, #ef2804])])
 \end{verbatim}
-Note however that this syntax is not currently supported by most Plutus Core tools at the time of writing.
-
 
 
 \paragraph{The $\ty{data}$ type.}
-We provide a built-in type $\ty{data}$ which permits the encoding of simple data structures
-for use as arguments to Plutus Core scripts.  This type is defined in Haskell as 
+We provide a built-in type $\ty{data}$ which permits the encoding of simple data
+structures for use as arguments to Plutus Core scripts.  This type is defined in
+Haskell as
 \begin{alltt}
    data Data =
       Constr Integer [Data]
@@ -144,16 +138,53 @@ \subsubsection{Built-in types and type operators}
 the former is part of a built-in type whereas the latter is part of the language
 itself.
 
+\newcommand{\syn}[1]{c_{\mathtt{{#1}}}}
+
+\paragraph{Concrete syntax for $\ty{data}$.}
+The concrete syntax for $\ty{data}$ is given by
+
+\begin{minipage}{0.6\linewidth}
+    \centering
+    \[\begin{array}{rcl}
+    \syn{data} & ::= & \texttt{(Constr} \ \syn{integer} \ \syn{list(data)} \texttt{)}\\
+               &  & \texttt{(Map} \ \syn{list(pair(data,data))} \texttt{)}\\
+               &  & \texttt{(List} \ \syn{list(data)} \texttt{)}\\
+               &  & \texttt{(I} \ \syn{integer} \texttt{)}\\
+               &  & \texttt{(B} \ \syn{bytestring} \texttt{)}.
+    \end{array}\]
+    \label{fig:data-concrete-syntax}
+\end{minipage}
+
+\noindent We interpret these syntactic constants as elements of $\denote{\ty{data}}$ using
+the various `$\inj$' functions defined earlier.  Some valid \texttt{data}
+constants are
+
+\begin{verbatim}
+   (con data (Constr 1 [(I 2), (B #), (Map [])]) 
+   (con data (Map [((I 0), (B #00)), ((I 1), (B #0F))])) 
+   (con data (List [(I 0), (I 1), (B #7FFF), (List []]))) 
+   (con data (I -22)) 
+   (con data (B #001A)).
+\end{verbatim}
+% TODO: be more relaxed about parenthesisation in general
+
+\paragraph{Note.}  At the time of writing the syntax accepted by IOG's parser for textual Plutus Core
+differs slightly from that above in that subobjects
+of \texttt{Constr}, \texttt{Map} and \texttt{List} objects must \textit{not} be
+parenthesised: for example one must write \verb|(con data (Constr 1 [I 2, B #,Map []])|.
+This discrepancy will be resolved in the near future.
+
+
 \subsubsection{Built-in functions}
 \label{sec:built-in-functions-1}
 The first batch of built-in functions is shown in
-Table~\ref{table:built-in-functions-1}.  The table indicates which
-functions can fail during execution, and conditions causing failure are
-specified either in the denotation given in the table or in a relevant note.
-Recall also that a built-in function will fail if it is given an argument of the
-wrong type: this is checked in conditions involving the $\sim$ relation and the
-$\Eval$ function in Figures~\ref{fig:untyped-term-reduction}
-and~\ref{fig:untyped-cek-machine}.  Note also that some of the functions are
+Table~\ref{table:built-in-functions-1}.  The table indicates which functions can
+fail during execution, and conditions causing failure are specified either in
+the denotation given in the table or in a relevant note.  Recall also that a
+built-in function will fail if it is given an argument of the wrong type: this
+is checked in conditions involving the $\sim$ relation and the $\Eval$ function
+in Figures~\ref{fig:untyped-term-reduction} and~\ref{fig:untyped-cek-machine}.
+Note also that some of the functions are
 \#-polymorphic.  According to Section~\ref{sec:builtin-denotations} we
 require a denotation for every possible monomorphisation of these; however all
 of these functions are parametrically polymorphic so to simplify notation we

diff --git a/doc/plutus-core-spec/cardano/cardano.tex b/doc/plutus-core-spec/cardano/cardano.tex
@@ -5,17 +5,22 @@ \section{Built-in types and functions}
 \paragraph{Built-in batches.}
 \label{sec:builtin-batches}
 
-The built-in types and functions are defined in batches corresponding to how they were added to ledger languages.
-These batches are given in the following sections.
+The built-in types and functions are defined in batches corresponding to how
+they were added to ledger languages.  These batches are given in the following
+sections.
 
 \paragraph{Built-in semantics variants.}
 \label{sec:builtin-semantics-variants}
 
-In rare cases we can make a mistake or need to change the actual behaviour of a built-in function.
-To handle this we define a series of built-in semantics variants, which indicate which behaviour should be used.
-A fix will typically be deployed by defining a new semantics variant, and then using that variant for future ledger languages (but not existing ones, since this is usually a backwards-incompatible change).
+In rare cases we can make a mistake or need to change the actual behaviour of a
+built-in function.  To handle this we define a series of built-in semantics
+variants, which indicate which behaviour should be used.  A fix will typically
+be deployed by defining a new semantics variant, and then using that variant for
+future ledger languages (but not existing ones, since this is usually a
+backwards-incompatible change).
 
-Changes are listed alongside the original definition of the built-in function in its original batch, and are indexed in the following table.
+Changes are listed alongside the original definition of the built-in function in
+its original batch, and are indexed in the following table.
 
 \begin{table}[H]
   \centering
@@ -31,7 +36,48 @@ \section{Built-in types and functions}
     \label{table:bs-variants}
 \end{table}
 
+
+\paragraph{Concrete syntax for built-in types.}
+Recall that in the abstract notation for built-in types introduced in
+Section~\ref{sec:built-in-types}, a built-in type is either an \textit{atomic
+  type} such as \texttt{integer} or \texttt{string} or an application $\op(T_1,
+\ldots, T_n)$ of a \textit{type operator} to a sequence of built-in types.  The
+concrete syntax of built-in types used in textual Plutus Core programs is
+slightly different in that we use a curried form of application for type
+operators: a type is given by
+
+\newcommand{\bitn}{\mathbf{T}} % built-in type name
+\newcommand{\bitc}{\mathbf{C}} % built-in constant syntax
+
+\begin{minipage}{\linewidth}
+    \centering
+    \[\begin{array}{rclr}
+    \bitn & ::= & \textit{atomic-type} & \textrm{Atomic type}\\
+                & & \texttt{(}\op \ \bitn_1 \ldots \bitn_{\valency{\op}}) & \textrm{Type application}\\
+    \end{array}\]
+    \label{fig:built-in-type-concrete-syntax}
+\end{minipage}
+\noindent Note that we again require that all type operators are fully applied.
+We refer to the syntactic objects $\bitn$ above as \textit{concrete built-in
+ types}. There is an obvious bijection between these and the abstract built-in
+types used elsewhere in this document, and given an abstract built-in type $T$
+we will denote the corresponding concrete built-in type by $\bar{T}$.
+
+\paragraph{Concrete syntax for built-in constants.}
+We provide concrete syntax for constants of most (but not all) built-in types.
+For a built-in type $T$ which has a concrete syntax we specify a set $\bitc_T$
+of strings (using either regular expressions or a BNF-style grammar), and a
+constant of type $T$ is then represented in the concrete syntax by an expression
+of the form \texttt{(con $\bar{T}$ $c_T$)} with $c_T \in \bitc_T$.  Each string
+$c_T$ will have an interpretation as a value of type $T$ (ie, an element of
+$\denote{T}$) and since this will generally be the obvious interpretation we
+will not always spell out the details.%%
+\nomenclature[H]{$\bitc_T$}{Set of strings used for the concrete syntax of constants of built-in type $T$}
+
+
 \input{cardano/builtins1.tex}
 \input{cardano/builtins2.tex}
 \input{cardano/builtins3.tex}
 \input{cardano/builtins4.tex}
+
+
diff --git a/doc/plutus-core-spec/data-cbor.tex b/doc/plutus-core-spec/data-cbor.tex
@@ -61,12 +61,12 @@ \section{Notation}
 $$
 \e_X : X \rightarrow \B^*
 $$%
-\nomenclature[HC]{$\e_X$}{CBOR encoder for \texttt{data}}
+\nomenclature[IC]{$\e_X$}{CBOR encoder for \texttt{data}}
 and decoding functions (or \textit{decoders})
 $$
 \d_X : \B^* \rightharpoonup \B^* \times X
 $$%
-\nomenclature[HC]{$\d_X$}{CBOR decoder for \texttt{data}}
+\nomenclature[IC]{$\d_X$}{CBOR decoder for \texttt{data}}
 
 \noindent for various sets $X$, such as the set $\Z$ of integers and the set of
 all \texttt{data} items.  The encoding function $\e_X$ takes an element $x \in

diff --git a/doc/plutus-core-spec/flat-serialisation.tex b/doc/plutus-core-spec/flat-serialisation.tex
@@ -42,12 +42,12 @@ \section{Encoding and decoding}
 $$
 \E_X : \Bits \times X \rightarrow \Bits
 $$%
-\nomenclature[HF]{$\E_X$}{Flat encoder}
+\nomenclature[IF]{$\E_X$}{Flat encoder}
 and (partial) decoding functions (or \textit{decoders})
 $$
 \D_X : \Bits \rightharpoonup \Bits \times X
 $$%
-\nomenclature[HF]{$\D_X$}{Flat decoder}
+\nomenclature[IF]{$\D_X$}{Flat decoder}
 
 \noindent for various sets $X$, such as the set $\Z$ of integers and the set of
 all Plutus Core terms.  The encoding function $\E_X$ takes a sequence $s \in

diff --git a/doc/plutus-core-spec/header.tex b/doc/plutus-core-spec/header.tex
@@ -1,12 +1,3 @@
-\renewcommand{\thefootnote}{\fnsymbol{footnote}}
-
-% correct bad hyphenation here
-\hyphenation{byte-string}
-
-\usepackage[hyphens]{url}
-\usepackage[colorlinks=true,linkcolor=MidnightBlue,citecolor=ForestGreen,urlcolor=Plum,pageanchor]{hyperref}
-% ^ pageanchor is for nomencl
-
 %% Stuff for index of notation
 %% FYI, the nomencl package seems to discard entries containing '@' 
 \usepackage[intoc,refpage]{nomencl}
@@ -21,7 +12,9 @@
           \ifthenelse{\equal{#1}{E}}{\item[\textbf{Built-in functions}]}{%
             \ifthenelse{\equal{#1}{F}}{\item[\textbf{Term reduction}]}{%
               \ifthenelse{\equal{#1}{G}}{\item[\textbf{CEK machine}]}{%
-                \ifthenelse{\equal{#1}{H}}{\item[\textbf{Serialisation and deserialisation}]}{}
+               \ifthenelse{\equal{#1}{H}}{\item[\textbf{Concrete syntax}]}{%
+                 \ifthenelse{\equal{#1}{I}}{\item[\textbf{Serialisation and deserialisation}]}{}
+                }
               }
             }
           }
@@ -32,6 +25,15 @@
 }
 \makenomenclature
 
+\renewcommand{\thefootnote}{\fnsymbol{footnote}}
+
+% correct bad hyphenation here
+\hyphenation{byte-string}
+
+\usepackage[hyphens]{url}
+\usepackage[colorlinks=true,linkcolor=MidnightBlue,citecolor=ForestGreen,urlcolor=Plum,pageanchor]{hyperref}
+% ^ pageanchor is for nomencl
+
 \usepackage{tocbibind}  % To get the bibiliograpy in the table of contents
 \usepackage{blindtext, graphicx}
 %\usepackage[numbers]{natbib}

diff --git a/doc/plutus-core-spec/notation.tex b/doc/plutus-core-spec/notation.tex
@@ -5,70 +5,88 @@ \subsection{Sets}
 \label{sec:notation-sets}
 \begin{itemize}
   \item The symbol $\disj$ denotes a disjoint union of sets;  for emphasis we often use this
-    to denote the union of sets which we know to be disjoint.
+    to denote the union of sets which we know to be disjoint.%
+    \nomenclature[Azz]{$\disj$}{Disjoint union of sets}%
+
   \item Given a set $X$, $X^*$ denotes the set of finite sequences of elements of $X$:
     $$
     X^*= \bigdisj{\{X^n: n \in \mathbb{N}\}}.
-    $$
-  \item $\N = \{0,1,2,3,\ldots\}$.
-  \item $\Nplus = \{1,2,3,\ldots\}$.
-  \item $\Nab{a}{b} = \{n \in \N: a \leq n \leq b\}$.
-  \item $\B = \Nab{0}{255}$, the set of 8-bit bytes.
-  \item $\B^*$ is the set of all bytestrings.
-  \item $\Z = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$.
-  \item $\mathbb{F}_q$ denotes a finite field with $q$ elements ($q$ a prime power).
-  \item $\units{\mathbb{F}} = \mathbb{F}\setminus\{0\}$ denotes the multiplicative group of nonzero elements of a field $\mathbb{F}$.
-  \item $\U$ denotes the set of Unicode scalar values, as defined in~\cite[Definition D76]{Unicode-standard}.
-  \item $\U^*$ is the set of all Unicode strings.
+    $$%
+    \nomenclature[Azz]{$X^*$}{The set of all finite sequences of elements of a set $X$}
+  \item $\N = \{0,1,2,3,\ldots\}$.%
+    \nomenclature[An1]{$\N$}{$\{0,1,2,3,\ldots\}$}
+
+  \item $\Nplus = \{1,2,3,\ldots\}$.%
+    \nomenclature[An2]{$\Nplus$}{$\{1,2,3,\ldots\}$}
+
+  \item $\Nab{a}{b} = \{n \in \N: a \leq n \leq b\}$.%
+    \nomenclature[An3]{$\Nab{a}{b}$}{$\{n \in \N: a \leq n \leq b\}$}
+
+  \item $\B = \Nab{0}{255}$, the set of 8-bit bytes.%
+    \nomenclature[Ab1]{$\B$}{$\{n \in \Z: 0 \leq n \leq 255\}$}%
+
+  \item $\B^*$ is the set of all bytestrings.%
+    \nomenclature[Ab2]{$\B^*$}{The set of all bytestrings}
+
+  \item $\Z = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$.%
+    \nomenclature[Aw]{$\Z$}{$\{\ldots, -2, -1, 0, 1, 2, \ldots\}$}
+
+  \item $\mathbb{F}_q$ denotes a finite field with $q$ elements ($q$ a prime power).%
+    \nomenclature[Af]{$\mathbb{F}_q$}{The finite field with $q$ elements}
+
+  \item $\mathbb{F}_q^*$ denotes the multiplicative group of nonzero elements of $\mathbb{F}_q$.%
+    \nomenclature[Af]{$\mathbb{F}_q^*$}{The multiplicative group of $\mathbb{F}_q$}%
+
+  \item $\U$ denotes the set of Unicode scalar values, as defined in~\cite[Definition D76]{Unicode-standard}.%
+    \nomenclature[Au1]{$\U$}{The set of Unicode scalar values}%
+
+  \item $\U^*$ is the set of all Unicode strings.%
+    \nomenclature[Au2]{$\U^*$}{The set of Unicode strings}%
+
   \item We assume that there is a special symbol $\errorX$ which does not appear
     in any other set we mention.  The symbol $\errorX$ is used to indicate that
     some sort of error condition has occurred, and we will often need to consider
     situations in which a value is either $\errorX$ or a member of some set $S$.
     For brevity, if $S$ is a set then we define
     $$
     \withError{S} := S \disj \{\errorX\}.
-    $$
+    $$%
+    \nomenclature[Ax]{$\withError{S}$}{$S \disj \{\errorX\}$ ($S$ a set)}%
 \end{itemize}%
-\nomenclature[An1]{$\N$}{$\{0,1,2,3,\ldots\}$}%
-\nomenclature[An2]{$\Nplus$}{$\{1,2,3,\ldots\}$}%
-\nomenclature[An3]{$\Nab{a}{b}$}{$\{n \in \N: a \leq n \leq b\}$}%
-\nomenclature[Aw]{$\Z$}{$\{\ldots, -2, -1, 0, 1, 2, \ldots\}$}%
-\nomenclature[Af]{$\mathbb{F}_q$}{The finite field with $q$ elements}%
-\nomenclature[Af]{$\units{\mathbb{F}}$}{The multiplicative group of a field $\mathbb{F}$, ie $\mathbb{F}\setminus\{0\}$}%
-\nomenclature[Ab]{$\B$}{$\{n \in \Z: 0 \leq n \leq 255\}$}%
-\nomenclature[Ab]{$\B^*$}{The set of all bytestrings}%
-\nomenclature[Au]{$\U$}{The set of Unicode scalar values}%
-\nomenclature[Au]{$\U^*$}{The set of Unicode strings}%
-\nomenclature[Ax]{$\withError{S}$}{$S \disj \{\errorX\}$ ($S$ a set)}%
-\nomenclature[Azz]{$\disj$}{Disjoint union of sets}%
-\nomenclature[Azz]{$X^*$}{The set of all finite sequences of elements of a set $X$}
 
 \subsection{Lists}
 \label{sec:notation-lists}
 \begin{itemize}
-\item  The symbol $[]$ denotes an empty list.
+\item  The symbol $[]$ denotes an empty list.%
+  \nomenclature[B1]{$[]$}{The empty list}%
+
 \item The notation $[x_m, \ldots, x_n]$ denotes a list containing the elements
-  $x_m, \ldots, x_n$.  If $m>n$ then the list is empty.
-\item The length of a list $L$ is denoted by $\length(L)$.
-\item Given two lists $L = [x_1,\ldots, x_m]$ and $L' = [y_1,\ldots, y_n]$, $L\cdot L'$ 
-denotes their concatenation  $[x_1,\ldots, x_m,$ $y_1, \ldots, y_n]$.  % Broken in the middle to keep it out of the margin.
+  $x_m, \ldots, x_n$.  If $m>n$ then the list is empty.%
+
+\item The length of a list $L$ is denoted by $\length(L)$.%
+\nomenclature[B6]{$\length(\cdot)$}{Length of a list or bytestring}
+
+\item Given two lists $L = [x_1,\ldots, x_m]$ and $L' = [y_1,\ldots, y_n]$, $L\cdot L'$
+  denotes their concatenation  $[x_1,\ldots, x_m,$ $y_1, \ldots, y_n]$.  % Broken in the middle to keep it out of the margin.%
+  \nomenclature[B2]{$L \cdot L'$}{Concatenation of lists $L$ and $L'$}%
+
 \item Given an object $x$ and a list $L = [x_1,\ldots, x_n]$,
-we denote the list $[x,x_1,\ldots, x_n]$ by $x \cons L$.
+  we denote the list $[x,x_1,\ldots, x_n]$ by $x \cons L$.%
+  \nomenclature[B3]{$x \cdot L$}{$[x] \cdot L$}%
+
 \item Given a list $L = [x_1, \ldots, x_n]$ and an object $x$,
-we denote the list $[x_1, \ldots, x_n, x]$ by $L \snoc x$.
+  we denote the list $[x_1, \ldots, x_n, x]$ by $L \snoc x$.%
+  \nomenclature[B4]{$L \cdot x$}{$L \cdot [x]$}%
+
 %%\item We say that the list $L'$ is a \textit{proper prefix} of the list
 %%  $L = [x_1, \ldots, x_n]$, and
 %%  write $L' \prec L$,  if $L' = [x_1, \ldots, x_m]$ for some $m<n$.
 \item In the special case of bitstrings (ie, lists of elements of $\{0,1\}$) we
   sometimes use notation such as \texttt{101110} to denote the list
   $[1,0,1,1,1,0]$; we use a teletype font to avoid confusion with decimal
-  numbers.  
+  numbers.
+
 \item Given a syntactic category $V$, the symbol $\repetition{V}$ denotes a
-  possibly empty list $[V_1,\ldots, V_n]$ of elements $V_i \in V$.
-\end{itemize}%
-\nomenclature[B1]{$[]$}{The empty list}%
-\nomenclature[B2]{$L \cdot L'$}{Concatenation of lists $L$ and $L'$}%
-\nomenclature[B3]{$x \cdot L$}{$[x] \cdot L$}%
-\nomenclature[B4]{$L \cdot x$}{$L \cdot [x]$}%
-\nomenclature[B5]{$\repetition{V}$}{A sequence $[V_1,\ldots, V_n]$}%
-\nomenclature[B6]{$\length(\cdot)$}{Length of a list or bytestring}
+  possibly empty list $[V_1,\ldots, V_n]$ of elements $V_i \in V$.%
+  \nomenclature[B5]{$\repetition{V}$}{A sequence $[V_1,\ldots, V_n]$}%
+\end{itemize}