Skip to content

Commit

Permalink
Specify concrete syntax for built-in lists, pairs, and data (PLT-378) (
Browse files Browse the repository at this point in the history
…#5692)

* Specification of concrete syntax for list, pair, and data

* Tidying up

* Update syntax in examples

* More accurate page numbers in symbol index

* Fix formatting

* Add note about discrepancy between syntax of data in specification and implementation
  • Loading branch information
kwxm authored Jan 3, 2024
1 parent f734591 commit c776fe7
Show file tree
Hide file tree
Showing 9 changed files with 202 additions and 103 deletions.
3 changes: 2 additions & 1 deletion doc/plutus-core-spec/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,10 @@ ${DOC}.pdf: ${SRC} ${BIB}
${BIBTEX} ${DOC}
${LATEX} ${DOC} # to make sure the (cross)references are correct
${MAKEINDEX} ${DOC}.nlo -s nomencl.ist -o ${DOC}.nls
# ^ Putting this later seems to get page references out of sync.
${LATEX} ${DOC}
${MAKEINDEX} ${DOC}.nlo -s nomencl.ist -o ${DOC}.nls
${LATEX} ${DOC}
${MAKEINDEX} ${DOC}.nlo -s nomencl.ist -o ${DOC}.nls
${LATEX} ${DOC}

figs:
Expand Down
85 changes: 58 additions & 27 deletions doc/plutus-core-spec/cardano/builtins1.tex
Original file line number Diff line number Diff line change
Expand Up @@ -57,31 +57,25 @@ \subsubsection{Built-in types and type operators}
Unicode replacement character \texttt{U+FFFD}.


\paragraph{Concrete syntax for higher-order types.} Types such as $\listOf{\ty{integer}}$
and $\pairOf{\ty{bool}}{\ty{string)}}$ are represented by application at the
type level, thus: \texttt{[(con list) (con integer)]} and\texttt{[(con pair)
(con bool) (con string)]}. Each higher-order type will need further syntax
for representing constants of those types. For example, we might use
\texttt{[]} for list values and \texttt{(,)} for pairs, so the list $[11,22,33]$
might be written as
\begin{verbatim}
(con [(con list) (con integer)]
[(con integer 11), (con integer 22), (con integer 33)]
)
\end{verbatim}
and the pair (True, "Plutus") as
\paragraph{Concrete syntax for lists and pairs.}
A list of type $\texttt{list}(t)$ is written as a syntactic list
\texttt{[$c_1, \ldots, c_n$]} where each $c_i$ lies in $\bitc_t$; a pair of type
$\texttt{pair}(t_1,t_2)$ is written as a syntactic pair $\texttt{(}c_1,c_2\texttt{)}$
with $c_1 \in \bitc_{t_1}$ and $c_2 \in \bitc_{t_2}$. Some valid constant expressions
are thus

\begin{verbatim}
(con [(con pair) (con bool) (con string)]
((con bool True), (con string "Plutus"))
).
(con (list integer) [11, 22, 33])
(con (pair bool string) (True, "Plutus")).
(con (list (pair bool (list bytestring)))
[(True, []), (False, [#,#1F]), (True, [#123456, #AB, #ef2804])])
\end{verbatim}
Note however that this syntax is not currently supported by most Plutus Core tools at the time of writing.



\paragraph{The $\ty{data}$ type.}
We provide a built-in type $\ty{data}$ which permits the encoding of simple data structures
for use as arguments to Plutus Core scripts. This type is defined in Haskell as
We provide a built-in type $\ty{data}$ which permits the encoding of simple data
structures for use as arguments to Plutus Core scripts. This type is defined in
Haskell as
\begin{alltt}
data Data =
Constr Integer [Data]
Expand Down Expand Up @@ -144,16 +138,53 @@ \subsubsection{Built-in types and type operators}
the former is part of a built-in type whereas the latter is part of the language
itself.

\newcommand{\syn}[1]{c_{\mathtt{{#1}}}}

\paragraph{Concrete syntax for $\ty{data}$.}
The concrete syntax for $\ty{data}$ is given by

\begin{minipage}{0.6\linewidth}
\centering
\[\begin{array}{rcl}
\syn{data} & ::= & \texttt{(Constr} \ \syn{integer} \ \syn{list(data)} \texttt{)}\\
& & \texttt{(Map} \ \syn{list(pair(data,data))} \texttt{)}\\
& & \texttt{(List} \ \syn{list(data)} \texttt{)}\\
& & \texttt{(I} \ \syn{integer} \texttt{)}\\
& & \texttt{(B} \ \syn{bytestring} \texttt{)}.
\end{array}\]
\label{fig:data-concrete-syntax}
\end{minipage}

\noindent We interpret these syntactic constants as elements of $\denote{\ty{data}}$ using
the various `$\inj$' functions defined earlier. Some valid \texttt{data}
constants are

\begin{verbatim}
(con data (Constr 1 [(I 2), (B #), (Map [])])
(con data (Map [((I 0), (B #00)), ((I 1), (B #0F))]))
(con data (List [(I 0), (I 1), (B #7FFF), (List []])))
(con data (I -22))
(con data (B #001A)).
\end{verbatim}
% TODO: be more relaxed about parenthesisation in general

\paragraph{Note.} At the time of writing the syntax accepted by IOG's parser for textual Plutus Core
differs slightly from that above in that subobjects
of \texttt{Constr}, \texttt{Map} and \texttt{List} objects must \textit{not} be
parenthesised: for example one must write \verb|(con data (Constr 1 [I 2, B #,Map []])|.
This discrepancy will be resolved in the near future.


\subsubsection{Built-in functions}
\label{sec:built-in-functions-1}
The first batch of built-in functions is shown in
Table~\ref{table:built-in-functions-1}. The table indicates which
functions can fail during execution, and conditions causing failure are
specified either in the denotation given in the table or in a relevant note.
Recall also that a built-in function will fail if it is given an argument of the
wrong type: this is checked in conditions involving the $\sim$ relation and the
$\Eval$ function in Figures~\ref{fig:untyped-term-reduction}
and~\ref{fig:untyped-cek-machine}. Note also that some of the functions are
Table~\ref{table:built-in-functions-1}. The table indicates which functions can
fail during execution, and conditions causing failure are specified either in
the denotation given in the table or in a relevant note. Recall also that a
built-in function will fail if it is given an argument of the wrong type: this
is checked in conditions involving the $\sim$ relation and the $\Eval$ function
in Figures~\ref{fig:untyped-term-reduction} and~\ref{fig:untyped-cek-machine}.
Note also that some of the functions are
\#-polymorphic. According to Section~\ref{sec:builtin-denotations} we
require a denotation for every possible monomorphisation of these; however all
of these functions are parametrically polymorphic so to simplify notation we
Expand Down
58 changes: 52 additions & 6 deletions doc/plutus-core-spec/cardano/cardano.tex
Original file line number Diff line number Diff line change
Expand Up @@ -5,17 +5,22 @@ \section{Built-in types and functions}
\paragraph{Built-in batches.}
\label{sec:builtin-batches}

The built-in types and functions are defined in batches corresponding to how they were added to ledger languages.
These batches are given in the following sections.
The built-in types and functions are defined in batches corresponding to how
they were added to ledger languages. These batches are given in the following
sections.

\paragraph{Built-in semantics variants.}
\label{sec:builtin-semantics-variants}

In rare cases we can make a mistake or need to change the actual behaviour of a built-in function.
To handle this we define a series of built-in semantics variants, which indicate which behaviour should be used.
A fix will typically be deployed by defining a new semantics variant, and then using that variant for future ledger languages (but not existing ones, since this is usually a backwards-incompatible change).
In rare cases we can make a mistake or need to change the actual behaviour of a
built-in function. To handle this we define a series of built-in semantics
variants, which indicate which behaviour should be used. A fix will typically
be deployed by defining a new semantics variant, and then using that variant for
future ledger languages (but not existing ones, since this is usually a
backwards-incompatible change).

Changes are listed alongside the original definition of the built-in function in its original batch, and are indexed in the following table.
Changes are listed alongside the original definition of the built-in function in
its original batch, and are indexed in the following table.

\begin{table}[H]
\centering
Expand All @@ -31,7 +36,48 @@ \section{Built-in types and functions}
\label{table:bs-variants}
\end{table}


\paragraph{Concrete syntax for built-in types.}
Recall that in the abstract notation for built-in types introduced in
Section~\ref{sec:built-in-types}, a built-in type is either an \textit{atomic
type} such as \texttt{integer} or \texttt{string} or an application $\op(T_1,
\ldots, T_n)$ of a \textit{type operator} to a sequence of built-in types. The
concrete syntax of built-in types used in textual Plutus Core programs is
slightly different in that we use a curried form of application for type
operators: a type is given by

\newcommand{\bitn}{\mathbf{T}} % built-in type name
\newcommand{\bitc}{\mathbf{C}} % built-in constant syntax

\begin{minipage}{\linewidth}
\centering
\[\begin{array}{rclr}
\bitn & ::= & \textit{atomic-type} & \textrm{Atomic type}\\
& & \texttt{(}\op \ \bitn_1 \ldots \bitn_{\valency{\op}}) & \textrm{Type application}\\
\end{array}\]
\label{fig:built-in-type-concrete-syntax}
\end{minipage}
\noindent Note that we again require that all type operators are fully applied.
We refer to the syntactic objects $\bitn$ above as \textit{concrete built-in
types}. There is an obvious bijection between these and the abstract built-in
types used elsewhere in this document, and given an abstract built-in type $T$
we will denote the corresponding concrete built-in type by $\bar{T}$.

\paragraph{Concrete syntax for built-in constants.}
We provide concrete syntax for constants of most (but not all) built-in types.
For a built-in type $T$ which has a concrete syntax we specify a set $\bitc_T$
of strings (using either regular expressions or a BNF-style grammar), and a
constant of type $T$ is then represented in the concrete syntax by an expression
of the form \texttt{(con $\bar{T}$ $c_T$)} with $c_T \in \bitc_T$. Each string
$c_T$ will have an interpretation as a value of type $T$ (ie, an element of
$\denote{T}$) and since this will generally be the obvious interpretation we
will not always spell out the details.%%
\nomenclature[H]{$\bitc_T$}{Set of strings used for the concrete syntax of constants of built-in type $T$}


\input{cardano/builtins1.tex}
\input{cardano/builtins2.tex}
\input{cardano/builtins3.tex}
\input{cardano/builtins4.tex}


4 changes: 2 additions & 2 deletions doc/plutus-core-spec/data-cbor.tex
Original file line number Diff line number Diff line change
Expand Up @@ -61,12 +61,12 @@ \section{Notation}
$$
\e_X : X \rightarrow \B^*
$$%
\nomenclature[HC]{$\e_X$}{CBOR encoder for \texttt{data}}
\nomenclature[IC]{$\e_X$}{CBOR encoder for \texttt{data}}
and decoding functions (or \textit{decoders})
$$
\d_X : \B^* \rightharpoonup \B^* \times X
$$%
\nomenclature[HC]{$\d_X$}{CBOR decoder for \texttt{data}}
\nomenclature[IC]{$\d_X$}{CBOR decoder for \texttt{data}}

\noindent for various sets $X$, such as the set $\Z$ of integers and the set of
all \texttt{data} items. The encoding function $\e_X$ takes an element $x \in
Expand Down
4 changes: 2 additions & 2 deletions doc/plutus-core-spec/flat-serialisation.tex
Original file line number Diff line number Diff line change
Expand Up @@ -42,12 +42,12 @@ \section{Encoding and decoding}
$$
\E_X : \Bits \times X \rightarrow \Bits
$$%
\nomenclature[HF]{$\E_X$}{Flat encoder}
\nomenclature[IF]{$\E_X$}{Flat encoder}
and (partial) decoding functions (or \textit{decoders})
$$
\D_X : \Bits \rightharpoonup \Bits \times X
$$%
\nomenclature[HF]{$\D_X$}{Flat decoder}
\nomenclature[IF]{$\D_X$}{Flat decoder}

\noindent for various sets $X$, such as the set $\Z$ of integers and the set of
all Plutus Core terms. The encoding function $\E_X$ takes a sequence $s \in
Expand Down
22 changes: 12 additions & 10 deletions doc/plutus-core-spec/header.tex
Original file line number Diff line number Diff line change
@@ -1,12 +1,3 @@
\renewcommand{\thefootnote}{\fnsymbol{footnote}}

% correct bad hyphenation here
\hyphenation{byte-string}

\usepackage[hyphens]{url}
\usepackage[colorlinks=true,linkcolor=MidnightBlue,citecolor=ForestGreen,urlcolor=Plum,pageanchor]{hyperref}
% ^ pageanchor is for nomencl

%% Stuff for index of notation
%% FYI, the nomencl package seems to discard entries containing '@'
\usepackage[intoc,refpage]{nomencl}
Expand All @@ -21,7 +12,9 @@
\ifthenelse{\equal{#1}{E}}{\item[\textbf{Built-in functions}]}{%
\ifthenelse{\equal{#1}{F}}{\item[\textbf{Term reduction}]}{%
\ifthenelse{\equal{#1}{G}}{\item[\textbf{CEK machine}]}{%
\ifthenelse{\equal{#1}{H}}{\item[\textbf{Serialisation and deserialisation}]}{}
\ifthenelse{\equal{#1}{H}}{\item[\textbf{Concrete syntax}]}{%
\ifthenelse{\equal{#1}{I}}{\item[\textbf{Serialisation and deserialisation}]}{}
}
}
}
}
Expand All @@ -32,6 +25,15 @@
}
\makenomenclature

\renewcommand{\thefootnote}{\fnsymbol{footnote}}

% correct bad hyphenation here
\hyphenation{byte-string}

\usepackage[hyphens]{url}
\usepackage[colorlinks=true,linkcolor=MidnightBlue,citecolor=ForestGreen,urlcolor=Plum,pageanchor]{hyperref}
% ^ pageanchor is for nomencl

\usepackage{tocbibind} % To get the bibiliograpy in the table of contents
\usepackage{blindtext, graphicx}
%\usepackage[numbers]{natbib}
Expand Down
102 changes: 60 additions & 42 deletions doc/plutus-core-spec/notation.tex
Original file line number Diff line number Diff line change
Expand Up @@ -5,70 +5,88 @@ \subsection{Sets}
\label{sec:notation-sets}
\begin{itemize}
\item The symbol $\disj$ denotes a disjoint union of sets; for emphasis we often use this
to denote the union of sets which we know to be disjoint.
to denote the union of sets which we know to be disjoint.%
\nomenclature[Azz]{$\disj$}{Disjoint union of sets}%

\item Given a set $X$, $X^*$ denotes the set of finite sequences of elements of $X$:
$$
X^*= \bigdisj{\{X^n: n \in \mathbb{N}\}}.
$$
\item $\N = \{0,1,2,3,\ldots\}$.
\item $\Nplus = \{1,2,3,\ldots\}$.
\item $\Nab{a}{b} = \{n \in \N: a \leq n \leq b\}$.
\item $\B = \Nab{0}{255}$, the set of 8-bit bytes.
\item $\B^*$ is the set of all bytestrings.
\item $\Z = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$.
\item $\mathbb{F}_q$ denotes a finite field with $q$ elements ($q$ a prime power).
\item $\units{\mathbb{F}} = \mathbb{F}\setminus\{0\}$ denotes the multiplicative group of nonzero elements of a field $\mathbb{F}$.
\item $\U$ denotes the set of Unicode scalar values, as defined in~\cite[Definition D76]{Unicode-standard}.
\item $\U^*$ is the set of all Unicode strings.
$$%
\nomenclature[Azz]{$X^*$}{The set of all finite sequences of elements of a set $X$}
\item $\N = \{0,1,2,3,\ldots\}$.%
\nomenclature[An1]{$\N$}{$\{0,1,2,3,\ldots\}$}

\item $\Nplus = \{1,2,3,\ldots\}$.%
\nomenclature[An2]{$\Nplus$}{$\{1,2,3,\ldots\}$}

\item $\Nab{a}{b} = \{n \in \N: a \leq n \leq b\}$.%
\nomenclature[An3]{$\Nab{a}{b}$}{$\{n \in \N: a \leq n \leq b\}$}

\item $\B = \Nab{0}{255}$, the set of 8-bit bytes.%
\nomenclature[Ab1]{$\B$}{$\{n \in \Z: 0 \leq n \leq 255\}$}%

\item $\B^*$ is the set of all bytestrings.%
\nomenclature[Ab2]{$\B^*$}{The set of all bytestrings}

\item $\Z = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$.%
\nomenclature[Aw]{$\Z$}{$\{\ldots, -2, -1, 0, 1, 2, \ldots\}$}

\item $\mathbb{F}_q$ denotes a finite field with $q$ elements ($q$ a prime power).%
\nomenclature[Af]{$\mathbb{F}_q$}{The finite field with $q$ elements}

\item $\mathbb{F}_q^*$ denotes the multiplicative group of nonzero elements of $\mathbb{F}_q$.%
\nomenclature[Af]{$\mathbb{F}_q^*$}{The multiplicative group of $\mathbb{F}_q$}%

\item $\U$ denotes the set of Unicode scalar values, as defined in~\cite[Definition D76]{Unicode-standard}.%
\nomenclature[Au1]{$\U$}{The set of Unicode scalar values}%

\item $\U^*$ is the set of all Unicode strings.%
\nomenclature[Au2]{$\U^*$}{The set of Unicode strings}%

\item We assume that there is a special symbol $\errorX$ which does not appear
in any other set we mention. The symbol $\errorX$ is used to indicate that
some sort of error condition has occurred, and we will often need to consider
situations in which a value is either $\errorX$ or a member of some set $S$.
For brevity, if $S$ is a set then we define
$$
\withError{S} := S \disj \{\errorX\}.
$$
$$%
\nomenclature[Ax]{$\withError{S}$}{$S \disj \{\errorX\}$ ($S$ a set)}%
\end{itemize}%
\nomenclature[An1]{$\N$}{$\{0,1,2,3,\ldots\}$}%
\nomenclature[An2]{$\Nplus$}{$\{1,2,3,\ldots\}$}%
\nomenclature[An3]{$\Nab{a}{b}$}{$\{n \in \N: a \leq n \leq b\}$}%
\nomenclature[Aw]{$\Z$}{$\{\ldots, -2, -1, 0, 1, 2, \ldots\}$}%
\nomenclature[Af]{$\mathbb{F}_q$}{The finite field with $q$ elements}%
\nomenclature[Af]{$\units{\mathbb{F}}$}{The multiplicative group of a field $\mathbb{F}$, ie $\mathbb{F}\setminus\{0\}$}%
\nomenclature[Ab]{$\B$}{$\{n \in \Z: 0 \leq n \leq 255\}$}%
\nomenclature[Ab]{$\B^*$}{The set of all bytestrings}%
\nomenclature[Au]{$\U$}{The set of Unicode scalar values}%
\nomenclature[Au]{$\U^*$}{The set of Unicode strings}%
\nomenclature[Ax]{$\withError{S}$}{$S \disj \{\errorX\}$ ($S$ a set)}%
\nomenclature[Azz]{$\disj$}{Disjoint union of sets}%
\nomenclature[Azz]{$X^*$}{The set of all finite sequences of elements of a set $X$}

\subsection{Lists}
\label{sec:notation-lists}
\begin{itemize}
\item The symbol $[]$ denotes an empty list.
\item The symbol $[]$ denotes an empty list.%
\nomenclature[B1]{$[]$}{The empty list}%

\item The notation $[x_m, \ldots, x_n]$ denotes a list containing the elements
$x_m, \ldots, x_n$. If $m>n$ then the list is empty.
\item The length of a list $L$ is denoted by $\length(L)$.
\item Given two lists $L = [x_1,\ldots, x_m]$ and $L' = [y_1,\ldots, y_n]$, $L\cdot L'$
denotes their concatenation $[x_1,\ldots, x_m,$ $y_1, \ldots, y_n]$. % Broken in the middle to keep it out of the margin.
$x_m, \ldots, x_n$. If $m>n$ then the list is empty.%

\item The length of a list $L$ is denoted by $\length(L)$.%
\nomenclature[B6]{$\length(\cdot)$}{Length of a list or bytestring}

\item Given two lists $L = [x_1,\ldots, x_m]$ and $L' = [y_1,\ldots, y_n]$, $L\cdot L'$
denotes their concatenation $[x_1,\ldots, x_m,$ $y_1, \ldots, y_n]$. % Broken in the middle to keep it out of the margin.%
\nomenclature[B2]{$L \cdot L'$}{Concatenation of lists $L$ and $L'$}%

\item Given an object $x$ and a list $L = [x_1,\ldots, x_n]$,
we denote the list $[x,x_1,\ldots, x_n]$ by $x \cons L$.
we denote the list $[x,x_1,\ldots, x_n]$ by $x \cons L$.%
\nomenclature[B3]{$x \cdot L$}{$[x] \cdot L$}%

\item Given a list $L = [x_1, \ldots, x_n]$ and an object $x$,
we denote the list $[x_1, \ldots, x_n, x]$ by $L \snoc x$.
we denote the list $[x_1, \ldots, x_n, x]$ by $L \snoc x$.%
\nomenclature[B4]{$L \cdot x$}{$L \cdot [x]$}%

%%\item We say that the list $L'$ is a \textit{proper prefix} of the list
%% $L = [x_1, \ldots, x_n]$, and
%% write $L' \prec L$, if $L' = [x_1, \ldots, x_m]$ for some $m<n$.
\item In the special case of bitstrings (ie, lists of elements of $\{0,1\}$) we
sometimes use notation such as \texttt{101110} to denote the list
$[1,0,1,1,1,0]$; we use a teletype font to avoid confusion with decimal
numbers.
numbers.

\item Given a syntactic category $V$, the symbol $\repetition{V}$ denotes a
possibly empty list $[V_1,\ldots, V_n]$ of elements $V_i \in V$.
\end{itemize}%
\nomenclature[B1]{$[]$}{The empty list}%
\nomenclature[B2]{$L \cdot L'$}{Concatenation of lists $L$ and $L'$}%
\nomenclature[B3]{$x \cdot L$}{$[x] \cdot L$}%
\nomenclature[B4]{$L \cdot x$}{$L \cdot [x]$}%
\nomenclature[B5]{$\repetition{V}$}{A sequence $[V_1,\ldots, V_n]$}%
\nomenclature[B6]{$\length(\cdot)$}{Length of a list or bytestring}
possibly empty list $[V_1,\ldots, V_n]$ of elements $V_i \in V$.%
\nomenclature[B5]{$\repetition{V}$}{A sequence $[V_1,\ldots, V_n]$}%
\end{itemize}
Loading

1 comment on commit c776fe7

@github-actions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Plutus Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.05.

Benchmark suite Current: c776fe7 Previous: f734591 Ratio
validation-decode-auction_1-2 553.2 μs 515.8 μs 1.07

This comment was automatically generated by workflow using github-action-benchmark.

CC: @input-output-hk/plutus-core

Please sign in to comment.