I realized that some terminology isn't as consistent as it could be in the spec. Let us first acknowledge that the following two concepts are central to the spec and distinct from each other:
The (type of the) data Substrait plans work on. Possible terms: dataset, table, relation, ...
The computations Substrait plans do. Possible terms: transformations, (relational) operators, (relational) operations, relations, ...
For example, the doc on "relation basics" says (annotation is mine):
Substrait is designed to allow a user to describe arbitrarily complex data transformations. These transformations are composed of one or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them into zero or more output transformations [should be: datasets].
I think several points are less than perfect:
The spec uses too many terms for the computations: "transformations," "data transformations," "relational operations," and "transformation operations" in these few lines alone. Beyond that, "relations" is the core concept of the spec, yet the corresponding heading in the "basics" section is "relational operators," and the headings of the individual relations mix "operator" and "operation" (e.g., "Aggregate Operation" but "Reference Operator").
The spec uses "relation" for the computation -- whereas in all other places that I know, the word "relation" refers to the data.
I suggest we clean up the spec to make things easier to understand. What are people's preferred terms for the two concepts?
I feel pretty strongly about not using "relation" for the computation, but I expect a lot of headwind against changing that term at this point, and I have mixed feelings myself: the term has made it into the protobuf definition, from which we can't remove it, and using it there but not in the prose would itself be a source of potential confusion.