Moving from a list of nodes to a tree #2403

matthew-carroll (Maintainer) · 2024-11-09T19:26:56Z

We have some features that we want to implement, which imply a document hierarchy where currently there is none.

For example, we want to support tables, where each cell contains any number of other nodes. We also want to make it possible for clients to create the concept of "banners", which are decorated boxes that contain any number of other nodes.

These features impact painting, layout, and UX. E.g.,

Layout: A table has its own layout, which displays rows and columns.
Paint: A table paints borders and (maybe) headers and drag handles.
UX: Pressing the down arrow at the bottom of a cell moves the caret to the cell immediately below it.

These requirements seem to point towards the need for a tree-based document, instead of a list-based document. This requirement isn't a surprise. Most document formats are structured as trees, e.g., XML, HTML, Markdown. However, we wanted to take our simplistic document list as far as we could before investing in the complexity of a tree-based document. Along those lines, if there's a reasonable path to implement these requirements without moving away from a document list, I'd like to hear about it.

API Problems

By leaning into a document list instead of a document tree, we've created a number of heavily used APIs that don't necessarily make sense in a tree.

Document APIs:

node iterator API
firstOrNull, lastOrNull
getNodeAt(index)
getNodeIndex(node)
getNodeIndexById(nodeId)
getNodeBefore(node)
getNodeAfter(node)

Edit requests:

InsertNodeAtIndexRequest
InsertNodeBeforeNodeRequest
InsertNodeAfterNodeRequest

In other words, all APIs that imply a natural order to nodes are now ambiguous, at best.

We can, in theory, continue to support these APIs by selecting a single tree traversal policy. For example, we can say that we'll always traverse the tree in depth-first order. This would retain ordering, and would allow us to speak about each node as sitting at a specific index. However, it's unclear whether any of the existing uses of the aforementioned APIs would work as expected with tree traversal indices. For example, the composite nodes themselves would be traversed, and would receive an index. Should composite nodes interact with UX policies the same way as a paragraph? Does it make any sense for the caret to be placed within a composite node? Probably not.
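To make the indexing concern concrete, here is a minimal Python sketch of depth-first indexing over a small tree. It is not Super Editor code: the `Node` class, the `kind` strings, and `index_nodes` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical document node; `children` is empty for leaf nodes."""
    id: str
    kind: str  # e.g. "paragraph", "table", "cell"
    children: list["Node"] = field(default_factory=list)


def depth_first(node: Node):
    """Yield a node, then all of its descendants, in pre-order."""
    yield node
    for child in node.children:
        yield from depth_first(child)


def index_nodes(roots: list[Node]) -> dict[str, int]:
    """Map each node ID to the position at which a depth-first walk visits it."""
    order = [n for root in roots for n in depth_first(root)]
    return {n.id: i for i, n in enumerate(order)}


doc = [
    Node("p1", "paragraph"),
    Node("t1", "table", [
        Node("c1", "cell", [Node("p2", "paragraph")]),
        Node("c2", "cell", [Node("p3", "paragraph")]),
    ]),
    Node("p4", "paragraph"),
]

indices = index_nodes(doc)
print(indices)
```

In this sketch the table and its cells occupy indices 1, 2, and 4, so a "node after p1" lookup by index would land on the table itself rather than on the next paragraph, which is the ambiguity described above.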

Comparisons

HTML

HTML is probably the most popular document format due to its use in every webpage. HTML is based on the DOM, which is comprised of a tree of Elements:

Nodes may be strictly organizational, providing a means for grouping other nodes together or for providing a point at which a hierarchy can be constructed; other nodes may represent visible components of a document. Each node is based on the Node interface, which provides properties for getting information about the node as well as methods for creating, deleting, and organizing nodes within the DOM.

Nodes don't have any concept of including the content that is actually displayed in the document. They're empty vessels. The fundamental notion of a node that can represent visual content is introduced by the Element interface. An Element object instance represents a single element in a document created using either HTML or an XML vocabulary such as SVG.

https://developer.mozilla.org/en-US/docs/Web/API/HTML_DOM_API

HTML Elements are notoriously annoying to work with. The use of this tree structure eventually led to the advent of jQuery - a collection of functions for more easily querying and mutating the DOM.

jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax much simpler with an easy-to-use API that works across a multitude of browsers. With a combination of versatility and extensibility, jQuery has changed the way that millions of people write JavaScript.

https://jquery.com/

Markdown

Markdown is a very simple markup language, which is used in many text editors today. This discussion post is written in Markdown.

While the syntax is minimal, the structure remains hierarchical. I don't know the exact reason that Markdown went hierarchical, but one important detail about Markdown is that it officially supports embedding HTML within it. Therefore, Markdown was probably constrained by HTML's existing decision.

Here's a random specification and implementation of a Markdown Abstract Syntax Tree (AST): https://github.com/syntax-tree/mdast

DocX

DocX is Microsoft's document format for Word. Its serialization format is XML, so it's a hierarchy that's similar in nature to HTML.

LaTeX

LaTeX is a document format that's used heavily in academia. It also uses a hierarchical structure.

Example:

\begin{figure}[h]
  \centering
  \includegraphics[width=0.5\textwidth]{image.png}
  \caption{Sample image}
\end{figure}

\begin{table}[h]
  \centering
  \begin{tabular}{|c|c|}
    \hline
    Column 1 & Column 2 \\
    \hline
    Data 1 & Data 2 \\
    \hline
  \end{tabular}
  \caption{Sample table}
\end{table}

Non-Tree Document Formats

In a quick search for popular document formats that aren't tree-based, the only formats I found are Comma-Separated Values (CSV) and Rich Text Format (RTF). Both of these formats have a far narrower application than what Super Editor needs to support.

Should we move to a tree structure?

The first question we need to align on is whether a tree-based document is the right move. It will be a significant move, and will almost certainly require significant rework among our users. If anyone has good alternative suggestions, I'd like to hear them.

Should we separate logical nodes from visual nodes?

Currently, every node in a Document has a visual purpose: paragraph, image, horizontal rule, list item, task.

In a world with hierarchy, we need to decide whether we should use a special type of node for hierarchy (CompositeNode), or whether we start from the premise that any node might have children.
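The two options can be sketched side by side. This is an illustrative Python sketch only: apart from the `CompositeNode` name mentioned above, every class, field, and function here is a hypothetical stand-in, not an existing Super Editor type.

```python
from dataclasses import dataclass, field


# Option A: hierarchy lives in a dedicated node type; content nodes stay leaves.
@dataclass
class ParagraphNode:
    id: str
    text: str


@dataclass
class CompositeNode:
    id: str
    kind: str  # e.g. "banner", "table"
    children: list = field(default_factory=list)  # content or composite nodes


# Option B: one node type; any node might have children.
@dataclass
class TreeNode:
    id: str
    kind: str
    text: str = ""
    children: list["TreeNode"] = field(default_factory=list)


def content_ids_a(node) -> list[str]:
    """Option A: a type check separates structure from content."""
    if isinstance(node, CompositeNode):
        return [i for child in node.children for i in content_ids_a(child)]
    return [node.id]


def content_ids_b(node: TreeNode, structural_kinds: set[str]) -> list[str]:
    """Option B: the caller must say which kinds are purely structural."""
    ids = [] if node.kind in structural_kinds else [node.id]
    for child in node.children:
        ids += content_ids_b(child, structural_kinds)
    return ids


banner_a = CompositeNode("b1", "banner", [ParagraphNode("p1", "Heads up")])
banner_b = TreeNode("b1", "banner", children=[TreeNode("p1", "paragraph", "Heads up")])

print(content_ids_a(banner_a))
print(content_ids_b(banner_b, {"banner"}))
```

Under these assumptions, Option A lets the type system answer "is this node purely organizational?", while Option B pushes that answer into a per-kind policy supplied by the caller.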

matthew-carroll (Maintainer, Author) · 2024-11-09T19:31:51Z

CC @miguelcmedeiros @brian-superlist @knopp @Jethro87 @jmatth @aloisdeniel @BazinC @jezell - I'd appreciate your thoughts on this possible shift in Document structure. I'd like to collect all available insights before pulling the trigger on anything.

0 replies

JostSchenck · 2024-11-11T15:29:57Z

I hope you don't mind me chiming in, but this is something I am very interested in. I've been working on an outline editing package based on super_editor over the last few weeks, and this would make things much easier. With my work based on a recent super_editor from the main branch, I decided to go for an OutlineTreeDocument that implements the MutableDocument interface, working with a tree structure of nodes (each OutlineTreenode holding an arbitrary number of DocumentNodes) that can be addressed either by ID or by a path, just like you do in that current WIP. Every tree node, as well as the whole OutlineTreeDocument, is an Iterable over the DocumentNodes it consists of. Obviously, this approach meant I had to find sensible ways of implementing those MutableDocument methods which just don't translate well to a tree structure (like inserting a DocumentNode at some index, when that position happens to be just between two tree nodes). While things do work out for my needs right now, this library would obviously greatly benefit from a tree structure, and I would gladly rework it completely when tree structures are a thing in super_editor.

To illustrate this, here is a short recording of the editor. My notion of a Treenode is a data structure that contains a title DocumentNode and zero or more ParagraphNodes. As the DocumentNodes are linearly exposed via the MutableDocument interface, the core super_editor library does not need any knowledge about my tree, but commands, reactions, and keyboard actions do, by casting the editor's document to an OutlineTreeDocument. While the caret can be moved seamlessly between the nodes, selections are modified by a reaction to ensure the user does not violate the tree structure. I also use reactions to make the caret jump over hidden areas when parts of the document are folded, because with super_editor as it is I did not find a way to implement this earlier in the pipeline without having to rewrite larger parts.

outline_editor.mp4
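The general shape described here, a tree whose nodes are also exposed as a linear sequence, can be sketched in a few lines of Python. This is not the commenter's implementation: `DocNode`, `OutlineNode`, and `at_path` are hypothetical names, and the real package targets Dart and the MutableDocument interface.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class DocNode:
    """Stand-in for a DocumentNode (a title or a paragraph)."""
    id: str
    text: str


@dataclass
class OutlineNode:
    """Hypothetical tree node: one title, some paragraphs, nested children."""
    title: DocNode
    paragraphs: list[DocNode] = field(default_factory=list)
    children: list["OutlineNode"] = field(default_factory=list)

    def __iter__(self) -> Iterator[DocNode]:
        # Linear view: own title and paragraphs first, then each subtree.
        yield self.title
        yield from self.paragraphs
        for child in self.children:
            yield from child

    def at_path(self, path: list[int]) -> "OutlineNode":
        """Resolve a path of child indices, e.g. [1, 0], to a tree node."""
        node = self
        for i in path:
            node = node.children[i]
        return node


root = OutlineNode(
    DocNode("t0", "Root"),
    [DocNode("p0", "intro")],
    [
        OutlineNode(DocNode("t1", "A")),
        OutlineNode(DocNode("t2", "B"), [DocNode("p2", "b text")]),
    ],
)

flat_ids = [n.id for n in root]
print(flat_ids)
print(root.at_path([1]).title.id)
```

The flat sequence is what an editor that only understands a list would see; the awkward case mentioned above is an insertion at a flat index that falls between `t1` and `t2`, where the owning tree node is not implied by the index alone.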

While the distinction between a "tree node", which only defines logical document structure, and a DocumentNode for the editing experience led me to a somewhat confusing nomenclature and may not fit every use case, it worked out for me, as outline editing is very much about structure. My main project is not far enough along that I could really judge whether my approach holds in the long run, but it works well enough for now.

Right now, I stopped working on the outline library, as I suspect with the advent of immutable nodes and your work on a tree structure, a bigger refactoring or rewrite will happen anyway, and because for me it does enough for now to continue on other aspects of my project.

1 reply

matthew-carroll Nov 11, 2024
Maintainer Author

Thanks for the writeup @JostSchenck - I appreciate the effort you've put in there. Hopefully we come up with something that makes things easier on your end.

aloisdeniel · 2024-11-15T20:11:59Z

I can see how it is definitely more complex, but in my opinion it is also required if super_editor wants to be the final boss of editors.

Implementing tables or column layouts with the current implementation is tedious, and a tree representation would make their implementation natural.

A nice implementation that can be used as reference is ProseMirror. The documentation is a great source of inspiration!

Also, a great library that should be considered is Yjs, and more specifically its Rust+FFI implementation: y-crdt. It allows real-time updates from multiple sources and smartly resolves conflicts. It solves a lot of issues, and has a tree representation.

1 reply

matthew-carroll Nov 16, 2024
Maintainer Author

ProseMirror looks interesting. There are definitely some parallels between their chosen approach and some decisions we've already made.

ProseMirror chose to represent inline nodes as a list of nodes (instead of the DOM's approach to use a tree) to better support text formatting. We have AttributedText, which is just one blob of text, but our use of format metadata within AttributedText serves a similar purpose. We still need to solve inline widgets for AttributedText. The ProseMirror concept of a "Mark" is roughly analogous to our concept of a text "attribution". The ProseMirror concept of a node "attribute" is equivalent to our use of the term node "metadata".

ProseMirror makes their nodes and document immutable. This is something we already have in progress: #2384

WRT queries, it looks like ProseMirror has embraced both tree-based queries, and a Quill-style counting mechanism:

ProseMirror nodes support two types of indexing—they can be treated as trees, using offsets into individual nodes, or they can be treated as a flat sequence of tokens.

The first allows you to do things similar to what you'd do with the DOM—interacting with single nodes, directly accessing child nodes using the child method and childCount, writing recursive functions that scan through a document (if you just want to look at all nodes, use descendants or nodesBetween).

The second is more useful when addressing a specific position in the document. It allows any document position to be represented as an integer—the index in the token sequence. These tokens don't actually exist as objects in memory—they are just a counting convention—but the document's tree shape, along with the fact that each node knows its size, is used to make by-position access cheap.

The start of the document, right before the first content, is position 0.

Entering or leaving a node that is not a leaf node (i.e. supports content) counts as one token. So if the document starts with a paragraph, the start of that paragraph counts as position 1.

Each character in text nodes counts as one token. So if the paragraph at the start of the document contains the word “hi”, position 2 is after the “h”, position 3 after the “i”, and position 4 after the whole paragraph.

Leaf nodes that do not allow content (such as images) also count as a single token.

An interesting callout on their index system:

Each node has a nodeSize property that gives you the size of the entire node, and you can access .content.size to get the size of the node's content. Note that for the outer document node, the open and close tokens are not considered part of the document (because you can't put your cursor outside of the document), so the size of a document is doc.content.size, not doc.nodeSize.
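The counting convention quoted above can be reproduced with a small Python sketch. It is not the ProseMirror API: `PMNode`, `node_size`, and `content_size` are hypothetical names that only mirror the quoted rules (one token per character, one token per content-less leaf, and one token each for entering and leaving a non-leaf node).

```python
from dataclasses import dataclass, field


@dataclass
class PMNode:
    """Hypothetical stand-in for a ProseMirror-style node (not the real API)."""
    kind: str
    text: str = ""            # only used when kind == "text"
    children: list["PMNode"] = field(default_factory=list)
    is_leaf: bool = False     # e.g. an image: no content allowed


def content_size(node: PMNode) -> int:
    """Size of everything inside the node, excluding its own open/close tokens."""
    return sum(node_size(c) for c in node.children)


def node_size(node: PMNode) -> int:
    if node.kind == "text":
        return len(node.text)        # one token per character
    if node.is_leaf:
        return 1                     # leaf without content: a single token
    return 2 + content_size(node)    # open token + content + close token


doc = PMNode("doc", children=[
    PMNode("paragraph", children=[PMNode("text", text="hi")]),
    PMNode("image", is_leaf=True),
])

print(node_size(doc.children[0]))
print(content_size(doc))
```

With the paragraph "hi" at the start of the document, the paragraph's size is 4, which matches the quoted statement that position 4 is just after the whole paragraph. The document's usable size is its content size, because the outer open and close tokens are not addressable positions.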

This seems to suggest that the logical node indexing and counting system is actually based on the visual question of where the caret might appear or move. That's an interesting decision because I've seen different caret position policies for the same document situation. Think about something like inline code. I've seen apps where you can position the caret at the end of the inline code, and then press right arrow, and move the caret to just beyond the inline code. I've also seen apps where you can't do that. The latter apps should probably be considered buggy, but the point remains that different apps might want different rules about caret placement, so it's a bit odd that such a policy would be enshrined in the core document data model.

ProseMirror uses a concept of "transformations" to alter a document, with essentially the exact same motivation that led us to develop our editor pipeline (requests, commands, events, reactions): https://prosemirror.net/docs/guide/#transform

The ProseMirror description of editor state nearly identically reflects the pieces we assembled early on: Editor, Document, Composer (owns selection and activated attributes):

volser · 2024-11-18T14:33:37Z

Should it be not a tree, but something like custom embeds in Quill Delta?
It was added to the delta library:
slab/delta@ae5f4a4

A new table module was implemented using this feature:
https://github.com/slab/quill/blob/main/packages/quill/src/modules/tableEmbed.ts

It allows embedding any custom type, but the most trivial is an embedded delta.
It could be used for lists or multiline text blocks.

4 replies

volser Nov 18, 2024

Yjs added support as well:
https://github.com/yjs/y-quill?tab=readme-ov-file#custom-embeds

matthew-carroll Nov 18, 2024
Maintainer Author

That table construct is effectively a tree - the table is a node that points to rows and columns, which point to cells.

Also, Delta is a serialization format, not a runtime data structure. Quill Deltas text is parsed into one or more data structures that represent the content while the app is running. Then the app content is re-serialized to text whenever somebody wants to export the updated deltas.

If I'm wrong about any of that, please let me know.

volser Nov 19, 2024

> Also, Delta is a serialization format, not a runtime data structure

It's the document format of the Quill editor.

> Quill Deltas text is parsed into one or more data structures that represent the content while the app is running

I did not get it. I don't think that's the case for the Quill editor. Delta is the document format for the Quill editor; "parchment" (blots) is the presentation (DOM elements) of this format.

jezell Nov 19, 2024

YJS itself also directly stores quill style deltas in Y.Text, so a lot of the editors just work straight with the deltas.

However, Y.ProseMirror uses Y.XmlFragment, which might be a little easier of an approach. They also started out with a whole editor with its own set of concepts before supporting CRDT via plugins to get collaborative editing working out of the box, and XML fragment was the easiest to map their structures into without reworking everything:

https://github.com/yjs/y-prosemirror

I'm a huge fan of ProseMirror. I think anything that follows in its footsteps is probably going down a good path.

jmatth · 2024-11-18T19:44:49Z

This question touches on a lot of different subjects that I'm struggling to tie together in a coherent way, so I'll just summarize my opinions up front:

I think moving to a tree based document structure is the best option to support the widest array of use cases out of the box and probably the correct path to take in the long run
I think it is possible although fiddly and labor intensive to achieve most or all tree based behaviors using just the flat list of nodes in the current Document interface
I don't think incorporating yrs or similar CRDT libraries that require FFI into SuperEditor is a good idea, but doing so in a custom Document implementation is and should remain possible
ProseMirror is a good library that has seen significant success, and SuperEditor should consider using it as a reference for future additions and changes to its own API

> If anyone has good alternative suggestions, I'd like to hear them.

I think a tree structure is the right way to go to implement the features you listed, but to play devil's advocate for a second: you can get most or all of the way to each of those features without a tree.

Tables could be implemented by having a node per row, each with an AttributedText per cell. In the document layout + styling you group adjacent rows together so they appear as a single table. I had a proof of concept for this working at one point and only abandoned it because teaching the IME layer to understand the new node type would have required upstream changes I wasn't willing to put the time into.
"Banners" could be implemented by applying an Attribution to each node that needs to be part of the banner, then drawing the decoration around each contiguous set of nodes with that attribution in the style / document layout. I also had a version of this working with the intent to replace the BlockQuote node, but other work got in the way and now it would probably be a nightmare to rebase onto the current codebase.
This wasn't listed, but my app supports collapsing and expanding various "sections", such as lists with indented items or everything between headings of equal or lesser level. We do this by inferring a tree structure from the flat list of document nodes.
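The last point, inferring a tree from a flat list, can be sketched as follows. This Python sketch is an assumption-laden illustration, not the commenter's code: `FlatNode` and `section_end` are hypothetical, and "equal or lesser level" is read here as "a heading whose level number is less than or equal to the opening heading's level".

```python
from dataclasses import dataclass


@dataclass
class FlatNode:
    id: str
    heading_level: int = 0  # 0 means "not a heading"


def section_end(nodes: list[FlatNode], start: int) -> int:
    """Return the exclusive end index of the section opened by nodes[start].

    The section runs until the next heading whose level is <= the opener's.
    """
    level = nodes[start].heading_level
    if level == 0:
        raise ValueError("a section must start at a heading")
    for i in range(start + 1, len(nodes)):
        other = nodes[i].heading_level
        if other != 0 and other <= level:
            return i
    return len(nodes)


nodes = [
    FlatNode("h1a", 1),
    FlatNode("p1"),
    FlatNode("h2a", 2),
    FlatNode("p2"),
    FlatNode("h2b", 2),
    FlatNode("p3"),
    FlatNode("h1b", 1),
]

print(section_end(nodes, 0))
print(section_end(nodes, 2))
```

Collapsing the section that starts at index `i` would then hide `nodes[i + 1 : section_end(nodes, i)]`, with no tree stored in the document itself.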

Regarding yjs and similar

A couple other comments have already called out the yjs/yrs libraries as a potential way to store document state. As much as I'd like sync as a natively supported feature in SuperEditor, I don't think adding a dependency on FFI is the way to go. I think a more reasonable approach is to consider that some implementations may use other data structures as their source of truth, in which case Document (or an implementation of its interface) will only be a materialized view of the current state. For example, here is what typing a character may look like in that scenario:

User presses a key, transaction is started
EditOperation manipulates a yrs document over FFI, which produces a diff
Based on the yrs diff, one or more EditOperations are generated and applied to bring the Document state in sync
Transaction finishes

This is actually possible right now, and I have a very basic and barely working proof of concept that uses Loro to store and sync two SuperEditor documents.
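The four steps above can be sketched without any real CRDT library. In this Python sketch, `SourceOfTruth` is a plain in-memory stand-in for a yrs or Loro document (a real one would sit behind FFI), and `InsertText`, `DocumentView`, and `type_character` are hypothetical names, not Super Editor or yrs APIs.

```python
from dataclasses import dataclass


@dataclass
class InsertText:
    """Hypothetical diff entry emitted by the source-of-truth store."""
    node_id: str
    offset: int
    text: str


class SourceOfTruth:
    """Stand-in for a CRDT document; here it is just a dict of strings."""

    def __init__(self) -> None:
        self.text = {"p1": ""}

    def insert(self, node_id: str, offset: int, text: str) -> list[InsertText]:
        cur = self.text[node_id]
        self.text[node_id] = cur[:offset] + text + cur[offset:]
        return [InsertText(node_id, offset, text)]  # the resulting diff


class DocumentView:
    """Materialized view that the editor queries and renders."""

    def __init__(self) -> None:
        self.nodes = {"p1": ""}

    def apply(self, op: InsertText) -> None:
        cur = self.nodes[op.node_id]
        self.nodes[op.node_id] = cur[:op.offset] + op.text + cur[op.offset:]


def type_character(store: SourceOfTruth, view: DocumentView,
                   node_id: str, offset: int, char: str) -> None:
    # Step 1: the key press starts a transaction (implicit in this sketch).
    diff = store.insert(node_id, offset, char)  # Step 2: mutate the store.
    for op in diff:                             # Step 3: replay the diff
        view.apply(op)                          #         onto the view.
    # Step 4: the transaction finishes.


store, view = SourceOfTruth(), DocumentView()
type_character(store, view, "p1", 0, "h")
type_character(store, view, "p1", 1, "i")
print(view.nodes["p1"])
```

The design point is that the view never changes on its own: every change originates in the store and arrives as a diff, so remote changes from a sync peer could be replayed through the same path.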

Regarding ProseMirror

ProseMirror was also already mentioned in another comment, so I'll just add a +1 to using it as a reference if/when SuperEditor makes significant changes to its document structure. It is successful enough in the JS space to have at least one commercialized offering built on it (TipTap, which I think also uses yjs for sync), supports an official sync/collaboration interface without being opinionated about the implementation, and has a change transaction system with an official undo/redo plugin built on top of it. It seems ideal as a document editor framework.

I don't know that I would call any of these implementations replacements for a truly tree based document structure, but it is possible to approximate such features using just the current flat list implementation.

9 replies

jmatth Nov 18, 2024

> Ok. So this implies the following tradeoff...

I don't think that making it possible to integrate a custom document layout is all that disruptive on the Super Editor side, at least within the constraint that the layouts grow vertically like SingleColumnDocumentLayout does. Most of the pieces are already in the code base: DocumentLayout defines the interface that the layout widget state must satisfy and SingleColumnDocumentLayout provides a reasonable reference implementation. The only reason it's not possible to use custom layout implementations right now is because the SuperEditor widget is hardcoded to use SingleColumnDocumentLayout. Is there some implicit contract or other complexity between SuperEditor, DocumentScaffold, and SingleColumnDocumentLayout that I'm missing that makes parameterizing the layout more difficult?

All that being said, I think it would make sense to implement collapse/expand behavior upstream because there are several other parts of Super Editor that need to be updated to support it: the gesture system needs to understand that a node in a collapsed section isn't available for hit testing, moving the caret needs to either skip collapsed sections or automatically expand them, etc. These are all things that are difficult or impossible to implement without upstream changes.

To summarize: I think a collapsible sections feature should be implemented upstream because it involves complexities that are difficult or impossible to work around downstream. At the same time, SingleColumnDocumentLayout doesn't need to be all things to all users and can just be treated as a reference implementation, with users able to provide their own implementations at runtime.

> Answers to such UI/UX questions can probably be answered by inspecting Notion, which supports collapsible headers that don't apply to all content below them. Rather, they apply to select blocks below them.
> In general, we want to prohibit the fewest things possible. This sounds like an application decision, not a toolkit decision.

Agreed, I just wanted to call it out because it's a decision that Super Editor has already made in its default editor implementation. Moving to a tree structure and utilizing it to implement collapsing sections moves that default implementation from something that resembles a Word document to something that more closely resembles a Notion document. That may or may not be desirable for all downstream users, so it might be best to consider how hard it would be to opt out of the new behavior.

> I don't think we can uphold the idea that we can always serialize to flat text.

My comments are purely related to the in-memory data structures and how the editor interface is presented to the user. The way I described converting the tree back to flat text was poor wording on my part. What I meant was ensuring this hypothetical document tree could be fully represented by the current UI. My second example, with a paragraph sibling to a heading, fails that criterion because SingleColumnDocumentLayout currently has no way to indicate where the heading's children end and the siblings begin. You could add that, but now we're moving the unchangeable default document layout away from something that resembles a simple Word doc and closer to a Notion doc etc. etc. see previous section.

jezell Nov 19, 2024

> A couple other comments have already called out the yjs/yrs libraries as a potential way to store document state. As much as I'd like sync as a natively supported feature in SuperEditor, I don't think adding a dependency on FFI is the way to go.

An FFI requirement really would be a drag, and FFI isn't supported at all by dart2js or dart2wasm at the moment. However, for CRDT the only reasonable path is FFI or porting Yjs to Dart. I think there would be a ton of value in doing that because there are a lot of missing pieces, like what you do on the server side, but it's a bit of a project on its own.

jezell Nov 19, 2024

For context we currently do flutter_quill + yjs + hocuspocus. It works, but it's a lot of complexity that we'd like to get rid of at some point. We'd definitely be interested in working with others on a dart yjs port if there was interest, as that's the ideal pathway toward simplification. That said, I don't think super_editor needs a dependency on CRDT. ProseMirror is a good example of a way to make CRDT work with an existing system without baking it in.

matthew-carroll Nov 19, 2024
Maintainer Author

I'd like to avoid getting too deep into CRDTs, YJS, etc, except to ask the following question. Are any of you aware of any document structure decisions that are likely to be critical to CRDTs and YJS that can't be overcome through a translation?

In other words, can you think of anything that would satisfy the following:

If we don't do X with the document structure, we can't ever reasonably support CRDTs, YJS, etc.
If we do Y with the document structure, we can't ever reasonably support CRDTs, YJS, etc.

To help inform those questions, I want to make sure everyone understands the API that we're really talking about here.

The Document API is a query API. It's a contract that every Super Editor document implementation must be able to honor, but it doesn't fully impose any particular implementation. You can use any transport format, you can use any local cache selection (file, db, etc), you can use any server. All the Document API says is that some way, somehow, your document can be treated as a collection of nodes, and those nodes can be queried.

When it comes to altering documents, we have Editor, EditRequest, EditCommand, EditEvent, EditReaction. These things taken together are the "command" API. There are some connections between the "command" API and the "query" API - namely DocumentSelection, DocumentRange, DocumentPosition, and DocumentNode. However, the actual mutation of a document is handled by individual EditCommand implementations, which can be switched out on a per-app basis.

Thus we have Command Query Responsibility Segregation (CQRS), which hopefully minimizes the imposition of Super Editor on the underlying document format.
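The query/command split can be illustrated with a short Python sketch. None of this is the Super Editor API: `DocumentQuery`, `InMemoryDocument`, `InsertTextRequest`, `insert_text_command`, and this `Editor` class are hypothetical stand-ins that only mirror the roles described above.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class DocumentQuery(Protocol):
    """Query side: read-only questions any document must be able to answer."""

    def get_node_by_id(self, node_id: str) -> Optional[str]: ...


@dataclass
class InsertTextRequest:
    """Command side: a description of an intended change (hypothetical)."""
    node_id: str
    offset: int
    text: str


class InMemoryDocument:
    """One possible backing store; a db- or CRDT-backed one could replace it."""

    def __init__(self, nodes: dict[str, str]) -> None:
        self._nodes = dict(nodes)

    def get_node_by_id(self, node_id: str) -> Optional[str]:
        return self._nodes.get(node_id)

    # Mutation is not part of DocumentQuery; only commands call this.
    def _replace_text(self, node_id: str, text: str) -> None:
        self._nodes[node_id] = text


def insert_text_command(doc: InMemoryDocument, req: InsertTextRequest) -> None:
    """One swappable command implementation for InsertTextRequest."""
    old = doc.get_node_by_id(req.node_id) or ""
    doc._replace_text(req.node_id, old[:req.offset] + req.text + old[req.offset:])


class Editor:
    """Routes each request type to whichever command the app registered."""

    def __init__(self, doc, handlers) -> None:
        self._doc, self._handlers = doc, handlers

    def execute(self, request) -> None:
        self._handlers[type(request)](self._doc, request)


doc = InMemoryDocument({"p1": "helo"})
editor = Editor(doc, {InsertTextRequest: insert_text_command})
editor.execute(InsertTextRequest("p1", 3, "l"))
print(doc.get_node_by_id("p1"))
```

Because readers depend only on the query protocol and writers only on requests, an app could register a different command for the same request type without touching code that reads the document.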

alterhuman Jan 20, 2025

+1 for parameterising the layout. It doesn't interfere with current implementation in any way.
