Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support non-text BufferLine #1901

Closed
PerBothner opened this issue Jan 18, 2019 · 26 comments
Closed

support non-text BufferLine #1901

PerBothner opened this issue Jan 18, 2019 · 26 comments
Labels
type/enhancement Features or improvements to existing features

Comments

@PerBothner
Copy link
Contributor

Suppose one wanted to support inline images (issue #614, pull request #1577) or a block of rich text (perhaps for help messages). Ideally one wants to position these are anywhere, interleaving them with character cells. However, I think you'd get 90% of the benefit by restricting these areas to cover the entire line width. This should be easier to implement, with fewer performance issues.

The contents of a Buffer is an array of BufferLine, which is (conceptually) an array of CellData. We can generalize that to an array of BufferSection, where a BufferSection can be a BufferLine - or other things:

class BufferSection {
    yOffset: number;
    ...
}
class BufferLine extends BufferSection { ... }

/** Arbitrary DOM element (HTML data) wrapped in a <div> */
class BufferDivSection extends BufferSection {
     element: HTMLDivElement;
}

class BufferCanvasSection extends BufferSection {
     element: HTMLCanvasElement;
}

Some previous prototypes have split an image into "bands" each the height of a line, using multiple BufferLines per image. I think this would be a mistake. One reason is it unnaturally forces each image/area to be an integer number of lines high. While it make be plausible to split an image into bands, this is no logical concept of bands for (say) BufferDivSection.

The implication is that BufferSection can have varying heights, which mildly complicates mapping from BufferSection to screen position and back (say for mouse clicks). But that is easy to solve: Just store the yOffset in each BufferSection. To map (say) a mouse click to a BufferSection, just use binary search.

Note we do not prohibit splitting an image into multiple line-high bands. That could be useful for sixel images or other images which are aligned to character cells. However, I don't think that ability is a priority.

A minor bonus: We can easily handle double-height characters using a regular BufferLine.

I can work on this if it makes sense.

@Tyriar
Copy link
Member

Tyriar commented Jan 18, 2019

@mofux had a nice idea (#791 (comment)) of aligning more with how Monaco does it which is similar to your proposal. Instead of dealing with the largely unrelated BufferLine you would inject a view at some marker (a line that tracks where it is in the buffer). This is closely related to the line decorator proposal here #1852

Maybe something like this (heavily inspired by Monaco's API)?

class Terminal {
  addViewZone(viewZone: IViewZone): IDisposable;
}

interface IViewZone {
  afterLine: number | IMarker; // if number is used it's converted into a marker
  height: number;
  domNode: HTMLElement;
}

It might also be easier (but not required) if smooth scrolling is implemented as lines would need to be rendered not aligning with the grid for the first time #1140. Forcing any view to be a multiple of the character height might also be a way to simplify this.

@Tyriar Tyriar added the type/enhancement Features or improvements to existing features label Jan 18, 2019
@PerBothner
Copy link
Contributor Author

@Tyriar: This seems to be somewhat different from @mofux's proposal, which was to associate a decorator with a span of characters, while your seems to be for IViewZone objects that span the entire width. My reading is that you're suggesting using the decorator idea, but limiting ourselves to full-width zones - is that it?

I haven't found any useful documentation or example for using markers. Nor have I found any place in the code that does anything interesting with them, such as updating them when the buffer is modified. Are they tied to lines or y-offsets? What happens if you delete a line above the marker - does the marker get adjusted, and if so, by who? (I could easily have overlooked something with very rudimentary greps.)

"Forcing any view to be a multiple of the character height might also be a way to simplify this."

I think that makes senses for an initial prototype, but I think it is desirable to support areas that are not an integral multiple of the character height. One reason is to support short decorations, like a <hr> or other divider. Another is that it seems cleaner for zones that may re-size when the window resizes. A way to deal with that would follow-up feature to allow different lines to have different heights. (I.e. zones would be an integral number of lines high - but after the initial prototype we could add a mechanism to reduce or increase the height of specific lines.)

@Tyriar
Copy link
Member

Tyriar commented Jan 18, 2019

This seems to be somewhat different from @mofux's proposal

@PerBothner see the bottom of that comment:

For things like rendering pictures inside the terminal monaco uses a concept of view zones that can be seen (among other concepts) in an example here:
https://microsoft.github.io/monaco-editor/playground.html#interacting-with-the-editor-listening-to-mouse-events

Are they tied to lines or y-offsets? What happens if you delete a line above the marker - does the marker get adjusted, and if so, by who?

Markers listen for trim events, when lines and removed/added to the buffer they are also adjusted (the reflow PR adds this).

but I think it is desirable to support areas that are not an integral multiple of the character height.

I agree, but the thing is it might look awkward unless smooth scrolling is supported as the height of the terminal must be a multiple of row height.

@jerch
Copy link
Member

jerch commented Jan 22, 2019

I kinda like the idea to be able to insert arbitrary foreign content at line level. To me it boils down to the question whether we are bound to perfect grid alignment and line counter (thus have to make "room" in the grid) or can calculate the scroll offset by other means. From the buffer perspective Per's idea certainly simplifies things (given we keep it on line level), but I think the renderer/offset calc will struggle?

I have not yet looked deeper into the decorator pattern of monaco (sorry my bad), isnt that meant for overlays based on the text grid (thus solving a different problem)?

@Tyriar
Copy link
Member

Tyriar commented Jan 22, 2019

I have not yet looked deeper into the decorator pattern of monaco (sorry my bad), isnt that meant for overlays based on the text grid (thus solving a different problem)?

@jerch that's for decoration, they also have view zones which are inserted between rows https://github.com/Microsoft/vscode/blob/f20fb4e7c907a5835f4bc9c5680614220efabf7c/src/vs/monaco.d.ts#L3459

@PerBothner
Copy link
Contributor Author

Functionally, it doesn't seem that different:

"A view zone is a full horizontal rectangle that 'pushes' text down. The editor reserves space for view zones when rendering."

That implies text "following" the view zone is below it, which doesn't seem to be that different from BufferDivSection. There is difference in how the Buffer data structure handles it, of course.

I assume a IViewZone can have dynamic height, depending on the Element used? If you detect that the Element changes height (perhaps because the width changed forcing a re-flow), can you assign to heightInPx, and would doing so adjust the y-position of following lines and zones?

@Tyriar
Copy link
Member

Tyriar commented Jan 23, 2019

@PerBothner yes they're basically the same just that my proposal means that things won't be inserted into the buffer but rather attached to bufferlines using markers.

I assume a IViewZone can have dynamic height, depending on the Element used? If you detect that the Element changes height (perhaps because the width changed forcing a re-flow), can you assign to heightInPx, and would doing so adjust the y-position of following lines and zones?

I think height should probably be static based on the size you assign it in the API, for heightInPx unless we are using smooth scroll it should round up to the nearest line such that each line is aligned with the grid..

@jerch
Copy link
Member

jerch commented Jan 24, 2019

Slightly offtopic but might relevant for the line level idea:
Reading the thin SIXEL spec more closely it seems we cannot get away for SIXEL with sticking to lines for foreign stuff only. In the graphics scrolling mode it clearly says that the SIXEL drawing starts at the current cursor position. Not sure yet how the SIXEL bands behave here (will they align to the beginning of the next line once the graphics cursor gets a CRLF?), a pity I have no device to test this out. Does anyone have access to an old VT340 upwards?

@PerBothner
Copy link
Contributor Author

I don't think it makes sense for sixel drawing to start anywhere except the beginning of a line. One reason is the specification doesn't (as far as I can see) define a mapping (a ratio) between pixels and character cells except indirectly and tied to specific hardware. There seems to be no semi-portable or useful way to start in the middle of a line.

From what I can tell from a preliminary study of how libsixel handles updates, and it appear to just redraw the entire image - which seems consistent with the lack of an addressable sixel cursor. To reposition the cursor it appears you have to restore the previously-saved text cursor before entering sixel mode.

It's pretty crude and inefficient. It's ok for printing plain static images, but is bad for anything dynamic (including animated gits).

(Again, this is from my preliminary investigations.)

@jerch
Copy link
Member

jerch commented Jan 25, 2019

Just tested SIXEL in xterm and mlterm with mixed results:

mlterm does not move the image at all, it always starts at 0, 0 of the viewport, no matter if sixel scrolling is enabled/disabled. This seems broken to me.

xterm does a weird mixture - if the sixel image fits in the remaining space of the text line (+ not higher as one text line) it gets placed at the current text cursor. If its higher or wider it reflows to the next line to the left.

Weird. Needs more investigation.

@PerBothner Yeah SIXEL only has a forward moving graphics cursor (Edit: for multiple lines, within the line it can jump back with CR), to overdraw something you would have to leave SIXEL mode, replace the text cursor and start over with SIXEL again. On the old devices the aspect ratio setting was meant to express the relation to the text metrics. But I have no idea how this can be applied in a working way. My early sixel lib just ignores that (https://github.com/jerch/node-sixel).

@jerch
Copy link
Member

jerch commented Jan 25, 2019

Ok newer versions of xterm and mlterm show the same behavior: the graphic always gets inserted at the current text cursor pos, no reflowing/wrapping at all. Overflow at the right border is simply cutted.

This certainly simplifies things, as it would just need a line plus cell offset marker and draw the image there. Still need to find out how the text cursor is commonly handled (+ overflow edge cases). And last but not least - how resizing with the new reflowing can work together.

@PerBothner
Copy link
Contributor Author

The sixel format has some limited usefulness because there are some programs that can "print" sixel output. But it is a pretty horrible and limited format, and is little used: Even xterm doesn't support it in its default configuration. So while supporting sixel is nice, it is more useful to support gif/jpeg/png (perhaps using the iterm2 1337 escape sequence). Sixel support might make more sense as an addon.

Images (and general embedded content) that don't start in the left column have limited usefulness, and that applies double for sixel images (because of the weird 6-pixel-band nature of sixel). However, if it can be supported cleanly, might as well - for improved xterm compatibility, if nothing else. The specification says that "When sixel mode is exited [and sixel scrolling is enabled], the text cursor is set to the current sixel cursor position." If the sixel position is not at the left column, one could approximate this behavior by writing enough empty characters cells for the width (rounded up) of the sixel image. I doubt that would be very useful - forcing the cursor to the left margin below the image is probably simpler and just as useful.

@PerBothner
Copy link
Contributor Author

To parse a sixel image, I think it makes sense to write a function parseSixelImage that takes the DCS text as a string and returns an ImageData object. That ImageData can then be used for the contents of a fresh <canvas> object, or it could be drawn into the master canvas - I don't know what makes most sense.

My plan would be to implement this function using one (temporary) Uint8ClampedArray for each line of pixels. The initial size of the Uint8ClampedArray would be that specified by a sixel Raster Attributes command, if specified, and some reasonable width (window width?) otherwise. If there are more sixel elements than allocated in the Uint8ClampedArray, then the latter is copied into a fresh 50% bigger array. We track the maximum width of all the lines we've seen so far.

Each new sixel band would allocate 6 new Uint8ClampedArray objects. At the end of the sixel data, we know the image width and height. We allocate an ImageData of size 4*width*height, and copy all the single-line Uint8ClampedArray pixel lines into the ImageData's data array.

I plan to start by implementing the parseSixelImage function (and test it in DomTerm). I can figure out
how to hook it up to the EscapeSequenceParser. (It might make sense to change the Dcs hooks to be more like we recently did for CSI and OSC sequences, but I haven't looked at it.) I will need some help as to the best way to actually make use of and render the ImageData.

@jerch
Copy link
Member

jerch commented Jan 25, 2019

The sixel format has some limited usefulness because there are some programs that can "print" sixel output. But it is a pretty horrible and limited format, and is little used: Even xterm doesn't support it in its default configuration. So while supporting sixel is nice, it is more useful to support gif/jpeg/png (perhaps using the iterm2 1337 escape sequence). Sixel support might make more sense as an addon.

Lets keep the discussion on the marker/decorator/BufferSection things. I just brought up sixel since its a living "foreign content" thing, which supports embedding the content with a cell offset. Yes it can stand for images in general, but the discussion whether to support format xy is better suited for this thread #614. (I tend to disagree with the 1337 sequence as it is non standard, if we want general image support we should go with discussing it on a broader basis).

With my recent findings regarding a possible left cell offset for sixels I think the decorator idea with "making room" in the grid is the better one. I wrote a tiny test script for sixels to see the text cursor behavior in xterm and mlterm (https://github.com/jerch/node-sixel/blob/master/sixel_textcursor.sh). Since they behave very different in certain aspects I'd call the sixel thing alpha for both emulators. Seems we cannot gain much "correct" things from testing on those.

@jerch
Copy link
Member

jerch commented Jan 25, 2019

I plan to start by implementing the parseSixelImage function

https://github.com/jerch/node-sixel. Lol.

@PerBothner
Copy link
Contributor Author

PerBothner commented Jan 25, 2019

Looks like you're way ahead of me in terms of implementing sixel. Let me know if I can help.

@jerch
Copy link
Member

jerch commented Jan 25, 2019

Yeah basically did that in the last days with almost the same approach as you intended to do above. Well feel free to test and mock about it, it certainly needs code review and tests.

I think its a good starter to see what other terminals do with those sixel stuff, but I also think its not a living standard yet (or not anymore) as it seems handled quite different/partly broken. I think we should try to grasp and see if it suits a more general embedding concept of foreign content.

@jerch
Copy link
Member

jerch commented Jan 31, 2019

I'd like to do a small summary of the ideas we have so far and want to shed some light on the pros and cons in regards how terminals currently work, also inspired by my sixel adventures.
(The following is highly opinionated, feel free to agree/disagree with any of it.)

Lets start with how terminals currently work. This is nothing new for any of us, still its worth mentioning for easier evaluation of the pros and cons of the ideas later on.

  • Terminals are grid based, ideally with perfect row/col alignment, means single cells share the same metrics. A terminal viewport can be seen as a row X cols matrix/canvas.
  • Primary purpose of a cell is to hold a halfwidth character, maybe enriched by further text attributes.
  • A terminal is cursor driven, thus has to maintain a global cursor state. Data to be printed are added at the current cursor position. The cursor typically advances with new data 1) from left --> right and at the end of a row from 2) top --> down to the beginning of the next line. It basically resembles the writing progression of the latin writing system (start top-left, advance to right, wrap to left in next line).
    This default cursor behavior can be changed in various ways, most notably it can be freely placed within the "active viewport" turning the terminal into a random access text canvas.
  • Nowadays most terminals offer scrollback for content, that went outside at the top and is not reachable by the cursor anymore. Furthermore most terminals maintain two buffers, which share all functionality beside the scrollback (the altbuffer typically has none).
  • There are tons of additional control sequences that can further "cheat" on these basic concepts, e.g. insert/remove cells/rows, tabstops, define scroll areas, report functionality. The most advanced "cheating" allows to split a terminal viewport into several "subterminal views" without breaking the basic concepts - terminal multiplexer.

Thats where we start from. Now to the ideas to embed some foreign content (foreign in the sense of not strictly following the grid):

  • buffer sections / view zones / "grid-holes"
    This idea places foreign content at line level either between the grid or enlarges parts of the grid to hold that content like this:

    image
    As soon as the foreign content is seen for any of the blue cursor positions, the grid gets interrupted and the content gets inserted. The alignment of the content is undetermined (might be left/middle/right or any other supported). The text cursor jumps below that foreign object into the next real grid line (prolly at position 0, green cell). The skipped grid height for the foreign object might be a multiple of a single grid line height or free (depends on emulator/renderer needs).

    • Pros:
      • very easy to implement
      • covers a great amount of foreign content use cases, like graphical REPLs to embed some plots etc.
    • Cons:
      • breaks the grid, the big white area is not addressable anymore by the text cursor, thus no text can be printed next to the foreign content (basically breaks any multiplexer)
      • screws with viewport rows, cursor repositioning would have to jump over the area, text canvas usage is hardly doable anymore
  • overlay / "underlay" / "make room in the grid"
    This idea tries to keep the grid logic intact by making room in the grid, thus advancing the text cursor (blue) right behind the foreign content size (green):

    image

    • Pros:
      • keeps grid intact, no unaddressable cells (beside the foreign content? see sixel comment below)
      • no messing with viewport rows
      • multiplexing still possible
      • left offset can be determined by text cursor position
    • Cons:
      • harder to implement
      • foreign object size needs to be mapped to certain cells + calculate text cursor advance
      • still many uncovered details, esp. for text canvas usage

Imho the second idea is clearly favorable as it does not break with to many typical terminal paradigms. Still it leaves many things uncovered, to name a few - What about inserting a line that crosses the foreign content? Inserting/deleting single cells at the foreign content area?

I have not yet a general working solution for these questions, but maybe we can learn/adopt something from the current SIXEL support in xterm/mlterm:
When the image arrives as input data, both emulators overdraw any grid cells in question to hold the image (emptying the cells). The image simply gets layered at these positions. The text cursor advances right behind the image or wraps to the next line to the left if the image touches the rightmost grid cell. If the image overflows right side it gets cut.
The tricky part is how to deal with cursor repositioning + text input into the image area. Here both seem to move the image into the background ("underlay") and simply allow overdrawing with new text data. Note that the BG text attribute is also applied, thus for a default of white on black a white char is drawn with a black cell box.
I have not yet tested how they deal with inserted lines/cells, whether they move the image content accordingly (strict grid adherence) or simply blackout things.

Furthermore this quite strict grid handling like done for SIXELs in xterm/mlterm does not play well with other foreign content, like HTML boxes, PDFs (whatever an emulator might want to support as foreign content). Should it be allowed to overlay the grid? Idk, maybe yes, to keep the text canvas usage intact, an overlay could vanish as soon as the cursor jumps into that area and tries to place text there. Another grid layer on top seems unfeasible to me.

@PerBothner
Copy link
Contributor Author

The last few days I've been implementing Sixel support in DomTerm, using @jerch's node-sixel library - see jerch/node-sixel#6. DomTerm uses a different model from either of the two that @jerch mentions, and I think it is work considering.

The idea is to treat "embedded objects" as extra-large characters, that conceptually take a single grid cell. An embedded object can be an image, or any other area that is opaque to the character-grid model. In JavaScript-based terminals, an embedded object can be a general nested <div> or other block.

If the embedded object is a single cell, it is part of the grid, so it is easy for address the cursor before or after it. It is easy to delete or replace the object - for example erase-rest-of-line willerase any objects to the right of the cursor, regardless of how wide they are..

This implies variable-height lines, but that is no real conceptual problem, given scrolling: If there is room for 30 normal grid lines, then the grid will have 30 lines, even if some of them are much taller than character-only lines. The "home" position may be scrolled off, but that is no problem.

A variation of this idea is for such objects to be a single row high, but to use as many cells as will cover the width of the object. The issue is what happens to text written after the object. Actually two issues: when post-object text reaches the right margin, will it be wrapped or truncated? And how will cursor motion post-object work? One option is to leave it undefined if you try to write text after an object.

My suggestion: Wrapping of lines with embedded objects is similar to normal: If an embedded object will extend past the right margin and the current position is not at the left margin, move the cursor to the start of the next line. (If an object is wider than the line width, it gets truncatee.) So far this is what DomTerm does. But I suggest another idea (not implemented in DomTerm): If a line containing embedded objects is wrapped, it still counts as a single line in terms of the conceptual grid.

More precisely: The state is a set of logical lines, as you would have if you widened the window so there is no wrapping (and after re-flow). When wrapping, each logical line is one or more visual lines. A compound line is a logical line that contains embedded objects. An addressable line (the unit used for cursor motion) is either a logical line (if it is a compound line) or a visual line (otherwise). The number of addressable lines is the number of non-compound lines that will fit in the window.

This model is slightly complicated, but it has the advantage that an applications can uses cursor motion and updating with predictable behavior without knowing the precise size of embedded objects.

@PerBothner
Copy link
Contributor Author

As a stepping-stone to my more general model, I suggest implementing the first option in @jerch's comment, but make the foreign content (image or whatever) be part of the logical character grid: Conceptually, the foreign content is a single character, on a single line. If the cursor is after the image, then CSI D will move the cursor to just before the image. Writing a space will then erase the image, replacing it by a single space. And so on. This would support the "animated gits" as translated by libsixel. (These are implemented by saveCursor, draw image, pause, restoreCursor, repeat.)

The (initial) restriction is that you can only "print" a foreign object when the cursor is at the start of a line, and it is undefined if you print anything else on that line. I.e. this is similar to the "non-text BufferLine" idea of my initial message, but with the idea that a later enhancement may support more general "compound lines", with a mix of text and maybe multiple foreign objects on the same line.

@mofux
Copy link
Contributor

mofux commented Feb 1, 2019

IMO the challenge of rendering images inside the terminal is very similar to supporting folding ranges or links, and even to the way colors, underline, italic and so on are currently rendered. We use escape sequences to annotate one cell or a range of cells with metadata. The renderer picks up these annotations and renders them accordingly. I think it is important to note that these annotations can only work on existing cells, but cannot magically add new cells to the buffer (like they would have to in @jerch example #2).

So how do we render images?
Imagine the following content being printed to your terminal, with text in brackets containing image annotations:

This is my [cat] and my [dog] last summer.
prompt>

There are several ways to render this. For example, the renderer could underline [cat] and [dog], and while hovering with the mouse over one of these words, a thumbnail picture follows the cursor.
screenshot 2019-02-01 at 11 55 01

When clicking on the word, a closable view zone below that line opens, showing the picture in bigger detail.
screenshot 2019-02-01 at 11 55 15

To address the original topic: I don't think we need to support non-text BufferLine at all, we only need a way to store and retrieve metadata for a cell or cell range - which is basically the idea behind decorations in #791 (comment)

@PerBothner
Copy link
Contributor Author

The "image-on-hover" idea could be nice in some applications, but I think that should be handled as part of a hover facility - see a discussion and proposal here. Regardless, there is demand for an escape sequence to actually display an image or other "foreign content". Making the image be a (single large) "pseudo-character" makes sense (as proposed in my recent comments) as it works well in terms of the cursor model and being able to navigate and replace/delete images.

My latest thinking is indeed we don't need a special non-text BufferLine class. Instead certain cells can be foreign objects, indicated with a special flag and an index (in place of the character code). The index is into an array of new field in the BufferLine. If this array is non-null then the height and the rendering of the line is non-standard.

@mofux
Copy link
Contributor

mofux commented Feb 1, 2019

I wouldn't advise using a "pseudo-character" for this, instead the escape sequence should annotate text, e.g. [cat] with that foreign content. This makes sure that terminals that can't render the foreign content still render the text. This is also good for screen readers. Of course, if a line contains such foreign content it's up to the terminal implementation itself to render it (for example, it could just render the foreign content in a view zone below the line that contained the foreign content annotation - similar to the 2nd picture).

@PerBothner
Copy link
Contributor Author

I meant "pseudo-character" in the sense of treating an image as a large character taking a single space in the character grid, not in terms of whatever escape sequences are used. That is an orthogonal issue. I agree it would be nice to design the escape sequence used so it falls back to text if the escape sequence is not understood. I hadn't really considered that. (One more problem with Sixel is there is no good way to specify fall-back text.)

You can think of my proposal as a kind of decoration, but one attached to a single character (rather than a span), and that has a "side effect" of changing the rendered size (but not the size in terms of cursor addressing). It seems useful to potentially support (at least) two kinds of decoration: ones attached to a span (but don't change cell sizes), and ones attached to a single cell (but may change cell size).

@jerch
Copy link
Member

jerch commented Feb 1, 2019

I have to step back from this discussion for abit, have still enough other tasks on the desk regarding xterm.js. Furthermore I think we should get the basics sorted before we implement anything like a more advanced marker/decoration thingy (whatever it will be called).
Meanwhile keep the ideas flowing, the final thingy can only benefit from a well thought out concept.

@Tyriar
Copy link
Member

Tyriar commented Oct 7, 2019

No plans on adding something like this currently, the discussion with the images may end up turning into something like that but for now we can track in #614

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement Features or improvements to existing features
Projects
None yet
Development

No branches or pull requests

4 participants