Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a proper parser for link detection #583

Closed
Tyriar opened this issue Mar 3, 2017 · 21 comments
Closed

Use a proper parser for link detection #583

Tyriar opened this issue Mar 3, 2017 · 21 comments
Labels
area/links help wanted type/enhancement Features or improvements to existing features
Milestone

Comments

@Tyriar
Copy link
Member

Tyriar commented Mar 3, 2017

There will always be issues with URL matching using regex as some cases simply can't be caught with regex. An example where regex will fail is including brackets in URLs but only when the brackets are opened within the url:

  • Include brackets: http://<domain>.com/foo(bar)
  • Don't include wrapping brackets: (http://<domain>.com/foobar)
  • Include commas: http://<domain>.com/foo,bar
  • Don't include trailing commas: http://<domain>.com/foo, other text
  • Detect spaces (paths are ambiguous?) c:\Users\X\My Documents
@egmontkob
Copy link

FYI: These particular cases are handled correctly by gnome-terminal using regexes (pretty complex ones though).

See big rewrite bug, balanced parentheses bug, current implementation, unittests.

@Tyriar
Copy link
Member Author

Tyriar commented Mar 22, 2018

When someone tackles this issue there are some more cases on the VS Code issue tracking this microsoft/vscode#21125

@Tyriar
Copy link
Member Author

Tyriar commented Jun 2, 2018

Merged some issues into this one to support chrome-devtools scheme #599, no TLD #653 and long TLDs #601.

@Tyriar
Copy link
Member Author

Tyriar commented Jun 21, 2018

Merging UNC path support into this, eg. \\localhost\c$\GitDevelopment\standup\server\db.js

@JaneX8
Copy link

JaneX8 commented Sep 30, 2019

Will this be fixed anytime soon? It's very common for Windows users to have a username Firstname and Lastname. Meaning it will include spaces. Breaking Xterm.js which in term breaks the terminal that VSCode implements. A project folder in the user folder C:\Users\John Doe\Documents\Github\com.example.app would break the "clickable" path's in the terminal.

See: microsoft/vscode#21125

@Tyriar
Copy link
Member Author

Tyriar commented Sep 30, 2019

No ETA but it's open to PRs. I imagine the eventually implementation will look similar to this: https://github.com/microsoft/vscode/blob/master/src/vs/editor/common/modes/linkComputer.ts

@Tyriar
Copy link
Member Author

Tyriar commented Sep 30, 2019

Also needs a little thinking around how this affects the existing experimental "link matcher" api.

@Tyriar
Copy link
Member Author

Tyriar commented Oct 23, 2019

Current state

The way links work right now is that after the viewport has stopped scrolling for 200ms, the links are computed for the current viewport. Embedders can provide a validation callback which allows the embedder to validate the link sometime after the links are computed, note that links are only shown once they are validated. The end result is pretty nice, we can present underlines for all links even when you need a modifier to execute the link, it does however have some fundamental problems:

class Terminal {
	registerLinkMatcher(regex: RegExp, handler: (event: MouseEvent, uri: string) => void, options?: ILinkMatcherOptions): number;
	deregisterLinkMatcher(matcherId: number): void;
}
interface ILinkMatcherOptions {
	matchIndex?: number;
	validationCallback?: (uri: string, callback: (isValid: boolean) => void) => void;
	tooltipCallback?: (event: MouseEvent, uri: string, location: IViewportRange) => boolean | void;
	leaveCallback?: () => void;
	priority?: number;
	willLinkActivate?: (event: MouseEvent, uri: string) => boolean;
}

Proposal

The long standing hope was to move to a "parser-based" link system (#583) but it was never really clear how an addon would provide a parser exactly. Here's my very VS Code API-inspired proposal:

class Terminal {
	registerLinkProvider(linkProvider: ILinkProvider): IDisposable;
}

interface ILinkProvider {
	provideLink(position: IBufferCellPosition, callback: (link: ILink | undefined) => void): void;
}

interface ILink {
	range: IBufferRange;
	showTooltip(event: MouseEvent, link: string): void;
	hideTooltip(event: MouseEvent, link: string): void;
	handle(event: MouseEvent, link: string): void;
}

interface IBufferRange {
	start: IBufferCellPosition;
	end: IBufferCellPosition;
}

interface IBufferCellPosition {
	x: number;
	y: number;
}

The basic idea is that instead of evaluating links whenever scrolling has stopped, only evaluate links for the current cursor position when a hover occurs. While file access is slow en masse, single requests just for mouse movement should be reasonable. Some other things of interest:

  • The addon can still use the regex method and translate the results an IBufferRange or use a parser to check the whole line, or just expand outwards from the cursor.
  • It uses the buffer API to access the line, that's pretty cool 😎.
  • The link could be cached until a scroll happens for that entire range, meaning no need to recompute just for moving the mouse over the same cells.
  • It allows embedders to use their own link detection mechanism, VS Code has a shared implementation but we've been unable to use it in the terminal (Share link detection logic with the terminal and editor microsoft/vscode#83191).

Going this route would seemingly fix many of the problems VS Code has with links, namely:

  • No more perf issues validating on slow file systems.
  • We can linkify paths without separators since validation will only be triggered under the cursor.
  • By expanding left then right from the cursor we can fix many of the issues with the current link detection (Terminal link detection issues microsoft/vscode#21125), including spaces in Windows paths (this is still a tricky problem but seems more achievable).

The only downside for this is a slight delay in the link appearing as the computing and validation occurs at hover time.

Open questions

  • Do we need an ILinkMatcherOptions.priority equivalent?
  • This would be an ideal time to introduce Promise into the API, working with promises is lovely but we will always have callbacks in the parser for performance reasons. We could just stick to the callback everywhere for consistency?
  • We could use markers for the range instead of numbers, that would allow the link to be cached after a scroll occurs. It probably isn't worth caching like this imo but it does bring up another interesting question in the Replace ISelectionPosition API with IBufferRange #2480 discussion.

@jmbockhorst
Copy link
Contributor

Would this be an addon? Or part of the existing Linkifier?

@jerch
Copy link
Member

jerch commented Oct 24, 2019

@Tyriar Good idea to move the expensive link matching to the hover state, will def. remove some "lagging" from the viewport pause state. Few ideas/remarks from my side:

  • Imho its important to note that with a change like this its not possible anymore to prestyle links (underline or some special link color right from the start). Imho not a big issue, a terminal is not meant to work like a web browser in the first place.
  • priority - Not sure what you want it to help with? Nested links?
  • I love promises for their syntactic sweetness, still not sure if we should introduce them here while everything else is callback based (or sync API). Maybe stick to the least common and more performant denominator here - callbacks. Any JS programmer should be able to transform those into lovely promises. Beside the worse runtime promises also screw up stack traces, thus ppl might have reasons to avoid them.
  • Caching is indeed a problem here - w'o using markers its only possible for a "stable viewport" (any line buffer shift would invalidate the cache). Markers alone can not avoid reeval here - a cell might have changed thus a parser should not mark it as a link anymore. Markers would still be helpful in conjunction with a viewport update range event - this way a parser could limit the link eval to that range on hover and just return cache entries otherwise. Also note, that later added cell content that forms a correct link string with older content by accident should not be marked as a valid link, here the viewport statefulness makes it hard to correctly deal with it, which makes me think that we might need not only a line range update event, but a more fine-grained position-range update event.
  • I'd love to see this more as a general API approach to mark things in the viewport/buffer based on some condition (which could even be used by the search addon, partly answers @jmbockhorst question). Not sure if thats feasible, might be hard to harmonize between the different needs.

@Tyriar
Copy link
Member Author

Tyriar commented Oct 24, 2019

@jmbockhorst it would be a new Linkifier class (Linkifier2?) that would deprecate the existing registerLinkMatcher API and change the implementation of web-links. For VS Code we would not use web-links any more and instead leave all link detection up to the existing mechanisms.

I'd love to see this more as a general API approach to mark things in the viewport/buffer based on some condition (which could even be used by the search addon

@jerch that's basically where we're headed with decorations #1852, #1653

priority - Not sure what you want it to help with? Nested links?

Can't remember exactly why it was added, maybe to have web links take precedence over local or vice versa?

I love promises for their syntactic sweetness, still not sure if we should introduce them here while everything else is callback based (or sync API). Maybe stick to the least common and more performant denominator here - callbacks.

Yeah guess we should just commit to this, we can use Promises internally but callbacks in the API for consistency.

Imho its important to note that with a change like this its not possible anymore to prestyle links

Pre-styling isn't desirable imo as underline is a perfectly valid state.

@Tyriar
Copy link
Member Author

Tyriar commented Oct 25, 2019

Not sure if I'll have time to get to this in the next couple of months so this is open to PRs if someone want to have a go at implementing a proof of concept. I think we'll want to support both links types until the next major version at which point registerLinkMatcher will get removed.

@mofux
Copy link
Contributor

mofux commented Nov 1, 2019

IMO this would be an ideal use-case to refine our Marker API, so detected links can be tracked by a sticky range marker and can get auto-disposed (invalidated) in the following events:

  • A marked buffer range get's modified (e.g. you have a link in your prompt line and add / delete some characters)
  • A marked buffer range gets cleared out (e.g. clear command)
  • A marked buffer range falls out of the scrollback

Note that this marker should track the whole buffer, not just the viewport.

Once we have that marker that marks a buffer range (sticky), we can add another layer of abstraction that annotates that marked range with additional meaning. For example a link annotation, that references the marker and holds the parsed link.

As a last step, a renderer can iterate annotations in the viewport and draw them accordingly.

Here is a rough implementation of the linkifier that would use such an API:

// listen for buffer changes, we will get an IBufferRange in that indicates
// which part of the buffer was changed
terminal.buffer.onChange((range, buffer) => {
  
  // do the magic, detect links in that range
  const links = detectLinks(range, buffer);

  // create a marker for those links
  for (let link of links) {
    const linkMarker = buffer.addMarker(link.range);
    const linkAnnotation = buffer.addAnnotation({ type: 'link', data: link.url, marker: marker  });
    // dispose marker and annotation if you don't want them anymore
    // linkMarker.dispose();
    // linkAnnotation.dispose();
  }

});

I think this approach is ideal for performance. The link detection can be throttled, and only modified ranges in the buffer have to be rescanned.

It also cleanly decouples that semantic annotation of buffer ranges from their representation by the renderer.

@jerch
Copy link
Member

jerch commented Nov 1, 2019

@mofux Yeah very good idea. I want to extend this:

A marked buffer range get's modified (e.g. you have a link in your prompt line and add / delete some characters)

to all cells in active viewport* - if a marker partly covers cells of that, its content can still be modified by cursor jumps + writes, thus the marker should break. A bonus would be to re-eval those positions (with extends to left and right), as the new content might form a valid link again.

[*] Viewport is not the correct name for that - I mean those bufferlines, that are still reachable by the cursor. Guess we dont even have a name for that. Maybe "editable buffer"?

Edit: Note that a buffer.onChange will be hard to implement for cell level (both code and perf-wise) - we would have to track any cell change during chunk processing and summing them up into such an event. We kinda have that for lines already (dirtyRowService), but not on individual cell level. Maybe we can stick with line level here for the sake of performance? (Ofc this would raise the burden on any consumer of such an event).

@mofux
Copy link
Contributor

mofux commented Nov 1, 2019

@jerch Yeah I guess line level changes would be sufficient. We could still fire an IBufferRange for the affected lines though - so we can potentially support cell level changes more easily in the future without breaking API.

to all cells in active viewport* - if a marker partly covers cells of that, its content can still be modified by > cursor jumps + writes, thus the marker should break. A bonus would be to re-eval those positions (with > extends to left and right), as the new content might form a valid link again.

I'd think that a marker only breaks if a write happens within a line that is inside the marked range. The linkify implementation can over or underscan additional lines for link detection.

Maybe "editable buffer"?

Maybe "mutable buffer", or "non-scrollback buffer"? 😅

@jerch
Copy link
Member

jerch commented Nov 1, 2019

I'd think that a marker only breaks if a write happens within a line that is inside the marked range.

Yes, if onChange returns a range of line2 - line20 a marker ending on line1 should not be bothered, same for a marker covering line21+. Also this automatically turns any marker living in the scrollbuffer into stable ones (beside the disposing when finally dropping off).

Maybe "mutable buffer", or "non-scrollback buffer"?

Mutable suggests that the scrollback buffer is static/immutable which is not the case for resize. "Cursor-Addressible-Buffer" === CAB? Lol just kidding, all are kinda word monsters...

@Tyriar
Copy link
Member Author

Tyriar commented Nov 5, 2019

I think markers are a little heavy for this when you consider how links will be typically be used; the user types a bunch of stuff then wants to click on a link, that link will likely never be clicked again when it leaves the viewport. Caching the viewport would be good but anything outside of that I think is overkill. I would also invalidate when all lines of the link's range leave the viewport (not the CAB/NSB/MB 😉).

@jerch
Copy link
Member

jerch commented Nov 6, 2019

I think markers are a little heavy ...

Yeah your prolly right, markers still have the update burden for ANY buffer scroll, thus we should use them with caution. (Its a big difference for term.write if there are only 10 markers vs. 1000+.)

Regarding link detection and decoupling from render state - I think its perfectly fine to announce onHover the current mouse position, a link parser would only need to parse one wrapped line to find links (a link cannot span several real lines). This seems rather cheap to me, if a parser really wants to get more involved it could cache the findings based on line content (to make sure to correctly spot content changes). The cache here could help to debounce parsing from every single onHover report, still I'd give the cache entries rather short lifecycles as re-parsing seems so cheap. (Have not yet looked at #2530 😊)

@Tyriar
Copy link
Member Author

Tyriar commented Apr 30, 2020

Let's call this done with the link provider work which is about to ship in VS Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/links help wanted type/enhancement Features or improvements to existing features
Projects
None yet
Development

No branches or pull requests

7 participants