pospcase is an Emacs Lisp package for extracting S-expression positions
using the pattern-matching capability of pcase.
The package also contains a fairly advanced code highlighter for lisp-mode
and emacs-lisp-mode, fully extensible with pcase patterns.
Emacs’s font-locking (highlighting) relies heavily on regular expressions (regexps).
There’s already an attempt to do regexp-based extra highlighting for Lisp and Emacs Lisp.
When you read the code from the link, say lisp-extra-font-lock-match-let,
an anchored matcher for the keyword let (by the way, I use the word
submatcher in this package, but I really mean anchored matcher), you’ll
notice that parsing S-expressions with regexps is inherently awkward and
error-prone (you’ll also notice that comments in between are not supported).
To reliably parse S-expressions with regexps, you basically have to call
syntax-ppss and forward-comment everywhere in the pattern-matching part of
your code. The resulting code is a quite unreadable hybrid of regexp
matching, constant syntax-table checks, and handling of numerous corner
cases.
Here’s an example of regexp-based S-expression parsing:
;;; match and collect all symbol positions within the same nested level
(let ((current-level (car (syntax-ppss)))
      (symbol-pat "\\_<\\(?:\\sw\\|\\s_\\)+\\_>")
      (skip-pat "\\(?:\\sw\\|\\s_\\|\\s \\)"))
  (if (zerop current-level)
      (error "The point is in the top level.")
    (backward-up-list)
    ;; skip the starting non-symbol chars
    (skip-chars-forward skip-pat)
    (cl-loop
     with result
     ;; return if the task ended or the `point' is out of scope
     when (or (eobp) (< (car (syntax-ppss)) current-level)) return result
     ;; actual matching
     do (progn
          (forward-comment most-positive-fixnum)
          (looking-at symbol-pat))
     ;; collect matched symbol's range in a cons pair
     if (= (car (syntax-ppss)) current-level)
     do (setq result (cons (cons (match-beginning 0) (match-end 0)) result))
     ;; skip uninteresting characters
     do (progn
          (goto-char (match-end 0))
          (skip-chars-forward skip-pat)))))
Next, code with a nearly identical result, but using pospcase-read, the
reader function from this package:
;;; collect all symbol positions within the same nested level
(if (zerop (car (syntax-ppss)))
    (error "The point is in the top level.")
  (backward-up-list)
  (cl-loop for pair in (car (pospcase-read (point))) ; `car' because `cdr' contains
                                                     ; position data of entire list
           if (symbolp (car pair)) collect (cdr pair)))
Of course I much prefer the latter code.
In addition, Emacs recently got a nice pattern-matching macro, pcase.
Nowadays you can easily match and destructure any S-expression whenever
you want.
So I decided it’s a good time to write a pcase-based position extractor
and use it for a Lisp code highlighter.
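As a small illustration of the pcase destructuring mentioned above, here is a sketch that uses only the built-in pcase macro and nothing from this package. The function name my-defun-parts is hypothetical, made up for this example.

```elisp
(require 'pcase)

;; Hypothetical helper: pull the name and argument list out of a
;; `defun'-shaped form with a single backquote pattern.
(defun my-defun-parts (form)
  "Return (NAME ARGS) if FORM looks like a `defun', else nil."
  (pcase form
    (`(defun ,name ,args . ,_) (list name args))
    (_ nil)))

(my-defun-parts '(defun foo (a b) (+ a b))) ; → (foo (a b))
(my-defun-parts '(setq x 1))                ; → nil
```

Note that pcase only gives you the matched values here; recovering where those values sit in the buffer is the part this package adds.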
To activate pospcase-font-lock for lisp-mode and emacs-lisp-mode, add the
following code to your init file:
(add-hook 'after-init-hook
          (lambda ()
            (add-to-list 'load-path
                         "/path/to/pospcase") ; change it appropriately
            (require 'pospcase-font-lock)
            (pospcase-font-lock-lisp-setup)))
The above code adds pospcase-font-lock-lisp-keywords-add to the hooks of
the two Lisp major modes mentioned above. The highlighting will be
activated in every future Emacs session. But if you want to activate it
right now, type M-x load-library RET pospcase-font-lock, then M-x
pospcase-font-lock-lisp-setup, and lastly reactivate the major mode of each
buffer you want highlighted, for example M-x emacs-lisp-mode if the current
buffer is Emacs Lisp code.
Side note:
Currently pospcase-font-lock doesn’t have a minor mode, unlike
lisp-extra-font-lock. The reason is that I designed the font-lock specific
functions so that I can easily make buffer-local keywords. Then I can work
on different codebases simultaneously in a single Emacs session, with
different highlighting rules for the same keywords in each codebase.
It may never have happened to you, but I’ve actually seen two codebases
each define their own defun* with completely different meanings and
different argument styles (one of them enforces type checks with ->
(dash-greater-than) as the infix).
If you want to enforce the same highlighting rules across a single major mode, a minor mode is the best way to do it. But if your workflow involves numerous per-project highlighting rules, I feel there’s not much point in using a minor mode for it.
But by no means do I oppose writing your own minor mode if you find it
useful. You can write one by simply using
pospcase-font-lock-lisp-keywords-add and
pospcase-font-lock-lisp-keywords-remove to toggle the keywords.
pospcase was written because I wanted an easier way to add new
highlighting rules and fully embrace Lisp’s ultimate flexibility. The
function pospcase-font-lock is written specifically for that purpose.
Suppose I want to add a highlighting rule for a new Common Lisp macro
mydefun. I can simply write this:
(pospcase-font-lock
 'lisp-mode                              ; major-mode name
 '(`(mydefun ,name ,args . ,_))          ; `pcase' pattern to match
 ;; font specs
 '((heading-keyword .
    (font-lock-keyword-face))            ; face of `mydefun' keyword
   (name
    . (font-lock-function-name-face))    ; face of new function `name'
   ((args . list/1)                      ; `args' is arbitrary length
                                         ; list of symbols and lists.
    . (font-lock-variable-name-face))))  ; face of every argument
Hopefully it’s straightforward enough for you. The most foreign part is
list/1. To understand what it is, you have to understand submatchers, which
I’ll explain in more detail later.
The symbol heading-keyword appeared out of nowhere. That’s because
internally this pattern is translated to:
`((and 'mydefun heading-keyword) ,name ,args . ,_)
and heading-keyword is assigned automatically. The pattern recurs so often
(and is arguably less readable) that I took the liberty of making it a
shortcut notation.
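To see how an and pattern can bind the heading symbol to a variable, here is a sketch in plain pcase, independent of this package. The function name my-match-mydefun is hypothetical. One caveat: in plain pcase the and sub-pattern has to be unquoted with a comma inside the backquote, which is how it is written here; how pospcase itself spells the translated pattern internally is not something this sketch tries to reproduce.

```elisp
(require 'pcase)

;; Hypothetical helper: (and 'mydefun heading-keyword) first checks that
;; the element is the symbol `mydefun', then binds it to `heading-keyword'.
(defun my-match-mydefun (form)
  "Return (HEADING-KEYWORD NAME ARGS) if FORM is a `mydefun' form."
  (pcase form
    (`(,(and 'mydefun heading-keyword) ,name ,args . ,_)
     (list heading-keyword name args))))

(my-match-mydefun '(mydefun foo (a b) body)) ; → (mydefun foo (a b))
(my-match-mydefun '(defun foo (a b) body))   ; → nil
```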
We are going to use $HOME/.emacs.d/pospcase-user.el to store custom
highlighting rules. (If you don’t like that location, change the following
code appropriately.)
In pospcase-user.el, insert the following code:
;; -*- lexical-binding: t -*-
(require 'pospcase)
(let ((user-file load-file-name))
  (defun my-add-new-font-lock-keyword ()
    (interactive)
    (let* ((str
            (format "
(pospcase-font-lock
'%s
'(`(foo ,bar ,baz . ,_))
'((heading-keyword . (font-lock-keyword-face))
(bar . (font-lock-function-name-face))
((baz . list/1) . (font-lock-variable-name-face))))"
                    major-mode)))
      (find-file user-file)
      (goto-char (point-max))
      (insert str "\n")
      (backward-char (- (1+ (length str)) (string-match "foo" str))))))
From now on, whenever you encounter a new keyword that needs extra
highlighting for maximum readability, you can just run M-x
my-add-new-font-lock-keyword and start writing a new keyword right away
from the convenient cookie-cutter template (you can also write a new
snippet for Yasnippet if that’s your style).
Once you are satisfied with the new keyword, save the buffer, evaluate the
code with C-M-x, C-x C-e, or whatever command you use, then go back to your
project and reactivate the major mode, for example M-x lisp-mode for a
Common Lisp buffer. You should now see that the new font-lock rule is
applied and the code is highlighted accordingly.
Lisp’s flexibility sometimes causes the unfortunate accident of two people choosing the exact same keyword for completely different purposes in their own codebases. Two different definitions mean two different highlighting rules. You need buffer-local keyword rules for them.
For example, the ASDF package system for Common Lisp defines defun* and
uses it internally. To highlight the keyword, wrap your pospcase-font-lock
statement like this:
(add-hook
 'lisp-mode-hook
 (lambda ()
   (when (and (buffer-file-name)
              (equal (file-name-nondirectory (buffer-file-name)) "asdf.lisp"))
     (pospcase-font-lock 'lisp-mode
                         '(`(defun* ,name ,args . ,_)
                           `(defgeneric* ,name ,args . ,_))
                         '((heading-keyword . (font-lock-keyword-face))
                           (name .
                                 (font-lock-function-name-face))
                           ((args . list/1)
                            .
                            ((pospcase-font-lock-variable-face-form
                              (match-string 3)))))
                         t)))) ; buffer-local-p
Writing a predicate to detect which codebase a file belongs to is sometimes tricky, and I can’t provide a universal solution to the problem. So be creative and invent your own method of codebase detection if your use case is more complex than the one above.
Everything I’ve done in the previous section can be achieved without
touching Emacs Lisp code, by using the command pospcase-addform. It doesn’t
completely free you from code tweaking, but at least it gives you fancy
forms to fill in.
To use the command, run M-x load-library RET pospcase-addform or evaluate
the following code:
(require 'pospcase-addform)
Then run M-x pospcase-addform.
If you read the previous section, you can surely guess what each entry of the form represents. Add, delete, and edit them as needed to make your own highlighting rule.
Once done, click the Accept button. It automatically generates, in the
user config file, code identical to what you wrote in the previous section.
If you see no problem with the code, evaluate it, then reactivate the
major mode to apply the new highlighting rule.
I like writing Lisp code. Manually writing font-lock rules in my config
file has never bothered me. But sometimes tabbing through and tinkering
with the form entries feels good. So if you like this style of editing,
pospcase-addform is here for you.
A minor inconvenience is that you can’t comment your code within the forms. You have to add comments manually to the user config file later.
Unfortunately, the current pospcase-font-lock design doesn’t allow you to
simply write a pcase pattern and have Emacs instantly highlight the
matching code for you. There’s something you need to know before writing
your own font-lock keywords.
This is largely due to my design decision to keep the implementation as straightforward as possible, even at the expense of introducing a new concept and shoving it in the user’s face.
So please bear with me and read the following subsections.
The macro pcase, which pospcase depends on heavily, is not particularly
designed for pattern-matching S-expressions of arbitrary length. But that
is exactly the feature we want in our use case. The obvious example is the
argument list of a function declaration. To overcome the limitation, you
have to choose an appropriate submatcher for each arbitrary-length list.
So far, the following submatchers are implemented.
list/2
list/1
flet
destructuring
macrolet
defstruct (list/1)
parameter
loop (destructuring variant)
loop-and
setq
If you are a skilled programmer, maybe you can just skim the actual
keyword declarations in pospcase-font-lock-lisp-setup and build your own
keyword without any prior information. But I’m going to explain each of
them below.
If you pair a pcase pattern variable with list/2 in the specs of
pospcase-font-lock like this:
(args . list/2)
it means args is an arbitrary-length list whose elements are either
symbols or lists of exactly two elements, like the argument list of
defmethod:
(defmethod foo
    ((bar class1) (baz class2) qux quux) ; <- this list
  body)
Yes, the strange /2 (slash two) at the end of the name indicates that
list/2 matches two-element lists (the notation is stolen from function
arity notation).
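To make the shape concrete, here is a rough sketch of a predicate for "a list of symbols or two-element lists", written in plain Emacs Lisp. It is not the submatcher itself and says nothing about which elements get which face; the name my-list/2-shape-p is hypothetical.

```elisp
(require 'cl-lib)

;; Hypothetical shape check: every element is either a symbol or a
;; proper list of exactly two elements.
(defun my-list/2-shape-p (args)
  "Return non-nil if ARGS has the shape described for list/2."
  (and (listp args)
       (cl-every (lambda (elt)
                   (or (symbolp elt)
                       (and (consp elt)
                            (consp (cdr elt))
                            (null (cddr elt)))))
                 args)))

(my-list/2-shape-p '((bar class1) (baz class2) qux quux)) ; → t
(my-list/2-shape-p '((bar class1 extra) qux))             ; → nil
```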
The list/1 submatcher is almost the same as list/2 (the implementation is
almost identical too). But unlike list/2, it matches the first element of
lists of arbitrary length, like:
(foo (bar) (baz qux) (quux meow woof) (oink quak quaak quaaak))
What happens when you want to match lists of exactly one element? Well, I
haven’t yet encountered that use case, so I narrow-mindedly named it
list/1 for ease of typing and readability. Should I rename it to something
like list/1* (asterisk for arbitrary length)? Let me know how you feel.
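As a rough sketch of what "the first element of each list" means for the example above, the following plain Emacs Lisp collects the symbols that list/1 is described as matching. It works on the S-expression only and ignores positions; the name my-list/1-heads is hypothetical.

```elisp
;; Hypothetical helper: a bare symbol stands for itself, a list is
;; represented by its first element.
(defun my-list/1-heads (list)
  "Return the symbols a list/1-style match would pick out of LIST."
  (mapcar (lambda (elt) (if (consp elt) (car elt) elt))
          list))

(my-list/1-heads '(foo (bar) (baz qux) (quux meow woof) (oink quak quaak quaaak)))
;; → (foo bar baz quux oink)
```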
The flet submatcher, as the name indicates, is designed for matching the
function list of flet.
It matches the car and cadr of each list. The cadr part matches a list of
arbitrary length. When an element of that list is itself a list, only its
first element is matched, as in list/1. Like:
((foo (bar) body)
 (baz (qux quux) body)
 (meow (woof (oink quak)) body))
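The following sketch shows, on the S-expression level only, which symbols the description above picks out of that example: the function name from the car, and list/1-style heads from the cadr. It is an illustration in plain Emacs Lisp, not the submatcher’s code, and my-flet-names is a hypothetical name.

```elisp
;; Hypothetical helper: for each binding, return its function name
;; followed by the heads of its argument list.
(defun my-flet-names (bindings)
  "Return (NAME ARG...) for each flet-style binding in BINDINGS."
  (mapcar (lambda (binding)
            (cons (car binding)
                  (mapcar (lambda (arg) (if (consp arg) (car arg) arg))
                          (cadr binding))))
          bindings))

(my-flet-names '((foo (bar) body)
                 (baz (qux quux) body)
                 (meow (woof (oink quak)) body)))
;; → ((foo bar) (baz qux quux) (meow woof oink))
```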
The destructuring submatcher matches every symbol of a list no matter how deeply nested it is. Like:
(foo (bar (baz) ((qux))) (((quux))))
Of course it’s used for matching destructuring-bind, defmacro, etc.
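"Every symbol, however deeply nested" can be sketched as a simple recursive walk. Again this is plain Emacs Lisp on the S-expression only, without positions, and the name my-collect-symbols is hypothetical.

```elisp
;; Hypothetical helper: walk a tree of conses and collect every
;; non-nil symbol in left-to-right order.
(defun my-collect-symbols (tree)
  "Return all symbols in TREE, regardless of nesting depth."
  (cond ((null tree) nil)
        ((symbolp tree) (list tree))
        ((consp tree) (append (my-collect-symbols (car tree))
                              (my-collect-symbols (cdr tree))))
        (t nil)))

(my-collect-symbols '(foo (bar (baz) ((qux))) (((quux)))))
;; → (foo bar baz qux quux)
```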
The macrolet submatcher is similar to flet, but the cadr part is handled
the same way as destructuring.
I was surprised when I realized I had to implement a submatcher
specifically for defstruct. The uniqueness comes from the layout: the
defstruct keyword and a docstring are placed before the slots of the
defined structure. As you know, the docstring is optional, so the
submatcher has to check manually whether the current pattern has a
docstring, then set a fence to exclude the unnecessary leading
S-expressions from the matches.
Hopefully you don’t need to touch this submatcher for your highlighting needs and have no occasion to deal with even stranger syntax in the wild.
The trickiest syntax is the parameter keywords of an argument list (or
lambda list). When they appear in the middle of a list, they change the
semantic meaning, and therefore the highlighting rules, of what follows.
The most notorious example is destructuring-bind, like:
(let ((meow 1) (woof 2))
  (destructuring-bind (foo (bar
                            &key (baz meow) (quux woof)))
      '(1 (2 :quux 4 :baz 3))
    (list foo bar baz quux)))
The parameter submatcher matches keywords appearing in the middle of a list, then overwrites the original highlighting rule. So it’s very sensitive to the declaration order of font-lock keywords.
I really hope you don’t have to touch this submatcher at all.
(Secret note: this pattern is not even a pcase pattern, so its declaration
is irregular or non-existent too. It is just a list of keywords. Don’t try
to see them as patterns in case you really need to use this submatcher.)
This submatcher is for highlighting variables inside the notorious loop
macro of Common Lisp. Crazy as it may sound, the loop macro does
destructuring by default when assigning a local variable. So more than
simple regexp-based highlighting is needed.
I’m sure you won’t have to touch this submatcher in your lifetime (unless
you are implementing loop macro version 2.0 or something).
A variant of the loop submatcher made just to support the loop macro’s and
keyword.
This submatcher is for highlighting variable names (plain symbol names
only) assigned by setq and setf. As you know, setq and setf allow any
number of variable-value pairs to be assigned within a single evaluation,
like:
(setf foo 1
      bar 2
      baz 3)
Technically speaking, every symbol at an even position in the list is matched.
I suppose this submatcher is going to be used only by setq and setf.
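A minimal sketch of the "every other element" idea, in plain Emacs Lisp on the S-expression only: step through the arguments two at a time and keep the variable of each pair when it is a plain symbol. The name my-setq-variables is hypothetical, and this is not the submatcher’s actual code.

```elisp
(require 'cl-lib)

;; Hypothetical helper: FORM is (setq VAR VAL VAR VAL ...) or the setf
;; equivalent.  Places that are not plain symbols are skipped.
(defun my-setq-variables (form)
  "Return the plain-symbol variables assigned by the setq/setf FORM."
  (cl-loop for tail on (cdr form) by #'cddr
           when (symbolp (car tail)) collect (car tail)))

(my-setq-variables '(setf foo 1 bar 2 baz 3)) ; → (foo bar baz)
(my-setq-variables '(setf (car x) 1 y 2))     ; → (y)
```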
The dirty secret of pospcase-font-lock is that it uses a regexp search for
the heading keyword. For example, from the pattern:
`(defun ,name ,args . ,_)
the keyword builder function pospcase-font-lock-build extracts the heading
keyword defun and constructs a regexp pattern string:
"\\(?:(defun\\)\\_>\\s *"
Technically it’s possible to search and jump within a buffer using pcase
patterns. But I fear it would be very costly.
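To get a feel for what the heading regexp shown above accepts, here is a small check using only the built-in string-match. The variable and function names are hypothetical, and the Emacs Lisp syntax table is used here just so that symbol boundaries behave as they would in a Lisp buffer.

```elisp
;; The regexp string quoted above, stored in a hypothetical variable.
(defvar my-defun-heading-regexp "\\(?:(defun\\)\\_>\\s *")

;; Hypothetical helper: return the end of the heading match in STRING,
;; or nil when the heading keyword is not found.
(defun my-heading-match-end (string)
  (with-syntax-table emacs-lisp-mode-syntax-table
    (and (string-match my-defun-heading-regexp string)
         (match-end 0))))

(my-heading-match-end "(defun foo (x) x)") ; → 7, just before `foo'
(my-heading-match-end "(defunct foo)")     ; → nil, `\\_>' rejects a longer symbol
```

The match ends right where the rest of the form begins, which is the natural place for a position-aware reader to take over.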
Currently I have no plan to switch away from regexp-based heading keyword
search. But this also means you have to write the translator from pcase
pattern to regexp pattern string yourself if you want to use some specific
pcase pattern for the heading keyword.
Please register your pattern using M-x customize-variable RET
pospcase-stringify-heading-keyword-cases.
Sorry for the inconvenience.
Submatchers call pospcase-at and pospcase-read to parse S-expressions and
get positional metadata.
pospcase-at returns cons cells of the form (start . end).
pospcase-read returns an S-expression tree in which each node is a cons
cell of the form (sexp . (start . end)).
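To illustrate the (sexp . (start . end)) shape, here is a much simplified sketch built on the built-in reader. It only handles top-level forms of a string and is not how pospcase-read is implemented; the name my-read-with-positions is hypothetical.

```elisp
;; Hypothetical helper: read every top-level form in STRING and pair it
;; with its buffer positions, as (SEXP . (START . END)).
(defun my-read-with-positions (string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-syntax-table emacs-lisp-mode-syntax-table
      (let (result)
        (condition-case nil
            (while t
              ;; Skip whitespace and comments so START is exact.
              (forward-comment most-positive-fixnum)
              (let* ((start (point))
                     (sexp (read (current-buffer))))
                (push (cons sexp (cons start (point))) result)))
          ;; `read' signals end-of-file when nothing is left.
          (end-of-file nil))
        (nreverse result)))))

(my-read-with-positions "foo (bar baz) ; comment\nqux")
;; → ((foo 1 . 4) ((bar baz) 5 . 14) (qux 25 . 28))
```

Note that (foo 1 . 4) is just how Emacs prints (foo . (1 . 4)).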
Submatchers either collect the (start . end) pairs of interest manually,
or call pospcase, pospcase-at or pospcase-read repeatedly on the start
position (the car of (start . end)) of each S-expression of interest and
collect the results.
They then structure the collected (start . end) pairs in pospcase--matches
in a form suitable for consumption by pospcase--iterator, like this:
(((start . end)   ; (match-string 1) of first (match-data)
  (start . end))  ; (match-string 2) of first (match-data)
 ((start . end)   ; (match-string 1) of second (match-data)
  (start . end))) ; (match-string 2) of second (match-data)
pospcase--iterator sets the car of pospcase--matches as the match data
using set-match-data.
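As a rough sketch of how a list of (start . end) pairs can become match data, the following uses the built-in set-match-data, which takes a flat list of the form (beg0 end0 beg1 end1 ...). How pospcase fills in group 0 is not described here, so spanning it from the first start to the last end is purely an assumption of this sketch, and the function name is hypothetical.

```elisp
(require 'cl-lib)

;; Hypothetical helper: PAIRS is ((START . END) ...).  Group N of the
;; resulting match data is the Nth pair; group 0 (an assumption) spans
;; from the first START to the last END.
(defun my-set-match-data-from-pairs (pairs)
  (let ((flat (cl-loop for (start . end) in pairs
                       append (list start end))))
    (set-match-data
     (append (list (car flat) (car (last flat))) flat))))

(with-temp-buffer
  (insert "(foo bar)")
  (my-set-match-data-from-pairs '((2 . 5) (6 . 9)))
  (list (match-string 1) (match-string 2)))
;; → ("foo" "bar")
```

Once the match data is set this way, font-lock can apply faces by group number exactly as it would after a regexp search.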
Admittedly, pospcase-font-lock does something very weird. Here, I’m
talking about submatchers. As you can see, all of them call the
pospcase--call-iterator macro. True to its name, the macro realizes the
behavior of the iterator pattern (very crudely, using a global variable
pospcase--matches as the placeholder for pre-collected data). I’m not very
pleased with the implementation either. But I think dynamically making
lambda functions for each iterator, then managing and dispatching them
correctly for each call, is far more complex than the current
implementation. And ultimately Emacs’s font-lock (and jit-lock) is
single-threaded. So I decided it isn’t worth the trouble to implement a
proper iterator.
You may ask why I have to implement an iterator in the first place. Well,
clearly Emacs’s font-lock.el was written with regexp-based, crawler-like
behavior in mind, and font-lock-add-keywords was designed accordingly.
Lazy me just doesn’t want to reimplement everything from scratch.
Obviously I’m misusing them. And this is why pospcase-font-lock needs its
weird iterator.
The thing is, Emacs Lisp doesn’t have reader macros. In the pospcase
context, this means you can’t really use Emacs’s built-in reader
read-from-string to parse Common Lisp S-expressions.
To circumvent, and not really tackle, the limitation,
pospcase--read-from-string does a quick hack using regexps to convert
otherwise unparsable S-expressions into their Emacs Lisp counterparts as
smoothly as possible.
It’s a simple text-replacement rule, so don’t expect too much. If you experience a major problem that you can’t think of any way to circumvent, well, accept it as unparsable and give up the fancy highlighting for that section.
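To show the general idea of such a text replacement, here is a toy sketch for a single piece of Common Lisp syntax, the #p pathname prefix, which the Emacs reader rejects. The rule, the choice to keep the string length unchanged (so that positions still line up), and both function names are assumptions made for this example; the actual rules in pospcase--read-from-string may look quite different.

```elisp
;; Hypothetical rule: blank out the "#p" in front of a string literal,
;; replacing it with two spaces so the text keeps its length.
(defun my-blank-pathname-prefix (string)
  (replace-regexp-in-string "#[pP]\"" "  \"" string t t))

;; Hypothetical wrapper: apply the rule, then use the built-in reader.
(defun my-read-cl-string (string)
  (car (read-from-string (my-blank-pathname-prefix string))))

(my-read-cl-string "(load #p\"/tmp/x.lisp\")") ; → (load "/tmp/x.lisp")
```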
pospcase-font-lock depends totally on pcase. But it still uses regexps for
searching heading keywords. The reason I don’t use something like
el-search is that I fear further degradation of performance. And I feel
it’s overkill.
So far I have no use case for matching keywords in the middle of a form,
so it’s not implemented. pospcase-font-lock purposely supports only
heading keyword patterns.
See the defclass syntax:
`(defclass ,name ,supers ,slots . ,_)
where supers is a list of superclasses and slots is a list of the class’s
variables.
pospcase-font-lock internally generates two keywords for defclass,
equivalent to declaring the following:
(pospcase-font-lock 'lisp-mode
                    '(`(defclass ,name ,supers ,slots . ,_))
                    '((heading-keyword . (font-lock-keyword-face))
                      (name . (font-lock-type-face))
                      ((slots . list/1) . (font-lock-variable-name-face))))
(pospcase-font-lock 'lisp-mode
                    '(`(defclass ,name ,supers ,slots . ,_))
                    '((heading-keyword . (font-lock-keyword-face))
                      (name . (font-lock-type-face))
                      ((supers . list/1) . (font-lock-type-face))))
Note that pospcase-font-lock adds a new keyword at the start of the
keyword list. In other words, the last added keyword is highlighted first,
and the text property fontified of the highlighted text is set to t. And
since keywords are internally processed with the append flag, the
highlighting below is not going to be overwritten by the highlighting
rules that follow.