PyRD is a recursive descent parser generator with backtracking implemented in Python.
PyRD can be installed by running python setup.py install
from the top level of the repository.
PyRD can be invoked as either a python module with python -m pyrd <input.grammar> <output.py>
or as a standalone program with pyrdg <input.grammar> <output.py>
. The syntax of <input.grammar>
is described in the next section, and <output.py>
will be generated as an importable Python file.
.grammar
files consist of 2 sections, separated by the marker %%
. The first section consists of a
list of the parser's production rules, and the second section consists of raw Python code that will be
appended to the end of the generated parser. Prepending to the generated parser is currently not supported.
Production rules are specified as follows:
production_id :: list of space-separated subparsers {raw python code}
| another list of subparsers {raw python code}
| more subparsers {raw python code};
Production rules are tried sequentially, so if the first rule belonging to an id succeeds the second will not
be run. The values returned by any subparser in a production rule are populated by name into that
production rule's python code, eg:
list_index :: list_ "[" index "]" { return list_[index] };
Two special parsers for string literals and regexes can be accessed with
"characters-to-match"
and /regex-to-match/
, respectively. Their results can be accessed by indexing the
special variable parsed
in their production rule's python code,
eg:
int_in_brackets :: "[" /-?[0-9]+/ "]" {return int(parsed[1])};
The python code between brackets is run whenever the production rule succeeds, and is used as that parser's value
when called from another production rule. It is substituted as-is into the generated parser, so care must be taken to
comply with Python's indentation rules.
An example parser that finds the maximum or minimum of a comma-separated list of integers might look like:
list :: operator "(" items ")" {return ops[operator](items[::-1])};
operator :: /(min|max)/ {return parsed[0]};
items :: num "," items {items.append(num); return items}
| num {return [num]};
num :: /-?[0-9]+/ {return int(parsed[0])};
%%
ops = {
"min":min,
"max":max
}
if __name__ == '__main__':
parsed = List().parse(input())
print(parsed)
The complete description for .grammar
files can be found in examples/grammar.grammar
The examples/
directory contains several example grammars for PyRD. Examples can be compiled and tested
as follows:
$ pyrdg examples/calc.grammar calc.py
Grammar parsed successfully.
$ echo "1 + 2 * 3" | python calc.py
7
PyRD does not generate efficient parsers. The following parse times were achieved with a json
parser generated by PyRD (examples/json.grammar
):
File Size (KB) | Runtime (PyRD) | Runtime (builtin JSON parser) |
---|---|---|
10 | 0.120s | < 0.1s |
30 | 0.257s | < 0.1s |
70 | 0.548s | < 0.1s |
140 | 1.193s | < 0.1s |
400 | 5.521s | < 0.1s |
1000 | 44.8s | 0.1s |
- Support more flexible naming of production rule ids.
- Document PyRD Classes.
- Improve runtime by implementing some form of LL(k) prediction.