forked from prataprc/goparsec
/*
Package parsec provides a library of parser combinators. The basic
idea behind the parsec package is that it lets programmers take a
basic set of terminal parsers, a.k.a. tokenizers, and compose them
into a tree of parsers using combinators such as And, OrdChoice,
Kleene, Many, and Maybe.

To begin with, there are four basic types to keep in mind while
creating and composing parsers.

Types

Scanner, an interface type that encapsulates the input text. A
built-in scanner called SimpleScanner is supplied along with this
package. Developers can also implement their own scanner types. The
following example creates a new instance of SimpleScanner from an
input text:

    var exprText = []byte(`4 + 123 + 23 + 67 +89 + 87 *78`)
    s := parsec.NewScanner(exprText)

Nodify, a callback function supplied while combining parser
functions. If the underlying parsing logic matches the input text,
the callback is dispatched with the list of matching ParsecNodes.
The value returned by the callback is in turn used as a ParsecNode
item in the higher-level list of ParsecNodes.

Parser, a function that matches the input text against a specific
pattern. Simple parsers can be combined using one of the supplied
combinators to construct a higher-level parser. A parser function
takes a Scanner object and applies the underlying parsing logic. If
that logic succeeds, the Nodify callback is dispatched, and a
ParsecNode and a new Scanner object (with its cursor moved forward)
are returned. If the parser fails to match, it returns the input
Scanner object as is, along with a nil ParsecNode.

ParsecNode, an interface type that encapsulates one or more tokens
from the input text, as a terminal node or a non-terminal node.
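
The relationship between the four types can be sketched with a toy
model. The definitions below are simplified stand-ins written for
illustration (node, toyScanner, toyNodify, toyParser, literal and
upper are hypothetical names), not the package's actual declarations,
which carry more methods and options.

```go
package main

import (
	"fmt"
	"strings"
)

// node plays the role of ParsecNode: any value produced by a parser.
type node interface{}

// toyScanner plays the role of Scanner: input text plus a cursor.
type toyScanner struct {
	text   string
	cursor int
}

// toyNodify plays the role of Nodify: it turns matched nodes into one node.
type toyNodify func(matched []node) node

// toyParser plays the role of Parser: on success it returns a node and a
// scanner whose cursor has moved forward; on failure it returns nil and
// the input scanner unchanged.
type toyParser func(s toyScanner) (node, toyScanner)

// literal builds a terminal parser that matches lit at the cursor.
func literal(lit string, cb toyNodify) toyParser {
	return func(s toyScanner) (node, toyScanner) {
		if !strings.HasPrefix(s.text[s.cursor:], lit) {
			return nil, s
		}
		next := toyScanner{text: s.text, cursor: s.cursor + len(lit)}
		return cb([]node{lit}), next
	}
}

// upper is a sample callback that upper-cases the matched token.
func upper(matched []node) node {
	return strings.ToUpper(matched[0].(string))
}

func main() {
	p := literal("true", upper)
	n, next := p(toyScanner{text: "true,false"})
	fmt.Println(n, next.cursor) // prints "TRUE 4"
	n, next = p(toyScanner{text: "false"})
	fmt.Println(n, next.cursor) // prints "<nil> 0"
}
```

Returning the scanner by value keeps a failed match free of side
effects, which is what lets a caller try an alternative parser on the
same input.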

Combinators

If the input text is a single token like `10` or `true` or
`"some string"`, then all we need is a single Parser function that
can tokenize the input text into a terminal node. But our
applications are seldom that simple. Almost always we need to parse
the input text for more than one token, and most of the time we need
to compose those tokens into a tree of terminal and non-terminal
nodes.

This is where combinators are useful. The package provides a set of
combinators to help combine terminal parsers into higher-level
parsers. They are:

* And, to combine a sequence of terminal and non-terminal parsers.
* OrdChoice, to choose between a specified list of parsers.
* Kleene, to repeat the parser zero or more times.
* Many, to repeat the parser one or more times.
* ManyUntil, to repeat the parser until a specified end matcher.
* Maybe, to apply the parser once or not at all.
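
How these combinators behave can be illustrated on a toy parser type.
The sketch below is a stdlib-only model written for this explanation
(parser, lit, and, ordChoice, kleene, many and maybe are hypothetical
names). It is not the package's implementation, whose combinators
additionally take a Nodify callback and operate on Scanner values.

```go
package main

import (
	"fmt"
	"strings"
)

// parser consumes a prefix of in and returns the matched tokens, the
// remaining input, and whether it matched. A toy stand-in for Parser.
type parser func(in string) (toks []string, rest string, ok bool)

// lit matches the literal s, skipping leading spaces.
func lit(s string) parser {
	return func(in string) ([]string, string, bool) {
		t := strings.TrimLeft(in, " ")
		if strings.HasPrefix(t, s) {
			return []string{s}, t[len(s):], true
		}
		return nil, in, false
	}
}

// and succeeds only if every parser matches in sequence.
func and(ps ...parser) parser {
	return func(in string) ([]string, string, bool) {
		var all []string
		rest := in
		for _, p := range ps {
			toks, r, ok := p(rest)
			if !ok {
				return nil, in, false
			}
			all, rest = append(all, toks...), r
		}
		return all, rest, true
	}
}

// ordChoice returns the result of the first parser that matches.
func ordChoice(ps ...parser) parser {
	return func(in string) ([]string, string, bool) {
		for _, p := range ps {
			if toks, r, ok := p(in); ok {
				return toks, r, true
			}
		}
		return nil, in, false
	}
}

// kleene applies p zero or more times; it never fails.
func kleene(p parser) parser {
	return func(in string) ([]string, string, bool) {
		var all []string
		for {
			toks, r, ok := p(in)
			if !ok || r == in { // stop on failure or no progress
				return all, in, true
			}
			all, in = append(all, toks...), r
		}
	}
}

// many applies p one or more times.
func many(p parser) parser {
	return func(in string) ([]string, string, bool) {
		toks, rest, _ := kleene(p)(in)
		return toks, rest, len(toks) > 0
	}
}

// maybe applies p once or not at all; it never fails.
func maybe(p parser) parser {
	return func(in string) ([]string, string, bool) {
		toks, rest, _ := p(in)
		return toks, rest, true
	}
}

func main() {
	digit := ordChoice(lit("0"), lit("1"))
	expr := and(maybe(lit("-")), many(digit), kleene(lit("!")))
	toks, rest, ok := expr("- 1 0 1 !! x")
	fmt.Printf("%q %q %v\n", toks, rest, ok)
	// prints ["-" "1" "0" "1" "!" "!"] " x" true
}
```

Every combinator returns the original input on failure, so an
enclosing ordChoice can retry the next alternative from the same
position.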

All of the above combinators accept one or more parser functions as
arguments, either by value or by reference. Parser arguments can be
passed by reference so that recursive parsing logic can be defined,
like parsing nested arrays:

    var Y Parser
    var value Parser // assigned in init() to avoid an initialization cycle

    var opensqrt = Atom("[", "OPENSQRT")
    var closesqrt = Atom("]", "CLOSESQRT")
    var values = Kleene(nil, &value, Atom(",", "COMMA"))
    var array = And(nil, opensqrt, values, closesqrt)

    func init() {
        value = OrdChoice(nil, Int(), Bool(), String(), array)
        Y = OrdChoice(nil, value)
    }
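
The role of the pointer argument can be seen in a toy model. In the
sketch below (stdlib only; parser, lit, ref, seq, alt, list and
accepts are hypothetical names, not the package's API), ref defers
dereferencing until parse time, so array can mention value before
value has been assigned.

```go
package main

import (
	"fmt"
	"strings"
)

// parser returns the remaining input and whether it matched.
type parser func(in string) (rest string, ok bool)

// lit matches the literal s, skipping leading spaces.
func lit(s string) parser {
	return func(in string) (string, bool) {
		t := strings.TrimLeft(in, " ")
		if strings.HasPrefix(t, s) {
			return t[len(s):], true
		}
		return in, false
	}
}

// ref wraps a pointer to a parser. The pointer is dereferenced only when
// input is parsed, so the target may be assigned after ref is called.
func ref(p *parser) parser {
	return func(in string) (string, bool) { return (*p)(in) }
}

// seq matches all parsers in order.
func seq(ps ...parser) parser {
	return func(in string) (string, bool) {
		rest := in
		for _, p := range ps {
			r, ok := p(rest)
			if !ok {
				return in, false
			}
			rest = r
		}
		return rest, true
	}
}

// alt matches the first parser that succeeds.
func alt(ps ...parser) parser {
	return func(in string) (string, bool) {
		for _, p := range ps {
			if r, ok := p(in); ok {
				return r, true
			}
		}
		return in, false
	}
}

// list matches zero or more items separated by sep.
func list(item, sep parser) parser {
	return func(in string) (string, bool) {
		rest, ok := item(in)
		if !ok {
			return in, true
		}
		for {
			r, ok := seq(sep, item)(rest)
			if !ok {
				return rest, true
			}
			rest = r
		}
	}
}

var value parser // assigned in init; array refers to it through &value

var array = seq(lit("["), list(ref(&value), lit(",")), lit("]"))

func init() {
	value = alt(lit("1"), lit("true"), array)
}

// accepts reports whether value matches the whole input.
func accepts(in string) bool {
	rest, ok := value(in)
	return ok && strings.TrimSpace(rest) == ""
}

func main() {
	fmt.Println(accepts("[1, [true, [1]], []]")) // prints "true"
	fmt.Println(accepts("[1, [true"))            // prints "false"
}
```

Passing value itself instead of &value would capture a nil function
while array is being initialized; the pointer lets the assignment in
init take effect.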

Terminal parsers

Parsers for a standard set of tokens are supplied along with this
package. Most of these parsers return a Terminal type as ParsecNode.

* Char, match a single character, skipping leading whitespace.
* Float, match a float literal, skipping leading whitespace.
* Hex, match a hexadecimal literal, skipping leading whitespace.
* Int, match a decimal number literal, skipping leading whitespace.
* Oct, match an octal number literal, skipping leading whitespace.
* String, match a string literal, skipping leading whitespace.
* Ident, match an identifier token, skipping leading whitespace.
* Atom, match a single atom, skipping leading whitespace.
* AtomExact, match a single atom without skipping leading whitespace.
* Token, match a single token, skipping leading whitespace.
* TokenExact, match a single token without skipping leading whitespace.
* OrdToken, match a single token from a specified list of alternatives.
* End, match the end of text.
* NoEnd, match when not at the end of text.

All of the terminal parsers except End and NoEnd return a Terminal
type as ParsecNode, while End and NoEnd return a boolean type as
ParsecNode.
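
The difference between the whitespace-skipping parsers and their
Exact variants can be sketched with the standard regexp package. The
helpers below (token and tokenExact, lower-case and hypothetical) only
model the behaviour described in the list above; they are not the
package's implementation, and the whitespace set they skip is an
assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// leadingSpace is the whitespace run assumed to be skippable.
var leadingSpace = regexp.MustCompile("^[ \t\r\n]+")

// matcher returns the matched token, the remaining input, and whether
// the pattern matched at the start of in.
type matcher func(in string) (tok, rest string, ok bool)

// tokenExact matches pattern at the cursor without skipping whitespace.
func tokenExact(pattern string) matcher {
	re := regexp.MustCompile("^(?:" + pattern + ")")
	return func(in string) (string, string, bool) {
		loc := re.FindStringIndex(in)
		if loc == nil {
			return "", in, false
		}
		return in[:loc[1]], in[loc[1]:], true
	}
}

// token skips leading whitespace and then behaves like tokenExact.
func token(pattern string) matcher {
	exact := tokenExact(pattern)
	return func(in string) (string, string, bool) {
		skipped := leadingSpace.ReplaceAllString(in, "")
		tok, rest, ok := exact(skipped)
		if !ok {
			return "", in, false // leave the input untouched on failure
		}
		return tok, rest, true
	}
}

func main() {
	intTok := "[0-9]+"
	tok, rest, ok := token(intTok)("  42 + 1")
	fmt.Printf("%q %q %v\n", tok, rest, ok) // prints "42" " + 1" true
	_, _, ok = tokenExact(intTok)("  42 + 1")
	fmt.Println(ok) // prints "false"
}
```

The Exact form is the one to reach for when whitespace is significant,
for example inside a quoted string or between adjacent characters of
one lexeme.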

AST and Queryable

This is an experimental feature that uses CSS-like selectors for
querying an Abstract Syntax Tree (AST). Types, APIs and methods
associated with AST and Queryable are unstable and are expected to
change in the future.

While the Scanner, Parser and ParsecNode types are re-used in AST and
Queryable, combinator functions are re-implemented as AST methods.
Similarly, the ASTNodify type is to be used instead of the Nodify
type. Otherwise, all the parsec techniques mentioned above are
equally applicable to AST.

Additionally, the following points are worth noting while using AST:

* Combinator methods supplied via AST can be named.
* All combinators from an AST object create and return a NonTerminal
  as the Queryable type.
* An ASTNodify function can interpret its Queryable argument and
  return a different type implementing the Queryable interface.
*/
package parsec