Skip to content

Commit

Permalink
Apply fixes to the parser and printer
Browse files Browse the repository at this point in the history
  • Loading branch information
FlorianCassayre committed Apr 6, 2022
1 parent b90bb19 commit c4b154c
Show file tree
Hide file tree
Showing 10 changed files with 507 additions and 242 deletions.
100 changes: 54 additions & 46 deletions notes/documentation/parser.md
Original file line number Diff line number Diff line change
@@ -1,63 +1,35 @@
Documentation: Parser
===

The package `parser` contains all the logic responsible for parsing
the LISA syntax, as produced by the printer (`lisa.kernel.Printer`).
The packages `parser` and `printer` contain all the logic responsible for
parsing and printing expression to/from the front.

* `SCReader.readFormulaAscii` -- reads a formula in ASCII. Examples:
* `FrontReader.readFormula` / `FrontPositionedPrinter.prettyFormula` -- read/parse a formula. Examples:
```
c
(a /\ b) => (c <=> d)
f(a, b)
exists x. !(x in {})
forall x, y. {x, y} = {y, x}
```
* `SCReader.readSequentAscii` -- reads a sequent in ASCII. Examples:
* `FrontReader.readSequent` / `FrontPositionedPrinter.prettySequent` -- read/parse a sequent. Examples:
```
|-
|- a
a; b |- c; d
exists x. g(x); a /\ b |- a; b
```
* `SCReader.readProof` -- reads a proof in its standard format. Example:
```
-1 Import 0 ⊢ ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Rewrite -1 ⊢ ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
1 Subproof -1 ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
-1 Import 0 ⊢ ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Subproof ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Subproof ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Hypo. ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
1 Left ∀ 0 ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
1 Cut -1, 0 ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
2 Hypo. ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
3 Left ∀ 2 ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
4 Cut 1, 3 ⊢ ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
5 Hypo. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
6 Left ∀ 5 ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
7 Cut 4, 6 ⊢ (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
```

Additionally, it is possible to choose between ASCII and unicode.

### Pipeline

The parsing works in three steps:
* `SCLexer` transform a stream of characters into a stream of lexical tokens (`SCToken`)
* `SCParser` parses the tokens into a tree (`SCParsed`)
* `SCResolver` resolves a parsed tree into a kernel structure (`SCResolver`)

`SCReader` is the public interface and is responsible for calling the steps in sequence.

### Caveats

The "grammar" of LISA is not set in stone and may change in the future.

The printer does not yet support the `\x1, ..., xn. ...` syntax.

The string representation of proofs is ambiguous: by omitting the proof steps parameters, it is not
always possible to reconstruct the exact same proof step, or to reconstruct it at all.
* `FrontLexer` transform a stream of characters into a stream of lexical tokens (`FrontToken`)
* `FrontParser` parses the tokens into a tree (`FrontParsed`)
* `FrontResolver` resolves a parsed tree into trees of the front (`Term`, `Formula`, `Sequent`, ...)

These issues shall be addressed later when the parser is expected to play a more
important role.
`FrontReader` is the public interface and is responsible for calling the steps in sequence.

### Specification

Expand All @@ -68,22 +40,25 @@ string `<=>` but rather the token representing this symbol. The ASCII lexer will
the string `"<=>"` as this token, while the unicode one will parse ``.

```
id ::= [a-zA-Z_][a-zA-Z0-9_]*
sid ::= \?[a-zA-Z_][a-zA-Z0-9_]*
id ::= /[a-zA-Z_][a-zA-Z0-9_]*/
sid ::= /\?[a-zA-Z_][a-zA-Z0-9_]*/
cid ::= /\?\?[a-zA-Z_][a-zA-Z0-9_]*/
ol2 ::= "<=>" | "=>" | "\/" | "/\" | "in" | "sub" | "~" | "="
bl ::= "forall" | "exists" | "existsone"
t ::= t ol2 t
| "!" t
| bl id "." t
| "{}" | "{" t "}" | "{" t "," t "}"
| (id | sid) ("(" t ("," t)* ")")?
| (id | sid | cid) ("(" t ("," t)* ")")?
| "(" t ")"
tt ::= "\" id ("," id)* "." t
fv ::= "\" id ("," id)* "."
tts ::= (tt (";" tt)*)?
s ::= tts "|-" tts
ts ::= (t (";" t)*)?
s ::= fv? ts "|-" ts
ps ::= fv? ((... (";" t)*) | ts) |- (((t ";")* ...) | ts)
```

#### Associativity & precedence
Expand Down Expand Up @@ -111,7 +86,8 @@ The rules for resolution are given below. There are two types, terms (`T`) and f
Knowing the type of the top-level term can allow us to resolve the full tree into terms
and formulas. The last case (nullary function / variable) is resolved by looking in the
environment: if there is a variable wih that name in the context, then this symbol should be
a variable.
a variable. The printer ensures that this convention is respected (and may throw an exception
if an expression cannot be printed unambiguously).

```
C, x1, ..., xn |- a: F
Expand All @@ -130,6 +106,13 @@ C |- a: F C |- b: F
---------------------- [Connector(op, (a, b))]
C |- a op b: F (op is one of: /\, \/, =>, <=>)
C |- x1: F ... C |- xn: F
--------------------------- [Connector(??c, (a, b))]
C |- ??c(x1, ..., xn): F
----------- [Connector(??c, ())]
C |- ??c: F
C |- x1: T ... C |- xn: T
--------------------------- [Function((?)f, (x1, ..., xn))]
C |- (?)f(x1, ..., xn): T
Expand Down Expand Up @@ -159,7 +142,32 @@ Additional predefined constant operators have similar rules to the ones presente

In the future we may allow type ascriptions, as displayed in these rules.

#### Proofs
### Legacy LISA parser

The legacy parser was moved to `legacy.parser`. This tool can parse LISA proofs, with
some caveats.
Namely, the string representation of proofs is ambiguous: by omitting the proof steps
parameters, it is not always possible to reconstruct the exact same proof step, or to
reconstruct it at all.

* `SCReader.readProof` -- reads a proof in its standard format. Example:
```
-1 Import 0 ⊢ ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Rewrite -1 ⊢ ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
1 Subproof -1 ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
-1 Import 0 ⊢ ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Subproof ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Subproof ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
0 Hypo. ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
1 Left ∀ 0 ∀x, y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
1 Cut -1, 0 ⊢ ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
2 Hypo. ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
3 Left ∀ 2 ∀y, z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
4 Cut 1, 3 ⊢ ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
5 Hypo. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
6 Left ∀ 5 ∀z. (z ∈ {x, y}) ↔ (x = z) ∨ (y = z) ⊢ (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
7 Cut 4, 6 ⊢ (z ∈ {x, y}) ↔ (x = z) ∨ (y = z)
```

A proof is internally represented as nested indexed sequences of steps.
Each line starts with an integer, representing the step's index. These indices
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package me.cassayre.florian.masterproject.front.parser
import me.cassayre.florian.masterproject.front.parser.FrontReadingException.LexingException
import me.cassayre.florian.masterproject.front.parser.FrontToken
import me.cassayre.florian.masterproject.front.parser.FrontToken.*
import me.cassayre.florian.masterproject.front.parser.FrontSymbols

import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers
Expand All @@ -12,18 +13,7 @@ private[parser] trait FrontLexer extends RegexParsers {
override def skipWhitespace: Boolean = true
override protected val whiteSpace: Regex = "[ \t\f]+".r

protected val SymbolForall: String
protected val SymbolExists: String
protected val SymbolExistsOne: String
protected val SymbolIff: String
protected val SymbolImplies: String
protected val SymbolOr: String
protected val SymbolAnd: String
protected val SymbolNot: String
protected val SymbolTurnstile: String
protected val SymbolSubset: String
protected val SymbolMembership: String
protected val SymbolEmptySet: String
protected val S: FrontSymbols

protected def initialIndentation: Parser[InitialIndentation] = positioned(
" *".r ^^ (str => InitialIndentation(str.length))
Expand All @@ -38,36 +28,40 @@ private[parser] trait FrontLexer extends RegexParsers {
identifierPattern.r ^^ (str => Identifier(str))
)
private def schematicIdentifier: Parser[SchematicIdentifier] = positioned(
(s"\\?$identifierPattern").r ^^ (str => SchematicIdentifier(str.tail))
(raw"\${S.QuestionMark}$identifierPattern").r ^^ (str => SchematicIdentifier(str.tail))
)
private def schematicConnectorIdentifier: Parser[SchematicConnectorIdentifier] = positioned(
(raw"\${S.QuestionMark}\${S.QuestionMark}$identifierPattern").r ^^ (str => SchematicConnectorIdentifier(str.tail.tail))
)

private def keywords: Parser[FrontToken] = positioned(
SymbolForall ^^^ Forall()
| SymbolExistsOne ^^^ ExistsOne()
| SymbolExists ^^^ Exists()
| SymbolIff ^^^ Iff()
| SymbolImplies ^^^ Implies()
| SymbolOr ^^^ Or()
| SymbolAnd ^^^ And()
| SymbolNot ^^^ Not()
| SymbolTurnstile ^^^ Turnstile()
| SymbolSubset ^^^ Subset()
| SymbolMembership ^^^ Membership()
| SymbolEmptySet ^^^ EmptySet()
| "=" ^^^ Equal()
| "~" ^^^ SameCardinality()
| "\\" ^^^ LocalBinder()
| "{" ^^^ CurlyBracketOpen()
| "}" ^^^ CurlyBracketClose()
| "(" ^^^ ParenthesisOpen()
| ")" ^^^ ParenthesisClose()
| "." ^^^ Dot()
| "," ^^^ Comma()
| ";" ^^^ Semicolon()
S.Forall ^^^ Forall()
| S.ExistsOne ^^^ ExistsOne()
| S.Exists ^^^ Exists()
| S.Iff ^^^ Iff()
| S.Implies ^^^ Implies()
| S.Or ^^^ Or()
| S.And ^^^ And()
| S.Exclamation ^^^ Not()
| S.Turnstile ^^^ Turnstile()
| S.Ellipsis ^^^ Ellipsis()
| S.Subset ^^^ Subset()
| S.Membership ^^^ Membership()
| S.EmptySet ^^^ EmptySet()
| S.Equal ^^^ Equal()
| S.Tilde ^^^ SameCardinality()
| S.Backslash ^^^ LocalBinder()
| S.CurlyBracketOpen ^^^ CurlyBracketOpen()
| S.CurlyBracketClose ^^^ CurlyBracketClose()
| S.ParenthesisOpen ^^^ ParenthesisOpen()
| S.ParenthesisClose ^^^ ParenthesisClose()
| S.Dot ^^^ Dot()
| S.Comma ^^^ Comma()
| S.Semicolon ^^^ Semicolon()
)

protected final def standardTokens: Parser[FrontToken] =
keywords | newLine | schematicIdentifier | identifier
keywords | newLine | schematicConnectorIdentifier | schematicIdentifier | identifier

// Order matters! Special keywords should be matched before identifiers
protected def tokens: Parser[Seq[FrontToken]] =
Expand All @@ -84,34 +78,12 @@ private[parser] trait FrontLexer extends RegexParsers {
object FrontLexer {

private trait FrontLexerAscii extends FrontLexer {
override protected val SymbolForall: String = "forall"
override protected val SymbolExists: String = "exists"
override protected val SymbolExistsOne: String = "existsone"
override protected val SymbolIff: String = "<=>"
override protected val SymbolImplies: String = "=>"
override protected val SymbolOr: String = """\/"""
override protected val SymbolAnd: String = """/\"""
override protected val SymbolNot: String = "!"
override protected val SymbolTurnstile: String = "|-"
override protected val SymbolMembership: String = "in"
override protected val SymbolSubset: String = "sub"
override protected val SymbolEmptySet: String = "empty"
override protected val S: FrontSymbols = FrontSymbols.FrontAsciiSymbols
}
private object FrontLexerStandardAscii extends FrontLexerAscii

private trait FrontLexerUnicode extends FrontLexer {
override protected val SymbolForall: String = ""
override protected val SymbolExists: String = ""
override protected val SymbolExistsOne: String = "∃!"
override protected val SymbolIff: String = ""
override protected val SymbolImplies: String = ""
override protected val SymbolOr: String = ""
override protected val SymbolAnd: String = ""
override protected val SymbolNot: String = "¬"
override protected val SymbolTurnstile: String = ""
override protected val SymbolMembership: String = ""
override protected val SymbolSubset: String = ""
override protected val SymbolEmptySet: String = ""
override protected val S: FrontSymbols = FrontSymbols.FrontUnicodeSymbols
}
private object FrontLexerStandardUnicode extends FrontLexerUnicode

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@ sealed abstract class FrontParsed extends Positional

private[parser] object FrontParsed {

case class ParsedSequent(left: Seq[ParsedTopTermOrFormula], right: Seq[ParsedTopTermOrFormula]) extends FrontParsed
case class ParsedSequent(freeVariables: Seq[String], left: Seq[ParsedTermOrFormula], right: Seq[ParsedTermOrFormula]) extends FrontParsed
case class ParsedPartialSequent(freeVariables: Seq[String], left: Seq[ParsedTermOrFormula], right: Seq[ParsedTermOrFormula], partialLeft: Boolean, partialRight: Boolean) extends FrontParsed

case class ParsedTopTermOrFormula(freeVariables: Seq[String], termOrFormula: ParsedTermOrFormula) extends FrontParsed

Expand All @@ -16,7 +17,7 @@ private[parser] object FrontParsed {
val identifier: String
}
case class ParsedConstant(identifier: String) extends ParsedName
case class ParsedSchema(identifier: String) extends ParsedName
case class ParsedSchema(identifier: String, connector: Boolean) extends ParsedName

case class ParsedApplication(name: ParsedName, args: Seq[ParsedTermOrFormula]) extends ParsedTermOrFormula

Expand Down Expand Up @@ -56,8 +57,4 @@ private[parser] object FrontParsed {
case class ParsedExists(bound: Seq[String], termOrFormula: ParsedTermOrFormula) extends ParsedBinder
case class ParsedExistsOne(bound: Seq[String], termOrFormula: ParsedTermOrFormula) extends ParsedBinder

case class ParsedProofStep(stepPosition: Position, indentation: Int, line: Int, ruleName: String, premises: Seq[Int], conclusion: ParsedSequent) extends FrontParsed

case class ParsedProof(steps: IndexedSeq[ParsedProofStep]) extends FrontParsed

}
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,11 @@ private[parser] object FrontParser extends Parsers {
positioned(accept("identifier", { case id: Identifier => id }))
private def schematicIdentifier: Parser[SchematicIdentifier] =
positioned(accept("schematic identifier", { case id: SchematicIdentifier => id }))
private def schematicConnectorIdentifier: Parser[SchematicConnectorIdentifier] =
positioned(accept("schematic connector identifier", { case id: SchematicConnectorIdentifier => id }))

private def identifierOrSchematic: Parser[Identifier | SchematicIdentifier] =
positioned(identifier | schematicIdentifier)
private def identifierOrSchematic: Parser[Identifier | SchematicIdentifier | SchematicConnectorIdentifier] =
positioned((identifier: Parser[Identifier | SchematicIdentifier | SchematicConnectorIdentifier]) | schematicIdentifier | schematicConnectorIdentifier)

private def integerLiteral: Parser[IntegerLiteral] =
positioned(accept("integer literal", { case lit: IntegerLiteral => lit }))
Expand Down Expand Up @@ -69,7 +71,8 @@ private[parser] object FrontParser extends Parsers {
case v ~ argsOpt =>
val name = v match {
case Identifier(identifier) => ParsedConstant(identifier)
case SchematicIdentifier(identifier) => ParsedSchema(identifier)
case SchematicIdentifier(identifier) => ParsedSchema(identifier, connector = false)
case SchematicConnectorIdentifier(identifier) => ParsedSchema(identifier, connector = true)
}
argsOpt.map(ParsedApplication(name, _)).getOrElse(name)
}
Expand All @@ -90,26 +93,27 @@ private[parser] object FrontParser extends Parsers {

private def localBinder: Parser[Seq[String]] =
LocalBinder() ~> rep1sep(identifier, Comma()) <~ Dot() ^^ (fv => fv.map(_.identifier))
private def localBinderOptional: Parser[Seq[String]] = localBinder.? ^^ (fv => fv.getOrElse(Seq.empty))

private def topTermOrFormula: Parser[ParsedTopTermOrFormula] =
localBinder.? ~ termOrFormula ^^ { case fv ~ t => ParsedTopTermOrFormula(fv.getOrElse(Seq.empty), t) }
localBinderOptional ~ termOrFormula ^^ { case fv ~ t => ParsedTopTermOrFormula(fv, t) }

private def termOrFormulaSequence: Parser[Seq[ParsedTopTermOrFormula]] =
repsep(topTermOrFormula, Semicolon())
private def termOrFormulaSequence: Parser[Seq[ParsedTermOrFormula]] =
repsep(termOrFormula, Semicolon())

private def sequent: Parser[ParsedSequent] =
positioned(termOrFormulaSequence ~ Turnstile() ~ termOrFormulaSequence ^^ { case l ~ _ ~ r => ParsedSequent(l, r) })

private def proofStep: Parser[ParsedProofStep] = positioned(
indentation ~ integerLiteral ~ ruleName ~ repsep(integerLiteral, Comma()) ~ sequent ^^ {
case i ~ l ~ r ~ p ~ s => ParsedProofStep(l.pos, i.spaces, l.value, r.name, p.map(_.value), s)
}
)

private def proof: Parser[ParsedProof] = positioned(
(indentation ~ newLine).* ~> rep1sep(proofStep, newLine) <~ (newLine ~ indentation).* ^^ (steps => ParsedProof(steps.toIndexedSeq))
)

positioned(localBinderOptional ~ termOrFormulaSequence ~ Turnstile() ~ termOrFormulaSequence ^^ { case fv ~ l ~ _ ~ r => ParsedSequent(fv, l, r) })

private def partialSequent: Parser[ParsedPartialSequent] =
positioned(localBinderOptional ~ (
((Ellipsis() ~> (Semicolon() ~> rep1sep(termOrFormula, Semicolon())).?) ^^ (opt => (opt.getOrElse(Seq.empty), true))) |
termOrFormulaSequence ^^ (seq => (seq, false))
) ~ Turnstile() ~
((Ellipsis() ^^^ Seq.empty | (rep1sep(termOrFormula, Semicolon()) <~ (Semicolon() ~ Ellipsis()))) ^^ (seq => (seq, true)) |
termOrFormulaSequence ^^ (seq => (seq, false))
) ^^ {
case fv ~ (l, pl) ~ _ ~ (r, pr) => ParsedPartialSequent(fv, l, r, pl, pr)
})

private def parse[T](parser: Parser[T])(tokens: Seq[FrontToken]): T = {
val reader = new FrontTokensReader(tokens)
Expand All @@ -129,7 +133,6 @@ private[parser] object FrontParser extends Parsers {
def parseSequent(tokens: Seq[FrontToken]): ParsedSequent =
parse(positioned(sequent <~ End()))(tokens)

def parseProof(tokens: Seq[FrontToken]): ParsedProof =
parse(positioned(proof <~ End()))(tokens)

def parsePartialSequent(tokens: Seq[FrontToken]): ParsedPartialSequent =
parse(positioned(partialSequent <~ End()))(tokens)
}
Loading

0 comments on commit c4b154c

Please sign in to comment.