Skip to content

Latest commit

 

History

History
146 lines (117 loc) · 6.05 KB

basil-ir.md

File metadata and controls

146 lines (117 loc) · 6.05 KB

BASIL IR

BASIL IR is the intermediate representation used during static analysis. This is on contrast to Boogie IR which is used for specification annotation, and output to textual boogie syntax that can be run through the Boogie verifier.

The grammar is described below, note that the IR is a data-structure, without a concrete textual representation so the below grammar only represents the structure. We omit the full description of the expression language because it is relatively standard.

The IR has a completely standard simple type system that is enforced at construction.

$$\begin{align*} Program ::=&~ Procedure* \\\ Procedure ::=&~ (name: ProcID) (entryBlock: Block) (returnBlock: Block) (blocks: Block*) \\\ &~ \text{Where }entryBlock, returnBlock \in blocks \\\ Block_1 ::=&~ BlockID \; Statement*\; Call? \; Jump \; \\\ Block_2 ::=&~ BlockID \; (Statement | Call)*\; Jump \; \\\ \\\ &~ Block = Block_1 \text{ is a structural invariant that holds during all the early analysis/transform stages} \\\ Statement ::=&~ MemoryAssign ~|~ LocalAssign ~|~ Assume ~|~ Assert ~|~ NOP \\\ ProcID ::=&~ String \\\ BlockID ::=&~ String \\\ \\\ Jump ::=&~ GoTo ~|~ Unreachable ~|~ Return \\\ GoTo ::=&~ \text{goto } BlockID* \\\ Call ::=&~ DirectCall ~|~ IndirectCall \\\ DirectCall ::=&~ \text{call } ProcID \\\ IndirectCall ::=&~ \text{call } Expr \\\ \\\ &~ loads(e: Expr) = \{x | x:MemoryLoad, x \in e \} \\\ \\\ MemoryAssign ::=&~ MemoryAssign (mem: Memory) (addr: Expr) (val: Expr) (Endian) (size: Int) \\\ &\text {Such that } loads(addr) = loads(val) = \emptyset \\\ \\\ LocalAssign ::=&~ Variable := Expr \\\ Assume ::=&~ \text{assume } body:Expr\\\ &\text {Such that } loads(body) = \emptyset \\\ Assert ::=&~ \text{assert } body:Expr\\\ &\text {Such that } loads(body) = \emptyset \\\ \\\ Expr ::=&~ MemoryLoadExpr ~|~ Variable ~|~ Literal ~|~ Extract ~|~ Repeat \\\ &~ ~|~ ZeroExtend ~|~ SignExtend ~|~ UnaryExpr ~|~ BinaryExpr \\\ Variable ::=&~ Global ~|~ LocalVar \\\ MemoryLoadExpr ::=&~ MemoryLoad (mem:Memory) (addr: Expr) (Endian) (size: Int) \\\ &\text {Such that } loads(addr) = \emptyset \\\ Memory ::=&~ Stack ~|~ Mem \\\ Endian ::=&~ BigEndian ~|~ LittleEndian \\\ \end{align*}$$
  • The GoTo jump is a multi-target jump reprsenting non-deterministic choice between its targets. Conditional structures are represented by these with a guard (an assume statement) beginning each target.
  • The Unreachable jump is used to signify the absence of successors, it has the semantics of assume false.
  • The Return jump passes control to the calling function, often this is over-approximated to all functions which call the statement's parent procedure.

Translation Phases

IR With Returns

  • Immediately after loading the IR return statements may appear in any block, or may be represented by indirect calls. The transform pass below replaces all calls to the link register (R30) with return statements. In the future, more proof is required to implement this soundly.
cilvisitor.visit_prog(transforms.ReplaceReturns(), ctx.program)
transforms.addReturnBlocks(ctx.program, true) // add return to all blocks because IDE solver expects it
cilvisitor.visit_prog(transforms.ConvertSingleReturn(), ctx.program)

This ensures that all returning, non-stub procedures have exactly one return statement residing in their returnBlock.

Calls appear only as the last statement in a block

  • The structure of the IR allows a call may appear anywhere in the block but for all the analysis passes we hold the invariant that it only appears as the last statement. This is checked with the function singleCallBlockEnd(p: Program). And it means for any call statement c we may assert(c.parent.statements.lastOption.contains(c)).

Interaction with BASIL IR

Constructing Programs in Code

The 'DSL' is a set of convenience functions for constructing correct IR programs in Scala source files. This provides a simple way to construct IR programs for use in unit tests. Its source code can be found here.

An example can be seen below:

var program: Program = prog(
  proc("main",
    block("first_call",
      Assign(R0, bv64(1), None)
      Assign(R1, bv64(1), None)
      directCall("callee1"),
      goto("second_call"))
    ),
    block("second_call",
      directCall("callee2"),
      goto("returnBlock")
    ),
    block("returnBlock",
      ret
    )
  ),
  // ... other procedures
)

As we can see, the syntax is

program     ::= prog ( procedure+ )
procedure   ::= proc (procname, block+)
block       ::= block(blocklabel, statement+, jump)
statement   ::= <BASIL IR Statement>
jump        ::= goto_s | ret | unreachable
call_s      ::= directCall (procedurename, None | Some(blocklabel))  // target, fallthrough 
goto_s      ::= goto(blocklabel+)                              // targets
procname    ::= String
blocklabel  ::= String

If a block or procedure name is referenced in a target position, but a block or procedure is not defined with that label, the dsl constructor will likely throw a match error.

Some additional constants are defined for convenience, Eg. R0 = Register(R0, 64), see the source file for the full list.

Static Analysis / Abstract Interpretation

  • For static analysis the Il-CFG-Iterator is the current well-supported way to iterate the IR. This currently uses the TIP framework, so you do not need to interact with the IR visitor directly. See BasicIRConstProp.scala for an example on its useage.
  • This visits all procedures, blocks and statements in the IR program.

Modifying and Visiting the IR with Visitor Pattern

src/main/scala/ir/Visitor.scala defines visitors which can be used for extracting specific features from an IR program. This is useful if you want to modify all instances of a specific IR construct.

CFG

The cfg is a control-flow graph constructed from the IR, it wraps each statement in a Node.