Skip to content

Alpha Code Generation Packages

Ryan Job edited this page Jun 13, 2024 · 2 revisions

This document describes the general design of the C code generator. These are found in the alpha.codegen project. Specifically, this document describes the alpha.codegen, alpha.codegen.isl, and alpha.codegen.alphaBase packages.

High-Level Overview

At a high level, generating C code is done in two phases:

  1. Convert the Alpha AST to a simplified C AST.
  2. Pretty print the C AST to the actual C code it represents.

This design allows disparate parts of the C AST to be generated in any order, avoiding issues of trying to write the C code file while walking the Alpha AST. This comes in handy, for example, when generating demand-driven code for a ReduceExpression, as a function which evaluates the reduction needs to be generated and put somewhere in the C code file (along with the declaration), but a call to that function needs to be emitted where the ReduceExpression was found.

Several different kinds of code generators are likely to be created, and there may be significant overlap between them all. However, it's possible that some code generators have no overlap. To account for this, the creation of the C AST and the navigation of the Alpha AST were split into two parts: the alpha.codegen package and the alpha.codegen.alphaBase respectively. This provides a separation of concerns so that, if a C code generator is needed that has no overlap with existing C code generators, we don't need to separate the two or duplicate the efforts.

  

The alpha.codegen Package

The alpha.codegen package (which is inside the alpha.codegen project) is meant to be the core functionality used for generating C programs. This package is self-contained (aside from a few utility functions out of other packages), and is intended to know absolutely nothing about isl, Alpha, or AlphaZ. Instead, it only models C code with a simplified AST, building up that AST from basic types (mostly strings and some enums), then printing it to a string.

Modeling C Code

The model/simpleC.xcore file defines the simplified model of C code. It doesn't support all features of C, mostly just what we need (notably, while loops are not included). Most nodes in the AST store information as strings, allowing us to insert code without worrying much about the structure. There is no type checking, and minimal name checking, providing us the flexibility to do what we need.

An important exception to this is the Program class. This was modeled specifically for how we (currently) generate C programs and was heavily based on the output of the original AlphaZ system. While this provides a very rigid structure for the kinds of programs we generate, it will (hopefully) meet our needs.

Most elements of the simplified C code are modeled as either a Statement or an Expression. A Statement is, effectively, anything that would be its own line of printed code inside a function. This roughly correlates with statements in C, but with some notable exceptions like comments, newlines, and macros. While these aren't statements in C, they take up a similar place, so they were modeled as statements here. An Expression is, effectively, anything that could go inside a statement. This is similar to expressions in C, but with some minor differences.

Since the Alpha program used as input has already been checked, and since some of the code we want to generate comes directly from isl, there is a CustomExpr class which is just a wrapper around a string. This expression simply prints as the exact string it was given. Additionally, if you need to insert an exact string in the place of a statement, you can put that string inside a CustomExpr and put that inside an ExpressionStmt (which is a statement that is just replaced by the wrapped expression and terminated by a semicolon).

It's worth noting that there is almost no safety checking in place anywhere, especially with the CustomExpr class. While this made it easier to write the generator (and hopefully makes it easier to generate C code), it makes it much easier to generate invalid C code. If you're creating your own code generator, be careful with this!

Generating the C AST

Most of the C AST generation can be done using the Factory class. This contains a bunch of static methods for constructing an AST node from raw data or other AST nodes. There are a few nodes missing (such as the Program class), but most of them are present.

For some more complex nodes, such as programs or functions, there are some builder classes. They all generally work in the same way. To begin building that node, you call the static start method. If there is anything that is absolutely needed for that node to exist (e.g., the conditional expression for an if statement), that must be provided to the start method. Then, there are a variety of add functions for adding specific kinds of elements. Once you're done building the node, you can get the node instance from the getInstance function. There are a few exceptions to this, but these exceptions should be documented in the header comment on the class.

For more information about the builder and factory design patterns, see:

Name Checking

There is some basic name checking perfomed by the NameChecker class. It keeps a set of reserved keywords and a set of names defined in the global scope. These are used to prevent any of these names from being re-defined, reducing the likelihood of generating invalid C code.

The name checker also has some helper functions for checking name uniqueness (and generating unique names) at the local scope. The set of names defined locally must be provided to these methods. There is a flag passed to the constructor, called preventShadowing, which disallows names in the local scope from being the same as something defined in the global scope. Note: the reverse is not checked.

When checking variable declarations in the local scope, the checker will look at both the name and data type for the declaration. Whenever the name matches, if the data type is also the same, error is returned, and the new declaration is not added to the given list of variable declarations. This was chosen as generating several loop nests via isl may result in some of the loop variables being reused, and I didn't want to force all loop variables in a function to be unique. Note: when attempting to declare a duplicate name with a different data type, an exception will be thrown.   

Printing the C AST

Printing the C AST is handled by the ProgramPrinter class. Printing statements and functions are handled by the printStmt and printExpr functions. Most other things are handled by the print functions. There are a few exceptions to this, but these are mainly used when there are multiple different ways to print a node (based on context), or as helper methods to simplify other methods.

The printer makes heavy use of Xtend's templating feature. You can read more about it on the following page:

https://eclipse.dev/Xtext/xtend/documentation/203_xtend_expressions.html#templates

The alpha.codegen.isl Package

This package is intended for converting isl objects into C AST nodes. Thus, this package is aware of alpha.codegen. However, the goal is that it should only know about that and isl. This again provides a separation of concerns, keeping the isl stuff separate from the base code generator or from anything involving Alpha/AlphaZ.

The alpha.codegen.alphaBase Package

Since most Alpha code generators will share much of the same structure, the common elements were extracted here. This comes in a few main parts:

  1. AlphaNameChecker: extends the base NameChecker to keep track of commonly used names when generating C code from Alpha code. For example, it can create or retrieve the name used to read from, or write to, an Alpha variable.
  2. TypeGeneratorBase: an abstract class intended for retrieving the C data type for an Alpha variable.
  3. ExprConverter: given an Alpha name checker and a type generator, it can convert most kinds of Alpha expressions into the C AST nodes you'd typically expect it to generate.
    • This works off a similar design pattern as the ProgramPrinter, using Xtend's dispatch methods to automatically handle different types of expressions.
    • You can inherit from this class and add more dispatch methods, and all the previous ones will still work, making it easy to add additional functionality (or replace previous functionality).
    • Notably, it does not support ReduceExpression, as different code generators may have vastly different ways of generating C code for the expression.
  4. CodeGeneratorBase: this handles most of the work for generating C code.

The CodeGeneratorBase Class

This is an abstract class with many implemented methods that handle the busywork of generating the C AST. This should be suitable for most (hopefully all) code generators in the AlphaZ system.

To get started on your own code generator:

  1. Create a new class that inherits from this class.
  2. Eclipse will underline the class name in red, as there are unimplemented methods and no constructor.
  3. Hover over that, then use the box that pops up to have Eclipse auto-generate these for you.

None of the methods (aside from convertSystemBody) return anything. Instead, they simply manipulate the program and entryPoint fields, which are a ProgramBuilder for the whole C program and a FunctionBuilder for the main entry point of the program. Since you don't need to return anything specific, you can simply manipulate these structures however you want. There's also a nameChecker (instance of AlphaNameChecker) and a typeGenerator (instance of TypeGeneratorBase) to help out.

There are a handful of functions you are required to override. However, if you don't actually need it, you can just make the function empty. Additionally, if you want to change some behavior that's specified in the CodeGeneratorBase, you can simply override that method.