Intermediate Representations

Posted on: 2023-01-01

Intermediate representations (IRs) are, in a sense, the meat of a compiler, with the front end and back end being the thin slices of bread in the analogy. A compiler can have many different IRs, and it can be argued that the AST is just another kind of IR.

Having separate IRs for different stages of compilation can have advantages. The IR at each level can be tailored to the specific needs of that level. Once a transformation is made, the result can be required to conform to some invariants or, better, the representation can be designed so that it cannot express invalid states at all.

On the other hand, having many IRs requires many transformations between different types, and those transformations may confer no extra advantage. With a single IR, invariants can still be enforced at each stage by functions that check they hold, or by having instructions that are only available in certain stages. Having many different IRs also implies either limits or complexity around how they can be used. If there is a dumping or serialization capability, it may need to be written multiple times. Typically there is a lot of overlap between stages, so there are lots of repeated types and concepts.
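
As a rough illustration of the single-IR alternative, the sketch below tags each opcode with the stages in which it is legal and checks that invariant with a function. The stage names, the opcodes and the `check_stage` helper are all made up for this example; they are not taken from any real compiler.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    HIGH = 0  # straight after the front end (hypothetical stage)
    LOW = 1   # after lowering, ready for the back end (hypothetical stage)


# Hypothetical opcodes, each tagged with the stages in which it is legal.
ALLOWED_STAGES = {
    "for_loop": {Stage.HIGH},        # structured control flow, high level only
    "branch": {Stage.LOW},           # explicit jumps, low level only
    "add": {Stage.HIGH, Stage.LOW},  # shared by every stage
}


@dataclass(frozen=True)
class Inst:
    op: str
    operands: tuple = ()


def check_stage(insts, stage):
    """Return a list of error strings; an empty list means the invariant holds."""
    errors = []
    for index, inst in enumerate(insts):
        allowed = ALLOWED_STAGES.get(inst.op)
        if allowed is None:
            errors.append(f"{index}: unknown op '{inst.op}'")
        elif stage not in allowed:
            errors.append(f"{index}: '{inst.op}' is not allowed in stage {stage.name}")
    return errors


module = [Inst("for_loop"), Inst("add", (1, 2))]
```

Calling `check_stage(module, Stage.HIGH)` returns an empty list, while `check_stage(module, Stage.LOW)` reports the `for_loop` that should have been lowered away. The same `Inst` type, and so the same dumping code, serves both stages.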

Other Implementations

MLIR

YouTube: MLIR Internals

Some observations

Yikes! This uses all kinds of crazy stuff:
- Packing bits into the low parts of pointers
- Using the memory before an op to store return information
- Dictionaries/contexts/uniquifiers...
- A bump-pointer allocator that potentially consumes memory even to do a lookup
The description, apparently using Python, is probably worth looking at a little more.
It is worth describing more about what an Id is referencing:
- Say, instead of a plain InstId, having InstId<Type> to mark that it is an id of a type
- In the output type this might be dropped, but it can be used for validation via reflection
- It documents in code what is appropriate
- It encourages more attention to what valid IR is
Thought has gone into extensibility, and it seems to have something like the Ext concept:
- This was used in a previous IR as an "escape hatch" to "external" instructions
Properties: you can have normal classes as part of an operation, such as shared_ptr and vectors(!).
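
The typed-id idea from the observations above (an `InstId<Type>` rather than a bare `InstId`) could look something like the following. This is only a sketch in Python, with `InstId[TypeInst]` standing in for `InstId<Type>`; the instruction classes and the `validate` function are invented for illustration and are not MLIR's design.

```python
from dataclasses import dataclass, fields
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

K = TypeVar("K")


@dataclass(frozen=True)
class InstId(Generic[K]):
    """An index into the instruction list; K documents what it must refer to."""
    index: int


@dataclass(frozen=True)
class TypeInst:  # hypothetical instruction that declares a type
    name: str


@dataclass(frozen=True)
class ConstInst:  # hypothetical instruction that declares a typed constant
    type_id: InstId[TypeInst]  # must reference a TypeInst
    value: int


def validate(insts):
    """Use the annotations (reflection) to check every id field's target kind."""
    errors = []
    for index, inst in enumerate(insts):
        hints = get_type_hints(type(inst))
        for field in fields(inst):
            hint = hints[field.name]
            if get_origin(hint) is not InstId:
                continue  # not an id field
            (expected,) = get_args(hint)
            ref = getattr(inst, field.name)
            if not 0 <= ref.index < len(insts):
                errors.append(f"inst {index}: field '{field.name}' is out of range")
                continue
            target = insts[ref.index]
            if not isinstance(target, expected):
                errors.append(
                    f"inst {index}: field '{field.name}' expects "
                    f"{expected.__name__}, found {type(target).__name__}"
                )
    return errors


good = [TypeInst("i32"), ConstInst(InstId(0), 7)]
bad = [TypeInst("i32"), ConstInst(InstId(0), 7), ConstInst(InstId(1), 9)]
```

At runtime an `InstId` is still just an integer index, so the type parameter costs nothing in the stored form. It exists for documentation and so that `validate(bad)` can report that the last constant's `type_id` points at a `ConstInst` rather than a `TypeInst`.
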
YouTube: MLIR Actions: Tracing and Debugging MLIR-based Compilers

Some observations around the "Pass manager" as discussed in the video

Allows varying which passes are executed to try and determine which pass causes a problem
Allows seeing where time goes between passes
Their passes are multi-threaded
Easy to set an option to output the IR as processing proceeds
Allows injection such that some code is run in between each IR pass
Can test if something changed in a pass (create a hash before and after for example), and only output on change
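
Several of the pass manager features listed above fit in a few lines. The sketch below is not MLIR's pass manager or its API; `PassManager`, `fingerprint` and the two toy passes are hypothetical, and the IR is assumed to be nothing more than a list of strings.

```python
import hashlib
import time


def fingerprint(ir):
    """Hash the textual IR so that a pass's effect can be detected cheaply."""
    return hashlib.sha256("\n".join(ir).encode()).hexdigest()


class PassManager:
    def __init__(self, passes, after_pass=None, enabled=None):
        self.passes = passes          # list of (name, function) pairs
        self.after_pass = after_pass  # optional hook run after every pass
        self.enabled = enabled        # optional set of pass names to run
        self.timings = {}             # pass name -> seconds spent

    def run(self, ir):
        for name, fn in self.passes:
            if self.enabled is not None and name not in self.enabled:
                continue  # skipping passes helps to find the one at fault
            before = fingerprint(ir)
            start = time.perf_counter()
            ir = fn(ir)
            self.timings[name] = time.perf_counter() - start
            changed = fingerprint(ir) != before
            if self.after_pass is not None:
                self.after_pass(name, ir, changed)
        return ir


def remove_nops(ir):  # toy pass that changes the IR
    return [line for line in ir if line != "nop"]


def identity(ir):  # toy pass that changes nothing
    return list(ir)


passes = [("remove_nops", remove_nops), ("identity", identity)]
log = []
pm = PassManager(passes, after_pass=lambda name, ir, changed: log.append((name, changed)))
result = pm.run(["add", "nop", "ret"])
```

After the run, `log` records which passes changed the IR, so a hook could dump the IR only when `changed` is true, and `pm.timings` shows where the time went. Passing `enabled={"identity"}` runs a subset of the passes, which is the basis for bisecting a misbehaving pipeline.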

Links

Operation Definition Specification (ODS)

Swift IR aka SIL

Has flow-sensitive lowering. This makes the observation that x = y can mean either initialize x (if it is not yet defined) or replace its existing value. If the type is reference counted, the two cases imply quite different code paths.
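
To make the difference concrete, here is a small sketch of flow-sensitive lowering for straight-line code only. It is not how SIL is implemented; the operation names (`retain`, `release`, `init`, `assign`) and the `lower_assignments` function are assumptions for the example, and real code would need dataflow analysis across branches.

```python
def lower_assignments(assignments, refcounted=frozenset()):
    """Lower a list of (target, source) pairs, in program order.

    The first assignment to a target becomes an 'init'; later ones become an
    'assign' that must first release the old value if it is reference counted.
    """
    initialized = set()
    out = []
    for target, source in assignments:
        counted = target in refcounted
        if counted:
            out.append(("retain", source))  # the variable takes a new reference
        if target in initialized:
            if counted:
                out.append(("release", target))  # drop the value being replaced
            out.append(("assign", target, source))
        else:
            out.append(("init", target, source))  # nothing to release yet
            initialized.add(target)
    return out


lowered = lower_assignments([("x", "a"), ("x", "b")], refcounted={"x"})
```

Both source statements read `x = ...`, but only the second one produces a `release`, because only then is there an old value to give up.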

Has this idea of allocating variables in boxes on the heap. The idea is that if those variables are then used as part of a closure, things are simple, in so far as the variables can outlive the stack frame. Of course, if the variable isn't used in a closure, that is going to be horribly slow. So the compiler has passes to determine how the variable is used. If the variable can be determined to only be read in the closure, it can be copied into the closure as a new value. Once that happens, the variable no longer needs to be on the heap, and so can be converted to being stack allocated or held in registers. The observation here is that capturing the desired semantics first, and then applying multiple (mandatory) passes, can deliver both complex abstractions and performance.
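
The decision those passes make can be caricatured as a single function. This is a simplified sketch, not SIL's actual analysis: `VarUse`, its fields and `choose_storage` are hypothetical, and the `written_after_capture` condition is an extra assumption of this example (a by-value copy would go stale if the original changed afterwards).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarUse:
    captured: bool = False               # referenced by some closure
    written_in_closure: bool = False     # a closure assigns to it
    written_after_capture: bool = False  # assigned after the closure was formed


def choose_storage(use):
    """Start from 'everything lives in a heap box' and promote where it is safe."""
    if not use.captured:
        return "stack"  # never escapes the frame
    if not use.written_in_closure and not use.written_after_capture:
        return "stack, copied into closure"  # the closure gets its own value
    return "heap box"  # shared mutable state has to outlive the frame
```

Only the last case pays for a heap allocation, even though every variable started out with box semantics.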

There was a discussion around the advantages of the conditional fail instruction, which the SIL team thought worked out really well for them.

LLVM

SPIR-V

Has a 16-bit opcode and a 16-bit instruction size (a count of 32-bit words), both held in the first word of each instruction.
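
A minimal sketch of that first-word layout follows, with the word count in the high 16 bits and the opcode in the low 16 bits. The function names are made up for this example, and `split_instructions` assumes the module header has already been stripped from the word stream.

```python
def pack_first_word(opcode, word_count):
    """Build the first word of an instruction: count in the high half, opcode low."""
    if not (0 <= opcode <= 0xFFFF and 1 <= word_count <= 0xFFFF):
        raise ValueError("opcode and word count must each fit in 16 bits")
    return (word_count << 16) | opcode


def unpack_first_word(word):
    """Return (opcode, word_count) from the first word of an instruction."""
    return word & 0xFFFF, word >> 16


def split_instructions(words):
    """Walk a flat list of 32-bit words, yielding (opcode, operand_words)."""
    pos = 0
    while pos < len(words):
        opcode, count = unpack_first_word(words[pos])
        if count == 0 or pos + count > len(words):
            raise ValueError(f"bad word count at word {pos}")
        yield opcode, words[pos + 1 : pos + count]
        pos += count


# Two instructions: opcode 12 occupying 3 words, then opcode 0 occupying 1 word.
stream = [pack_first_word(12, 3), 7, 8, pack_first_word(0, 1)]
```

Because the size is stored in the instruction itself, a reader can step over opcodes it does not understand, as `split_instructions` does here.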

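Has the concept of "extended" instructions, via OpExtInst. This specifies an imported instruction set (the result of an OpExtInstImport) and the instruction within that set. The instruction is identified by a numeric index rather than by name. The meaning of the values can be found in the registry.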
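
For illustration, the words of an OpExtInst can be pulled apart as below. The operand order (result type, result id, set, instruction index, then operands) follows the SPIR-V specification as I understand it; the `decode_ext_inst` helper and the id values in the example are made up.

```python
OP_EXT_INST = 12  # opcode of OpExtInst


def decode_ext_inst(words):
    """Split the words of a single OpExtInst into named fields."""
    opcode = words[0] & 0xFFFF
    word_count = words[0] >> 16
    if opcode != OP_EXT_INST:
        raise ValueError(f"not an OpExtInst (opcode {opcode})")
    if word_count != len(words) or word_count < 5:
        raise ValueError("unexpected word count")
    return {
        "result_type": words[1],  # id of the result type
        "result_id": words[2],    # id defined by this instruction
        "set": words[3],          # id produced by an OpExtInstImport
        "instruction": words[4],  # numeric index into the imported set
        "operands": words[5:],    # ids passed to the extended instruction
    }


# Hypothetical ids: set %1, extended instruction number 31, one operand %8.
decoded = decode_ext_inst([(6 << 16) | OP_EXT_INST, 2, 9, 1, 31, 8])
```

Nothing in the binary says what instruction 31 does; that comes from the specification of whichever instruction set id 1 was imported from.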