Intermediate Representations

Posted on: 2023-01-01

Intermediate representations (IRs) are in a sense the meat of a compiler; in that analogy the front end and back end are thin slices of bread. A compiler can have many different IRs, and it can be argued that the AST is just one kind of IR.

Having separate IRs for different stages of compilation can have advantages. The IR at each level can be tailored to the specific needs of that level. Once a transformation has been made, the result can be required to conform to some invariants or, better, the representation can be designed so that it cannot express invalid states at all.

On the other hand, having many IRs requires many transformations between different types, which may convey no extra advantage. Invariants can instead be enforced stage by stage, either by functions that check they hold or by making certain instructions available only in certain stages. Having many different IRs also implies either limits on, or complexity around, how they can be used: a dumping or serialization capability, for example, may need to be written once per IR. Stages typically overlap a lot, so many types and concepts end up repeated.
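The "functions checking they hold" option can be sketched as a single IR with a per-stage verifier. This is only an illustration: the instruction names (`box_alloc`, `assign`, `add`) and the stage names (`high`, `lowered`) are made up for the example and do not come from any real compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    op: str            # hypothetical opcode name, e.g. "box_alloc" or "add"
    args: tuple = ()   # operand names, kept as plain strings for the sketch


# Hypothetical invariant table: ops that must not appear once a stage is reached.
FORBIDDEN_AT_STAGE = {
    "high": set(),
    "lowered": {"box_alloc", "assign"},
}


def verify(insts, stage):
    """Return (index, op) for every instruction that breaks the stage invariant."""
    banned = FORBIDDEN_AT_STAGE[stage]
    return [(i, inst.op) for i, inst in enumerate(insts) if inst.op in banned]


program = [Inst("box_alloc", ("x",)), Inst("add", ("x", "1"))]
high_errors = verify(program, "high")        # nothing is banned yet
lowered_errors = verify(program, "lowered")  # box_alloc should have been removed
```

The same `Inst` type, and so the same dumping and serialization code, serves every stage; the cost is that invalid programs are only caught when the verifier runs, not by the type system.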

Other Implementations

MLIR

Some observations

Some observations around the "Pass manager" as discussed in the video

Links

Swift IR aka SIL

Has flow-sensitive lowering. This rests on the observation that x = y can mean either initialize x (if it is not yet defined) or replace its existing value. If the type is reference counted, the two cases imply quite different code paths.
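A minimal sketch of the idea, restricted to a single straight-line block. The tuple format and the op names (`assign`, `init`, `release`, `store`) are invented for illustration and are not SIL's actual instructions; a real implementation would need a dataflow analysis across branches.

```python
def lower_assigns(block):
    """Lower generic ("assign", dst, src) tuples in a straight-line block.

    The first assignment to a name becomes ("init", dst, src). Later ones
    become ("release", dst) followed by ("store", dst, src), modelling the
    extra work a reference-counted type needs when an old value is replaced.
    """
    initialized = set()
    out = []
    for inst in block:
        if inst[0] != "assign":
            out.append(inst)          # leave unrelated instructions untouched
            continue
        _, dst, src = inst
        if dst in initialized:
            out.append(("release", dst))
            out.append(("store", dst, src))
        else:
            out.append(("init", dst, src))
            initialized.add(dst)
    return out


lowered = lower_assigns([("assign", "x", "y"), ("assign", "x", "z")])
# lowered == [("init", "x", "y"), ("release", "x"), ("store", "x", "z")]
```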

Has this idea of allocating variables in boxes on the heap. The idea is that if those variables are then used as part of a closure, things stay simple, in so far as the variables can outlive the stack frame. Of course, if the variable isn't used in a closure, that is going to be horribly slow, so the compiler has passes to determine how the variable is used. If the variable can be determined to only be read in the closure, it can be copied in as a new value. Once that happens, the variable no longer needs to be on the heap, and so can be converted to being stack allocated or held in registers. The observation here is that capturing the desired semantics first, and then applying multiple (mandatory) passes, can deliver both complex abstractions and performance.
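The decision such a pass makes can be sketched roughly as follows. This is a toy model under strong assumptions: each variable is described by an ordered list of use kinds, and the labels (`write`, `read`, `capture_read`, `capture_write`) and the three results are hypothetical, not how SIL represents any of this.

```python
def classify_storage(uses):
    """Pick storage for a variable from its ordered list of uses.

    Each use is one of "write", "read", "capture_read", "capture_write".
    Returns "box" (keep on the heap), "copy" (closure takes its own copy,
    variable leaves the heap) or "stack" (never captured).
    """
    captured = False
    for use in uses:
        if use == "capture_write":
            return "box"      # the closure mutates it, so storage must be shared
        if use == "capture_read":
            captured = True
        elif use == "write" and captured:
            return "box"      # written after capture, the closure must observe it
    return "copy" if captured else "stack"


plain_local = classify_storage(["write", "read"])
read_only_capture = classify_storage(["write", "capture_read", "read"])
mutated_after_capture = classify_storage(["write", "capture_read", "write"])
```

Starting from "everything is boxed" and then proving the box away keeps the initial lowering simple, at the price of making the promotion pass mandatory rather than an optional optimization.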

There was a discussion around the advantages of the conditional fail instruction, which the SIL team thought worked out really well for them.

LLVM

SPIR-V

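Has a 16-bit opcode and a 16-bit instruction size (the word count), both held in the first word of each instruction: the word count in the high 16 bits, the opcode in the low 16 bits.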
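As a small sketch, splitting that first word looks like this. The layout assumed here is word count in the high half and opcode in the low half, which is how the SPIR-V specification describes it; the helper name is made up.

```python
def decode_first_word(word):
    """Split the first 32-bit word of a SPIR-V instruction.

    Returns (word_count, opcode): the high and low 16 bits respectively.
    """
    return (word >> 16) & 0xFFFF, word & 0xFFFF


# 0x00020011 is a two-word instruction with opcode 17 (OpCapability).
word_count, opcode = decode_first_word(0x00020011)
# word_count == 2, opcode == 17
```

Because the word count includes the first word itself, a reader can skip any instruction it does not understand by advancing `word_count` words.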

Has the concept of "extended" instructions, via OpExtInst. This specifies an imported instruction set and the instruction within that set. The instruction is given as a number, as if it were an index, rather than as a name. The meaning of the values can be found in the registry.
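A rough sketch of pulling an OpExtInst apart, assuming the operand order given in the SPIR-V specification (result type, result id, set id, instruction number, then operands). The ids in the example are arbitrary, and the dictionary keys are names chosen for this sketch.

```python
OP_EXT_INST = 12  # opcode of OpExtInst


def decode_ext_inst(words):
    """Decode one OpExtInst from its list of 32-bit words."""
    word_count, opcode = words[0] >> 16, words[0] & 0xFFFF
    if opcode != OP_EXT_INST or word_count != len(words) or word_count < 5:
        raise ValueError("not a well-formed OpExtInst")
    return {
        "result_type": words[1],
        "result_id": words[2],
        "set_id": words[3],        # id produced by an earlier OpExtInstImport
        "instruction": words[4],   # numeric index into the imported set
        "operands": list(words[5:]),
    }


# Arbitrary ids. Number 31 should be Sqrt in the GLSL.std.450 set, but
# check the registry rather than trusting this comment.
ext = decode_ext_inst([(6 << 16) | OP_EXT_INST, 2, 9, 1, 31, 8])
```

The decoder itself never learns what instruction 31 does; it only knows which imported set to look it up in, which is why the registry documents are needed to interpret the value.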

Links