Skip to content

IR Design ‐ Instructions

Dibyendu Majumdar edited this page Jun 7, 2025 · 12 revisions

This is part of the series that discusses what you must think about when deciding how to represent IRs.

Linear IR

The discussion here is primarily for linear Intermediate Representations. By linear we mean that the instructions are organized into sequences that are logically executed one after the other. This is in contrast to graph IRs where instructions are organized as a graph and the order of execution is not sufficiently defined by the graph.

Organize Instructions Into Basic Blocks

For many algorithms we need to operate on the control flow graph. It is important to organize the instructions into Basic Blocks, where each BB encapsulates instructions that are executed sequentially. The final instruction in the BB must a jump to some other BB.

public class BasicBlock {
    List<Instruction> instructions;
}

Control Flow Graph

The Basic Blocks should contain links to predecessor and successor BBs - this forms the Control Flow graph. The links are driven by the jump instructions that terminate each BB.

public class BasicBlock {
    List<Instruction> instructions;
    List<BasicBlock> successors;
    List<BasicBlock> predecessors;
}

Instruction Operands

Instructions deal with values, some of those may be Virtual Registers, some Constants, and even Basic Blocks, for example in jump Instructions. We therefore create a type called Operand that represents values handled by an Instruction. Operands can have different subtypes in an OO design (or an Operand can be a union or sumtype in non-OO languages) - where each subtype is for a specific type of value.

class Operand {}
class RegisterOperand extends Operand {}
class ConstantOperand extends Operand {}

Instructions Define and Use Registers

From the optimization point of view, it is important to model the Instruction in such a way that we can easily handle following requirements:

  • Does the instruction define a Register - i.e. assigns a value to it?
  • Does it use Registers defined by other instructions
  • Replace a register that is defined
  • Replace a specific use of a Register - or replace all uses - a Register may be replaced by another Register or by something else such as a Constant.

Whether an Instruction defines a Register and/or uses Values including Registers depends on the Instruction type. So each Instruction type sets its own requirements, however, all Instructions satisfy the interface requirements above, so that the optimizer can treat each instruction uniformly.

class Instruction {
    // Does this instruction define a value?
    public boolean definesVar();
    // Get the defined value
    public Register def();
    // Get register uses
    public List<Register> uses();
    // Replace register that is defined
    public void replaceDef(Register newReg);
    // Replace uses
    public void replaceUses(Register[] newUses);
    // Replace specific use
    public boolean replaceUse(Register oldUse, Register newUse);
    // Replace specific register with constant
    public void replaceWithConstant(Register register, ConstantOperand constantOperand);
}

Phis Are Special

Phi instructions also define a Register and use one or more Registers, but Phis do not have the same semantics as other instructions in many situations, such as when computing Liveness. So a consideration is how to model Phi instructions, whether they obey the general interface or have their own specialized interface. In EeZee language we give Phi instructions their own interface, which is similar to the general instruction but ensures that Phis cannot be erroneously manipulated by the optimizer. This has a trade off because there are scenarios where Phis do not require the special treatment.

class Phi extends Instruction {

        // def = value
        public Register value();
        public void replaceValue(Register newReg);

        // input = use
        public int numInputs();
        public Operand input(int i);
        public Register inputAsRegister(int i);
        public boolean isRegisterInput(int i);
        public Register[] inputRegisters();
        public void replaceInput(int i, Register newReg);
        public void removeInput(int i);

}

Instructions Can Be Replaced

The optimizer must be able to:

  • Identify an instruction
  • Replace it with a new Instruction

Thus every Instruction must have an identity. It is also helpful for an Instruction to know which BB it lives in.

Define Data Structures

Compiler must be able to operate and manipulate BBs, Instructions, Operands and Registers - so it is important to model these as data structures, and not encode them into a serialized form until the end. Some languages such as Lua directly encode instructions - this is only suitable for single pass compilers that do not do optimizations.

Def Use Chains

Def use chains link Instructions to Registers - in SSA for, each Register can have a list of Instructions that use it. In non-SSA form, a Register may have multiple definitions and each definition its own set of Uses. Think how this will be handled.

Walking Basic Blocks

Basic Blocks may need to be traversed in specific order such as Reverse Post Order (RPO). To support this auxiliary information has to be recorded for each Basic Block.

Dominator Trees and Dominance Frontiers

The optimizer will most likely need to compute and maintain Dominator Trees and Dominance Frontiers. The algorithms that compute these and the data structures for maintaining this information will typically be attached to Basic Blocks.

It is useful for Basic Blocks to be assigned a unique ID from a densely packed sequence of integers, this makes for a good way to generate names for the BBs, but also can be used to key into auxiliary data structures.

public class BasicBlock {
    public final int bid;

    /**
     * The preorder traversal number, also acts as a flag indicating whether the
     * BB is yet to be visited (_pre==0 means not yet visited).
     */
    int pre;
    /**
     * The depth of the BB in the dominator tree
     */
    int domDepth;
    /**
     * Reverse post order traversal number;
     * Sort node list in ascending order by this to traverse graph in reverse post order.
     * In RPO order if an edge exists from A to B we visit A followed by B, but cycles have to
     * be dealt with in another way.
     */
    int rpo;
    /**
     * Immediate dominator is the closest strict dominator.
     */
    public BasicBlock idom;
    /**
     * Nodes for whom this node is the immediate dominator,
     * thus the dominator tree.
     */
    public List<BasicBlock> dominatedChildren;
    /**
     * Dominance frontier
     */
    public Set<BasicBlock> dominationFrontier;
}
Clone this wiki locally