L7: Instruction Scheduling

Instruction Scheduling

How to execute real world programs faster, even with data dependencies. Tomasulo’s algorithm was and still is one of the most efficient ways to take the theory of ILP and physically recreate it in hardware.

Improving IPC

Using out-of-order processing to obtain >1 IPC

We will analyze how to actually impliment register renaming and out-of-order execution

Tomasulo’s Algorithm

Goals:

  1. Determine which instructions have inputs ready
  2. Register renaming

Originally it was used for floating point instructions, now it’s used for everything.

Parts:

Instruction Queue: Instructions are fetched and put into a queue. FIFO, so they are in instruction order which allows renaming to work appropriately.

Reservation Station: Instructions are issued from the queue to the reservation stations. Usually these have a limiting purpose (ADD/SUB, MUL/DIV). It is in this step that the renaming via the RAT happens, so that instructions can wait for results to be ready if they have true dependencies.

Registers: When an instruction is put in the reservation station, if the values are ready, they are placed in the instruction in the reservation station.

Execution Unit: Adder, Multiplier, etc. Instructions are dispatched from the associated reservation station. Computes the instruction then broadcasts via a BUS the result to the register file, the reservation stations and the RAT. This allows instructions that are waiting for their values to know that they are free to execute.

If the instruction is a load or store:

  1. Goes to the address generation unit to compute the address
  2. Insert into the load buffer or store buffer
  3. Queued up to go into memory
  4. When value comes back from memory, it is also broadcast on the bus.
  5. Also the storebuffer is the recipiant of the broadcasts so that if it is true dependent it can get it’s values when they become available.

Issue

Dispatch

Broadcast

If one bus, the slower unit result is broadcast in priority. If stale result is broadcast (aka RAT doesn’t contain RS# for result), it still latches to RS, but otherwise does nothing.

Per cycle:

Same cycle: Issue -> Dispatch? No Capture -> Dispatch? No Update RAT for Issue & Write Result? Yes but Issue updates RAT last

Load and Store

In Tomasulo’s done in order to avoid dependency problems