L11: VLIW

Very Long Instruction Word

Superscalar vs VLIW

	OOO Superscalar	In-Order Superscalar	VLIW
IPC	<=N	<=N	1 Large Inst some work as N “normal” insts
How do we find indep. inst.	Look at >N insts	Look at next N insts in program order	Just do next large inst
Hardware cost	Expensive	Less Expensive	Even Less Expensive
Help from compiler	Compiler can help	Needs help	Completely depends on compiler

VLIW: The Good and the Bad

Good:

Compiler does the hard work
Simpler Hardware
Can be energy efficient
Works well on loops and “regular” code

Bad:

Latencies are not always the same
Irregular Applications
Code Bloat
- In the case of dependent instructions in OoO processors, VLIW has to issue no ops which is wasted space per instruction
- Can lead to 4x program size

VLIW Instructions

Has all the “normal” ISA opcodes
Full predication support
Lots of registers
Branch hints from compiler to aid in BPred
VLIW instruction “compaction”
- Adding stops between instructions

VLIW Examples

Intel Itanium
- Tons of ISA features
- HW very complicated
- Still not great on irregular code
DPS Processors
- Regular loops, lots of iterations
- Excellent performance
- Very Energy-Efficient

Note: good for adding -> Counting -> figuring out stuff