Metrics and Evaluation

Performance

Latency is the time from start to finish of a single task. Throughput is the number of tasks (processes) completed per second.

Comparing throughput is done via speedup = N:

N = throughput(x)/throughput(y)

whereas comparing latency uses:

N = latency(y)/latency(x)

In both cases the result reads “x is N times faster than y”.
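
As a minimal sketch of the two speedup formulas above (the function names and the numbers are made up for illustration):

```python
def speedup_from_throughput(throughput_x, throughput_y):
    """N such that x is N times faster than y, given throughputs."""
    return throughput_x / throughput_y


def speedup_from_latency(latency_x, latency_y):
    """N such that x is N times faster than y, given latencies.

    Note the swapped order: lower latency means faster, so y goes on top.
    """
    return latency_y / latency_x


# Hypothetical machines: x finishes a task in 2 s, y needs 6 s.
n_latency = speedup_from_latency(2.0, 6.0)

# Hypothetical machines: x completes 300 tasks/s, y completes 100 tasks/s.
n_throughput = speedup_from_throughput(300.0, 100.0)

print(n_latency, n_throughput)
```

Both calls describe x as 3 times faster than y, even though the ratio is flipped between the two metrics.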

Performance is directly proportional to throughput and inversely proportional to latency (that is, proportional to 1/latency).

Measuring Performance

Performance = 1/execution time

However, execution time depends on the workload being run. Measuring with the actual user workload is impractical:

- it differs from user to user
- it requires many programs to be run
- users would have to willingly give performance data

Benchmarks are programs and input data agreed upon for performance measurement. A benchmark is usually part of a benchmark suite that simulates different types of apps running on the hardware.

Real applications
- most representative of real-world use
- difficult to set up on a new machine (because the machine is incomplete)
- for actual machines
Kernels
- find most time consuming part of app
- still can be too difficult on new machine (might not have compiler)
- for prototypes
Synthetic Benchmarks
- behave like kernels but are purposefully easier to compile
- good for design studies
Peak Performance
- theoretical maximums (this is for marketing)

Benchmark Standards are set by independent organizations (made up of user groups, academics and manufacturers)

TPC (DB, webservers, transaction processing)
EEMBC (embedded processing)
SPEC (raw processors)
- GCC (software dev workloads)
- BWAVES, LBM (fluid dynamics)
- PERL (string processing)
- CACTUSADM (physics/relativity)
- XALANCBMK (XML)
- CALCULIX, DEALII (diff eq)
- BZIP (compression)
- GO, SJENG (gaming)

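Summarizing performance is done by averaging execution times across the different applications and computing the speedup from those averages. Do not take the arithmetic average of the per-application speedups! The geometric mean is the exception: it can be used on both the raw values and the speedups, and gives consistent results either way.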
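
A small sketch of why this matters, using invented execution times for two hypothetical machines x and y on three applications (all names and numbers here are assumptions for illustration):

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical execution times in seconds, one entry per application.
times_x = [2.0, 10.0, 40.0]
times_y = [4.0, 10.0, 20.0]

# Per-application speedup of x over y (latency form: y / x).
speedups = [ty / tx for tx, ty in zip(times_x, times_y)]

# Arithmetic mean of speedups vs. speedup computed from average times:
# these two disagree, which is why averaging speedups is misleading.
avg_of_speedups = sum(speedups) / len(speedups)
speedup_of_avgs = (sum(times_y) / len(times_y)) / (sum(times_x) / len(times_x))

# Geometric mean of speedups vs. ratio of geometric means of the times:
# these two agree.
geo_of_speedups = geometric_mean(speedups)
speedup_of_geos = geometric_mean(times_y) / geometric_mean(times_x)

print(avg_of_speedups, speedup_of_avgs)
print(geo_of_speedups, speedup_of_geos)
```

With these numbers the arithmetic mean of the speedups says x is faster, while the averaged times say x is slower. The two geometric-mean figures match each other.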

Iron Law of Performance

CPU TIME = # instructions in program * cycles per instruction * clock cycle time

We use these three factors to calculate CPU time because each one exposes a different, important area that affects performance. The number of instructions in the program is affected by the algorithm and the compiler. The instruction set is also a factor: if the instructions are too simple, it takes more of them to do the same work. The instruction set also affects cycles per instruction, where simpler instructions are a good thing because each needs fewer cycles. Processor design is the other factor for cycles per instruction, and it also affects clock cycle time. Circuit design and transistor physics are the two other factors limiting clock cycle time.

Computer architecture influences performance through the instruction set and the processor design.
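
The Iron Law above can be sketched as a one-line function. The program size, CPI, and clock frequency below are hypothetical values chosen only to make the arithmetic easy to follow:

```python
def cpu_time(instruction_count, cycles_per_instruction, clock_cycle_time):
    """Iron Law: CPU time = instructions * (cycles/instruction) * (seconds/cycle)."""
    return instruction_count * cycles_per_instruction * clock_cycle_time


# Hypothetical program: 3 billion instructions, CPI of 2, 3 GHz clock.
clock_frequency_hz = 3e9
cycle_time_s = 1.0 / clock_frequency_hz  # clock cycle time is 1 / frequency

seconds = cpu_time(3e9, 2.0, cycle_time_s)
print(seconds)  # about 2 seconds
```

Halving any one of the three factors while holding the other two fixed halves the CPU time, which is why each factor is analyzed separately.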

Unequal instruction times:

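When instruction types take different numbers of cycles, the product # instructions in program * cycles per instruction becomes a sum over the instruction types: for each type i, multiply its instruction count (IC_i) by its cycles per instruction (CPI_i), then add the results to get the total number of cycles.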

CPU TIME = SIGMA_i (IC_i * CPI_i) * TIME/CYCLE
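
A minimal sketch of the summed form, assuming an invented instruction mix (the class names, counts, CPIs, and clock rate are all hypothetical):

```python
def cpu_time_by_class(instruction_mix, clock_cycle_time):
    """CPU time = sum_i(IC_i * CPI_i) * time per cycle.

    instruction_mix maps a class name to (instruction count, CPI).
    """
    total_cycles = sum(count * cpi for count, cpi in instruction_mix.values())
    return total_cycles * clock_cycle_time


# Hypothetical mix: (instruction count, cycles per instruction) per class.
mix = {
    "alu": (50e6, 1.0),
    "load_store": (30e6, 3.0),
    "branch": (20e6, 2.0),
}

cycle_time_s = 1.0 / 2e9  # assumed 2 GHz clock

total_time_s = cpu_time_by_class(mix, cycle_time_s)

# The average CPI falls out of the same sum: total cycles / total instructions.
total_instructions = sum(count for count, _ in mix.values())
average_cpi = sum(count * cpi for count, cpi in mix.values()) / total_instructions

print(total_time_s, average_cpi)
```

Here the total is 180 million cycles over 100 million instructions, so the average CPI is 1.8 and the CPU time is about 0.09 s.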

Amdahl’s Law

Used when an enhancement speeds up only part of the program (or only some of the instructions) and we want to know the overall speedup.

SPEEDUP = 1 / ((1 - Frac_Enh) + (Frac_Enh/speedupAmnt))

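Where Frac_Enh = fraction of the original execution **TIME** (not of the instruction count) that the enhancement affects, and speedupAmnt = speedup of that enhanced portion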

It is better to get small improvements on large portions of the execution time than large improvements on small portions, i.e. make the common case fast.
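
A short sketch of Amdahl's Law that compares a large improvement on a small portion against a small improvement on a large portion. The two enhancement scenarios are hypothetical:

```python
def amdahl_speedup(frac_enh, speedup_enh):
    """Overall speedup when frac_enh of the original TIME is sped up by speedup_enh."""
    return 1.0 / ((1.0 - frac_enh) + frac_enh / speedup_enh)


# Hypothetical option A: 50x faster, but only on 10% of the execution time.
option_a = amdahl_speedup(0.10, 50.0)

# Hypothetical option B: only 1.6x faster, but on 80% of the execution time.
option_b = amdahl_speedup(0.80, 1.6)

print(option_a, option_b)
```

Option A gives roughly 1.11 overall and option B roughly 1.43, so the modest improvement on the common case wins. Even an infinite speedup on 10% of the time could not exceed 1 / 0.9.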

Lhadma’s Law

Don’t mess up uncommon case too badly while making improvements on common case.
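
To illustrate, here is a sketch that extends the Amdahl formula to several portions, each with its own speedup (a speedup below 1 means that portion got slower). The multi-part helper and the numbers are assumptions for illustration, not part of the notes:

```python
def overall_speedup(parts):
    """Overall speedup for a list of (fraction of original time, speedup) pairs.

    The fractions are expected to sum to 1. A speedup below 1 is a slowdown.
    """
    new_time = sum(fraction / speedup for fraction, speedup in parts)
    return 1.0 / new_time


# Hypothetical change: the common case (90% of time) becomes 2x faster,
# but the uncommon case (10% of time) becomes 10x slower (speedup 0.1).
result = overall_speedup([(0.9, 2.0), (0.1, 0.1)])

print(result)
```

The new execution time is 0.45 + 1.0 = 1.45 times the original, so the overall "speedup" is about 0.69: the program got slower despite the common case being twice as fast.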