Title: Run-Jump-Run: Bouquet of Instruction Pointer Jumpers for High Performance Instruction Prefetching

Abstract:
Large instruction working sets are common in modern client and server workloads. These working sets often fit in the large last-level cache (LLC). However, the L1 instruction cache (L1-I) suffers from a high miss rate, which stalls the instruction supply to the front-end of the processor. Instruction prefetching is a latency-hiding technique that can bring instructions from the LLC into the L1-I. We propose a bouquet of instruction pointer (IP) jumpers, named JIP. JIP is a high-performance L1-I prefetcher that applies a different prefetching technique to each of three instruction categories: (i) a non-branch, (ii) a branch that jumps to a single target IP on all instances, and (iii) a branch that jumps to different target IPs on different instances. Compared to a baseline with no instruction prefetching, averaged across 50 traces, JIP provides a prefetch coverage of 91.33% (as high as 99.99%), which leads to a performance improvement of 27.75% (as high as 93%). JIP makes a strong case for instruction prefetching, as the performance gap between a perfect L1-I and JIP is just 7.49%. JIP requires a hardware overhead of 127.8 KB.
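
The three-way classification described above can be illustrated with a minimal offline sketch. It is not the JIP hardware mechanism, which the abstract does not detail; it only shows how the categories partition the IPs observed in a trace. The trace format (tuples of IP, branch flag, and target) and the names `IPClass` and `classify_ips` are assumptions made for illustration.

```python
from collections import defaultdict
from enum import Enum


class IPClass(Enum):
    NON_BRANCH = "non-branch"
    SINGLE_TARGET = "branch with a single target IP"
    MULTI_TARGET = "branch with multiple target IPs"


def classify_ips(trace):
    """Classify each IP in a trace into one of the three categories.

    `trace` is an iterable of (ip, is_branch, target) tuples (a hypothetical
    format); `target` is ignored for non-branch instructions.
    """
    targets = defaultdict(set)   # branch IP -> set of target IPs observed
    classes = {}
    for ip, is_branch, target in trace:
        if is_branch:
            targets[ip].add(target)
        else:
            classes[ip] = IPClass.NON_BRANCH
    for ip, seen in targets.items():
        # One distinct target over all instances -> category (ii), else (iii).
        classes[ip] = (IPClass.SINGLE_TARGET if len(seen) == 1
                       else IPClass.MULTI_TARGET)
    return classes


# Toy trace: one non-branch, one branch that always jumps to 0x200,
# and one branch that jumps to two different targets.
example_trace = [
    (0x100, False, None),
    (0x104, True, 0x200),
    (0x104, True, 0x200),
    (0x108, True, 0x300),
    (0x108, True, 0x400),
]
example_classes = classify_ips(example_trace)
```

In this toy trace, `0x100` falls into category (i), `0x104` into category (ii), and `0x108` into category (iii). A hardware prefetcher would have to make this decision online with bounded storage, which this sketch does not attempt to model.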