

# CS698Y: Modern Memory Systems Lecture-6 (Caches)

# **Biswabandan Panda**

biswap@cse.iitk.ac.in

https://www.cse.iitk.ac.in/users/biswap/CS698Y.html

## **Flow of the Module**

**Cache Management Policies**

**Cache Hierarchies**

**Hardware Prefetching**

**Cache Compression**

**Non-uniform Caches**

## **Caches in Single-core System**



## **Caches in Multi-core**



## **Latency Numbers**



# Our Goal: To minimize off-chip DRAM accesses

# Cache Replacement (LRU) - 101



# Common Access Patterns [RRIP, ISCA 10]

Recency-friendly: $(a_1, a_2, \dots, a_k, a_{k-1}, \dots, a_2, a_1)^N$

Thrashing: $(a_1, a_2, \dots, a_k)^N$ [k > cache size]

Streaming: $(a_1, a_2, \dots, a_\infty)$

Mixed: a combination of the above three

# Types of Workloads (Baseline 4MB Cache)



#### **Limitations of LRU**

LRU exploits temporal locality.

Streaming data $(a_1, a_2, a_3, \dots, a_\infty)$: no temporal locality, so there is no reuse to exploit.

Thrashing data $(a_1, a_2, a_3, \dots, a_n)$ [n > c, with c the cache size]: temporal locality exists, but LRU fails to capture it.
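The three cases above can be checked with a minimal sketch of a fully associative LRU cache. The function name `lru_hits`, the capacity of 4 lines, and the concrete traces are illustrative assumptions, not taken from the lecture.

```python
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Count hits for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()              # keys ordered from LRU (first) to MRU (last)
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)    # promote to MRU on a hit
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)   # evict the LRU line
            cache[line] = True              # insert at the MRU position
    return hits

c = 4                                              # assumed cache size in lines
recency_friendly = [0, 1, 2, 3, 2, 1, 0] * 10      # working set fits in the cache
thrashing = list(range(c + 1)) * 10                # cyclic reuse over n = c + 1 lines
streaming = list(range(1000))                      # every line touched exactly once

print(lru_hits(recency_friendly, c))   # 66 (only the 4 cold misses)
print(lru_hits(thrashing, c))          # 0
print(lru_hits(streaming, c))          # 0
```

In the thrashing trace every line is reused, yet LRU evicts each line just before its next use, so the hit count matches the streaming case.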

# **Bimodal Insertion Policy (BIP) [ISCA '07]**

```
// ε is a small constant, e.g. 1/16, 1/32, or 1/64
if (rand() < ε)
    Insert at MRU position;
else
    Insert at LRU position;
```

For small ε, BIP retains the thrashing protection of the LRU insertion policy (LIP), because lines are only infrequently inserted at the MRU position.
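The effect of the insertion position can be sketched with a small simulation. It assumes a fully associative cache of 4 lines and a cyclic working set of 8 lines; the function name `simulate` and the seeded generator are hypothetical choices made for this example.

```python
import random

def simulate(trace, capacity, epsilon, rng=None):
    """Count hits for a fully associative cache with bimodal insertion.

    epsilon = 1.0 degenerates to classic LRU (always insert at MRU);
    epsilon = 0.0 degenerates to the LRU insertion policy (LIP).
    """
    rng = rng or random.Random(0)      # seeded so that runs are repeatable
    stack = []                         # index 0 is MRU, last index is LRU
    hits = 0
    for line in trace:
        if line in stack:
            hits += 1
            stack.remove(line)
            stack.insert(0, line)      # a hit always promotes the line to MRU
            continue
        if len(stack) == capacity:
            stack.pop()                # the victim is always the LRU line
        if rng.random() < epsilon:
            stack.insert(0, line)      # rare case: insert at MRU
        else:
            stack.append(line)         # common case: insert at LRU
    return hits

capacity = 4
thrashing = list(range(8)) * 100       # 8 lines cycled through a 4-line cache

print(simulate(thrashing, capacity, epsilon=1.0))     # LRU: 0 hits
print(simulate(thrashing, capacity, epsilon=0.0))     # LIP: 297 hits
print(simulate(thrashing, capacity, epsilon=1 / 32))  # BIP: depends on the random draws
```

With insertion at the LRU position, three of the four resident lines survive each pass over the working set, which is where the 297 hits (3 per pass after the first) come from. The occasional MRU insertion in BIP lets the retained lines change when the working set changes.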

# **Dynamic Insertion Policy (DIP) [ISCA '07]**

SDM: Set Dueling Monitor

PSEL: n-bit saturating counter for deciding a policy
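A minimal sketch of the selection logic, assuming a 10-bit PSEL and a hypothetical leader-set assignment (one LRU leader and one BIP leader in every group of 32 sets). Misses in the LRU leader sets push PSEL up, misses in the BIP leader sets push it down, and follower sets read the most significant bit of PSEL.

```python
class SetDueling:
    """Picks LRU or BIP for follower sets from a saturating PSEL counter."""

    def __init__(self, psel_bits=10, sdm_stride=32):
        self.psel_max = (1 << psel_bits) - 1
        self.psel = self.psel_max // 2            # assumed start: just below the midpoint
        self.msb_threshold = 1 << (psel_bits - 1)
        self.sdm_stride = sdm_stride              # hypothetical leader-set spacing

    def set_type(self, set_index):
        r = set_index % self.sdm_stride
        if r == 0:
            return "lru_leader"                   # set dedicated to LRU
        if r == 1:
            return "bip_leader"                   # set dedicated to BIP
        return "follower"

    def on_miss(self, set_index):
        kind = self.set_type(set_index)
        if kind == "lru_leader":                  # LRU missed: evidence in favour of BIP
            self.psel = min(self.psel + 1, self.psel_max)
        elif kind == "bip_leader":                # BIP missed: evidence in favour of LRU
            self.psel = max(self.psel - 1, 0)
        # misses in follower sets do not train PSEL

    def policy_for(self, set_index):
        kind = self.set_type(set_index)
        if kind == "lru_leader":
            return "LRU"
        if kind == "bip_leader":
            return "BIP"
        # followers use BIP when the most significant bit of PSEL is set
        return "BIP" if self.psel >= self.msb_threshold else "LRU"

duel = SetDueling()
print(duel.policy_for(5))   # LRU
duel.on_miss(0)             # one miss in an LRU leader set
print(duel.policy_for(5))   # BIP
```

Only the leader sets pay the cost of running a possibly worse policy; all other sets follow whichever policy is currently missing less.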



# What about DIP for shared Caches?



#### What about the learning process for 2-core? N-core? BIP or LRU?

# **DIP for Shared Caches [PACT '08]**



# Thread-Aware DIP (TA-DIP) [PACT '08]



## **DIP vs TA-DIP**



# **Still Miles to Go**



Recurring *scans* (bursts of non-temporal data) → the policy should preserve the frequently referenced working set in the cache



# What About NRU?



# NRU to RRIP [ISCA '10]



**RRP: Re-reference prediction** 

#### RRIP



Static RRIP (Single core) and Thread-Aware Dynamic RRIP (SRRIP+BRRIP, multi-core, based on SDMs).

# **SRRIP – Not Good Enough**





#### **Mixed Access Patterns**



(a1, a2), b1, b2, b3, b4, (a1, a2)

**One Reuse** 
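The pattern above can be replayed against a minimal SRRIP model. The sketch assumes a single 4-way set that starts empty, 2-bit re-reference prediction values (RRPVs), fills at the "long" value $2^M - 2$, and promotion to 0 on a hit; the names `SRRIPSet` and `replay` are hypothetical.

```python
class SRRIPSet:
    """One cache set managed by static RRIP with hit promotion."""

    def __init__(self, ways, rrpv_bits=2):
        self.distant = (1 << rrpv_bits) - 1    # RRPV 3: re-reference predicted far away
        self.long = self.distant - 1           # RRPV 2: value given to every new fill
        self.tags = [None] * ways
        self.rrpv = [self.distant] * ways      # empty ways are the first victims

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0    # predict near-immediate re-reference
            return True
        while self.distant not in self.rrpv:       # age all lines until a victim exists
            self.rrpv = [v + 1 for v in self.rrpv]
        victim = self.rrpv.index(self.distant)
        self.tags[victim] = tag
        self.rrpv[victim] = self.long
        return False

def replay(trace, ways=4):
    cache = SRRIPSet(ways)
    return [cache.access(tag) for tag in trace]

scan = ["b1", "b2", "b3", "b4"]
one_reuse = ["a1", "a2"] + scan + ["a1", "a2"]
reused_before_scan = ["a1", "a2", "a1", "a2"] + scan + ["a1", "a2"]

print(replay(one_reuse))            # all misses: the scan ages a1 and a2 out
print(replay(reused_before_scan))   # a1 and a2 hit again after the scan
```

Under these assumptions SRRIP protects the working set only if it was re-referenced before the scan arrived. When each line sees a single reuse after the scan, new fills and scan fills look identical to the policy.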

# SHiP [MICRO '11]



## Signatures -> Re-reference [SHiP]

Signature: the memory region, or the program counter (PC) of the memory instruction

LLC accesses by the same "signature" tend to have similar re-reference patterns

# **Examples**



#### SHiP

#### (b) SHiP Algorithm

```
if hit then
    cache_line.outcome = true;
    Increment SHCT[signature_m];
else
    if evicted_cache_line.outcome != true
        Decrement SHCT[signature_m];   // signature_m stored with the evicted line
    cache_line.outcome = false;
    cache_line.signature_m = signature;
    if SHCT[signature] == 0
        Predict distant re-reference;
    else
        Predict intermediate re-reference;
end if
```
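The algorithm can be tried out on a small model. This is a sketch under stated assumptions: one 4-way set with 2-bit RRPVs, 3-bit SHCT counters initialised to 1, and string signatures `"scan"` and `"loop"` standing in for PCs. The names `SHiPSet`, `SHCT_MAX` and `SHCT_INIT` are hypothetical.

```python
from collections import defaultdict

SHCT_MAX = 7     # assumed 3-bit saturating counters
SHCT_INIT = 1    # assumed initial counter value

class SHiPSet:
    """One RRIP-managed set whose fill RRPV is chosen by a shared SHCT."""

    def __init__(self, ways, shct, rrpv_bits=2):
        self.distant = (1 << rrpv_bits) - 1
        self.shct = shct                     # signature -> counter
        self.tags = [None] * ways
        self.rrpv = [self.distant] * ways
        self.sig = [None] * ways             # signature_m stored per line
        self.outcome = [False] * ways        # was the line re-referenced?

    def access(self, tag, signature):
        if tag in self.tags:                 # hit: train the inserting signature up
            way = self.tags.index(tag)
            self.outcome[way] = True
            s = self.sig[way]
            self.shct[s] = min(self.shct[s] + 1, SHCT_MAX)
            self.rrpv[way] = 0
            return True
        while self.distant not in self.rrpv:         # RRIP victim search
            self.rrpv = [v + 1 for v in self.rrpv]
        way = self.rrpv.index(self.distant)
        if self.tags[way] is not None and not self.outcome[way]:
            s = self.sig[way]                        # evicted without reuse: train down
            self.shct[s] = max(self.shct[s] - 1, 0)
        self.tags[way] = tag
        self.sig[way] = signature
        self.outcome[way] = False
        # counter at 0 predicts distant re-reference, otherwise intermediate
        self.rrpv[way] = self.distant if self.shct[signature] == 0 else self.distant - 1
        return False

shct = defaultdict(lambda: SHCT_INIT)
cache = SHiPSet(4, shct)
for i in range(16):                          # training phase: a long scan
    cache.access(("s", i), "scan")

trace = [("a1", "loop"), ("a2", "loop"),
         ("b1", "scan"), ("b2", "scan"), ("b3", "scan"), ("b4", "scan"),
         ("a1", "loop"), ("a2", "loop")]
results = [cache.access(tag, sig) for tag, sig in trace]
print(results)        # the last two accesses (a1, a2) hit
print(dict(shct))     # {'scan': 0, 'loop': 3}
```

Once the scan signature's counter has reached 0, scan lines are filled at the distant RRPV and replace one another, so a1 and a2 survive the four scan fills. Tracing the same accesses by hand with every fill at the intermediate RRPV evicts both lines before their reuse.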