Server processors are designed to deliver high throughput at low latency. As a result, these processors are usually equipped with a few tens of latency-optimized compute engines, or processing cores. Meeting the instruction and data demand of these cores requires a deep cache hierarchy between the compute engines and the main memory. The last level of the on-die SRAM cache hierarchy and the memory-side DRAM cache have recently attracted significant attention from researchers as well as industry practitioners. In this talk, I will discuss some of the important performance issues in the design of the on-die last-level SRAM cache and the memory-side DRAM cache, along with solutions arising from our research. In particular, I will discuss the optimization of hit latency and miss count for the on-die last-level SRAM cache, as well as the efficient maintenance of coherence information across the on-die cache hierarchy. For the memory-side DRAM cache, I will touch upon the fundamental trade-offs between hit rate and bandwidth optimization and discuss a few techniques to improve bandwidth delivery in systems equipped with such caches. This talk will present a sampling of my research contributions from roughly the past decade, carried out in collaboration with my students at IITK and external collaborators, primarily from the Intel Microarchitecture Research Lab in Bangalore and the Intel Architecture Group in Bangalore and Haifa.