Many-core runtime

[HPCA 2021][PDF][Short presentation: Slides, MP4 video]
Mainak Chaudhuri. Zero Directory Eviction Victim: Unbounded Coherence Directory and Core Cache Isolation. In Proceedings of the 27th IEEE International Symposium on High-Performance Computer Architecture, pages 277-290, March 2021.

[ICCD 2017][PDF]
Sudhanshu Shukla and Mainak Chaudhuri. Sharing-aware Efficient Private Caching in Many-core Server Processors. In Proceedings of the 35th IEEE International Conference on Computer Design, pages 485-492, November 2017.

[HPCA 2017][PDF][Slides (PPTX)]
Sudhanshu Shukla and Mainak Chaudhuri. Tiny Directory: Efficient Shared Memory in Many-core Systems with Ultra-low-overhead Coherence Tracking. In Proceedings of the 23rd IEEE International Symposium on High-Performance Computer Architecture, pages 205-216, February 2017.

[MEMOCODE 2016][PDF]
Prakhar Banga, Atul R Pai, Subhajit Roy, and Mainak Chaudhuri. Accelerating Schedule Space Exploration of Multi-threaded Programs with GPUs. In Proceedings of the 14th ACM/IEEE International Conference on Formal Methods and Models for System Design, pages 115-124, November 2016.

[ICCD 2015][PDF][Slides (PPTX)]
Sudhanshu Shukla and Mainak Chaudhuri. Pool Directory: Efficient Coherence Tracking with Dynamic Directory Allocation in Many-core Systems. In Proceedings of the 33rd IEEE International Conference on Computer Design, pages 557-564, October 2015.

Last-level SRAM cache

[ISCA 2021][PDF][Short presentation: PPTX slides, Video]
Mainak Chaudhuri. Zero Inclusion Victim: Isolating Core Caches from Inclusive Last-level Cache Evictions. In Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, pages 71-84, June 2021.

[ICCD 2019][PDF][Talk slides (PPTX)]
Mainak Chaudhuri, Jayesh Gaur, and Sreenivas Subramoney. Bandwidth-aware Last-level Caching: Efficiently Coordinating Off-chip Read and Write Bandwidth. In Proceedings of the 37th IEEE International Conference on Computer Design, pages 109-118, November 2019.

[MICRO 2013][PDF][Talk slides (PPTX)]
Jayesh Gaur, Raghuram Srinivasan, Sreenivas Subramoney, and Mainak Chaudhuri. Efficient Management of Last-level Caches in Graphics Processors for 3D Scene Rendering Workloads. In Proceedings of the 46th IEEE/ACM International Symposium on Microarchitecture, pages 395-407, December 2013.

[IISWC 2013][PDF][Talk slides (PPTX)]
Ragavendra Natarajan and Mainak Chaudhuri. Characterizing Multi-threaded Applications for Designing Sharing-aware Last-level Cache Replacement Policies. In Proceedings of the IEEE International Symposium on Workload Characterization, pages 1-10, September 2013.

[PACT 2012][PDF][Tuning suggestions][Talk slides (PPTX)]
Mainak Chaudhuri, Jayesh Gaur, Nithiyanandan Bashyam, Sreenivas Subramoney, and Joseph Nuzman. Introducing Hierarchy-awareness in Replacement and Bypass Algorithms for Last-level Caches. In Proceedings of the 21st IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques, pages 293-304, September 2012.

[ISCA 2011][PDF][Talk slides (PPTX)][US patent 8,667,222][Follow-up work on server workloads: US patent 9,195,606]
Jayesh Gaur, Mainak Chaudhuri, and Sreenivas Subramoney. Bypass and Insertion Algorithms for Exclusive Last-level Caches. In Proceedings of the 38th IEEE/ACM International Symposium on Computer Architecture, pages 81-92, June 2011.

[MICRO 2009][PDF][Extended results][peLIFO simulation code][peLIFOLite simulation code][Talk slides (PPTX)]
Mainak Chaudhuri. Pseudo-LIFO: The Foundation of a New Family of Replacement Policies for Last-level Caches. In Proceedings of the 42nd IEEE/ACM International Symposium on Microarchitecture, pages 401-412, December 2009.

[HPCA 2009][PDF][Note][Talk slides (PPT)]
Mainak Chaudhuri. PageNUCA: Selected Policies for Page-grain Locality Management in Large Shared Chip-multiprocessor Caches. In Proceedings of the 15th IEEE International Symposium on High-Performance Computer Architecture, pages 227-238, February 2009.

[MICRO 2007][PDF][Talk slides (PPT)]
Arkaprava Basu, Nevin Kırman, Meyrem Kırman, Mainak Chaudhuri, and José F. Martínez. Scavenger: A New Last Level Cache Architecture with Global Block Priority. In Proceedings of the 40th IEEE/ACM International Symposium on Microarchitecture, pages 421-432, December 2007.

[ICCD 2007][PDF][Talk slides (PPT)]
Jugash Chandarlapati and Mainak Chaudhuri. LEMap: Controlling Leakage in Large Chip-multiprocessor Caches via Profile-guided Virtual Address Translation. In Proceedings of the 25th IEEE International Conference on Computer Design, pages 423-430, October 2007.

Speculation/Prediction in microprocessors

[IEEE ESL 2020][PDF]
Moumita Das, Ansuman Banerjee, Mainak Chaudhuri, and Bhaskar Sardar. Shared Pattern History Tables in Multi-component Branch Predictors with a Dealiasing Cache. In IEEE Embedded Systems Letters, 12(3): 95-98, September 2020.

[DPC 2019][PDF][Source code]
Mainak Chaudhuri and Nayan Deshmukh. Sangam: A Multi-component Core Cache Prefetcher. In the Third Data Prefetching Championship Workshop, held in conjunction with the 46th IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2019.

[CVP 2018][PDF]
[Second place winner in unlimited storage track]
Nayan Deshmukh, Snehil Verma, Prakhar Agrawal, Biswabandan Panda, and Mainak Chaudhuri. DFCM++: Augmenting DFCM with Early Update and Data Dependence-driven Value Estimation. In the First Championship Value Prediction Workshop, held in conjunction with the 45th IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2018. [Code, presentation, and more information]

[HPCA 2005][PDF]
[Winner of the best paper award]
Nevin Kırman, Meyrem Kırman, Mainak Chaudhuri, and José F. Martínez. Checkpointed Early Load Retirement. In Proceedings of the 11th IEEE International Symposium on High-Performance Computer Architecture, pages 16-27, February 2005.

DRAM cache

[ACM TACO 2017][PDF][Long version][US patent 10,013,352]
Mainak Chaudhuri, Mukesh Agrawal, Jayesh Gaur, and Sreenivas Subramoney. Micro-sector Cache: Improving Space Utilization in Sectored DRAM Caches. In ACM Transactions on Architecture and Code Optimization, 14(1), article no. 7, April 2017.

[HPCA 2017][PDF]
[Winner of the best paper award][Award certificate]
Jayesh Gaur, Mainak Chaudhuri, Pradeep Ramachandran, and Sreenivas Subramoney. Near-optimal Access Partitioning for Memory Hierarchies with Multiple Heterogeneous Bandwidth Sources. In Proceedings of the 23rd IEEE International Symposium on High-Performance Computer Architecture, pages 13-24, February 2017.

CPU-GPU heterogeneous multi-cores

[CASES 2017 | ACM TECS 2017][PDF]
Siddharth Rai and Mainak Chaudhuri. Using Criticality of GPU Accesses in Memory Management for CPU-GPU Heterogeneous Multi-core Processors. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, October 2017. Published as ACM Transactions on Embedded Computing Systems, 16(5s), article no. 133 (ESWEEK 2017 special issue).

[HCW 2017][PDF]
Siddharth Rai and Mainak Chaudhuri. Improving CPU Performance through Dynamic GPU Access Throttling in CPU-GPU Heterogeneous Processors. In Proceedings of the 26th IEEE International Heterogeneity in Computing Workshop, pages 18-29, May 2017.

[ICS 2016][PDF][Slides (PPTX)]
Siddharth Rai and Mainak Chaudhuri. Exploiting Dynamic Reuse Probability to Manage Shared Last-level Caches in CPU-GPU Heterogeneous Processors. In Proceedings of the 30th ACM International Conference on Supercomputing, article no. 3, June 2016.

Parallel programming/Run-time support

[ISEC 2016][PDF]
Anirban Ghose, Soumyajit Dey, Pabitra Mitra, and Mainak Chaudhuri. Divergence-aware Automated Partitioning of OpenCL Workloads. In Proceedings of the 9th ACM India Software Engineering Conference, pages 131-135, February 2016.

[ICPADS 2012][PDF][Source code][Talk slides (PPTX)]
Prabhakar Misra and Mainak Chaudhuri. Performance Evaluation of Concurrent Lock-free Data Structures on GPUs. In Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, pages 53-60, December 2012.

[PACT 2010 poster][PDF]
Santhosh S. Ananthramu, Deepak Majeti, Sanjeev K. Aggarwal, and Mainak Chaudhuri. Improving Speculative Loop Parallelization via Selective Squash and Speculation Reuse. In Proceedings of the 19th IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques, pages 543-544, September 2010.

[HPC ASIA 2009][PDF][Talk slides (PDF)]
Pramod K. Bhatotia, Sanjeev K. Aggarwal, and Mainak Chaudhuri. A Compilation Framework for Irregular Memory Accesses on the Cell Broadband Engine. In Proceedings of the 10th Asia Pacific High-Performance Computing Conference, pages 62-69, March 2009.

[SciProg 2009][PDF]
Vishwas B. C., Abhishek Gadia, and Mainak Chaudhuri. Implementing a Parallel Matrix Factorization Library on the Cell Broadband Engine. In Scientific Programming special issue on high-performance computing with Cell BE, 17(1-2): 3-29, February 2009.

Directory controller microarchitecture

[IEEE TPDS 2007][PDF][Talk slides (PPT)]
Mainak Chaudhuri and Mark Heinrich. Integrated Memory Controllers with Parallel Coherence Streams. In IEEE Transactions on Parallel and Distributed Systems, 18(8): 1159-1173, August 2007.

[ISCA 2004][PDF][Talk slides (PPT)]
Mainak Chaudhuri and Mark Heinrich. SMTp: An Architecture for Next-generation Scalable Multi-threading. In Proceedings of the 31st IEEE/ACM Annual International Symposium on Computer Architecture, pages 124-135, June 2004.

[IEEE TC 2003][PDF]
Mainak Chaudhuri, Mark Heinrich, Chris Holt, Jaswinder Pal Singh, Edward Rothberg, and John Hennessy. Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation. In IEEE Transactions on Computers, 52(7): 862-880, July 2003.

Directory protocol optimization

[IEEE TPDS 2004][PDF]
Mainak Chaudhuri and Mark Heinrich. Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols. In IEEE Transactions on Parallel and Distributed Systems, 15(8): 699-712, August 2004.

[IEEE TPDS 2004][PDF][Talk slides (PDF)]
Mainak Chaudhuri and Mark Heinrich. The Impact of Negative Acknowledgments in Shared Memory Scientific Applications. In IEEE Transactions on Parallel and Distributed Systems, 15(2): 134-150, February 2004.

Intelligent memory controller

[ICPP 2007][PDF][Talk slides (PPT)]
Lakshmana R. Vittanala and Mainak Chaudhuri. Integrating Memory Compression and Decompression with Coherence Protocols in Distributed Shared Memory Multiprocessors. In Proceedings of the 36th IEEE International Conference on Parallel Processing, September 2007.

[ISPASS 2007][PDF][Talk slides (PPT)]
Dhiraj D. Kalamkar, Mainak Chaudhuri, and Mark Heinrich. Simplifying Active Memory Clusters by Leveraging Directory Protocol Threads. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, pages 242-253, April 2007.

[IEEE TC 2004][PDF]
Daehyun Kim, Mainak Chaudhuri, Mark Heinrich, and Evan Speight. Architectural Support for Uniprocessor and Multiprocessor Active Memory Systems. In IEEE Transactions on Computers, 53(3): 288-307, March 2004.

[IPDPS 2003][PDF]
Daehyun Kim, Mainak Chaudhuri, and Mark Heinrich. Active Memory Techniques for ccNUMA Multiprocessors. In Proceedings of the 2003 IEEE International Parallel and Distributed Processing Symposium, April 2003. (Abstract on page 10)

[PDPTA 2002][PDF]
Mainak Chaudhuri, Daehyun Kim, and Mark Heinrich. Cache Coherence Protocol Design for Active Memory Systems. In Proceedings of the 2002 International Conference on Parallel and Distributed Processing Techniques and Applications, pages 83-89, June 2002.

[ICS 2002][PDF]
Daehyun Kim, Mainak Chaudhuri, and Mark Heinrich. Leveraging Cache Coherence in Active Memory Systems. In Proceedings of the 16th ACM International Conference on Supercomputing, pages 2-13, June 2002.

[ISHPC 2002][PDF]
Mark Heinrich, Evan Speight, and Mainak Chaudhuri. Active Memory Clusters: Efficient Multiprocessing on Commodity Clusters. In Proceedings of the 4th International Symposium on High Performance Computing (Lecture Notes in Computer Science, vol. 2327, pages 78-92, Springer Verlag), May 2002.


Brief notes

[ACM SIGARCH CAN 2006][PDF]
Abhas Kumar, Nisheet Jain, and Mainak Chaudhuri. Long-Latency Branches: How Much Do They Matter? In ACM SIGARCH Computer Architecture News, 34(3): 9-15, June 2006. [Errata in PDF]

[ACM SIGARCH CAN 2003][PDF]
Mark Heinrich and Mainak Chaudhuri. Ocean Warning: Avoid Drowning. In ACM SIGARCH Computer Architecture News, 31(3): 30-32, June 2003.