Seminar by Virendra Singh

Energy and Throughput Efficient Fault Tolerance: A Microarchitectural Solution for Chip Multiprocessors

Virendra Singh
Supercomputer Education and Research Centre (SERC), Indian Institute of Science (IISc)

    Date:    Monday, February 28th, 2011
    Time:    5:00 PM
    Venue:   CS101

Abstract:

Relentless scaling of silicon fabrication technology, coupled with lower design tolerances, is making ICs increasingly susceptible to wear-out-related permanent faults as well as transient faults (soft errors). A well-known technique for tackling both transient and permanent faults is redundant execution, specifically space redundancy, wherein a program is executed redundantly on different processors, pipelines, or functional units and the results are compared to detect faults.

This talk will describe a power-efficient architecture for redundant execution on chip multiprocessors (CMPs) which, when coupled with our per-core dynamic voltage and frequency scaling (DVFS) algorithm, significantly reduces the power overhead of redundant execution without sacrificing performance. Using cycle-accurate simulation combined with an architectural power model, we estimate that our architecture reduces dynamic power dissipation in the redundant core by a mean value of 76%, with an associated mean performance penalty of only 1.2%. I will also present an extension to our architecture that enables the use of cores with faulty functional units for redundant execution without a reduction in transient fault coverage. This extension enables the use of faulty cores, thereby increasing yield and reliability with only a modest power-performance penalty over fault-free execution.

Our second architecture addresses the issue of throughput loss in fault-tolerant CMPs by using coarse-grained multithreading to multiplex multiple trailing threads on a single core. Our evaluation shows that this architecture delivers higher throughput than previous proposals, including one configuration that uses simultaneous multithreading (SMT) to multiplex trailing threads. This increase in throughput comes at a modest cost in single-thread performance. Finally, circuit- and device-level techniques for dealing with such issues will be discussed briefly.

About the speaker:

Virendra Singh obtained his Ph.D. in Computer Science from Nara Institute of Science and Technology (NAIST), Nara, Japan, in 2005. He received his B.E. and M.E. in Electronics and Communication Engineering from Malaviya National Institute of Technology (MNIT), Jaipur, India. He has been a faculty member at the Supercomputer Education and Research Centre (SERC), Indian Institute of Science (IISc), Bangalore, since May 2007. He served as a Scientist at the Central Electronics Engineering Research Institute (CEERI), Pilani, India, for 10 years prior to joining IISc. He also served as a faculty member at the Department of Computer Science, Banasthali University, from June 1996 to March 1997. His research interests are high performance computer architecture, testing and verification of high performance processors, fault tolerant computing, VLSI testing, design for test, formal verification of hardware designs, embedded system design, design for reliability, and CAD of VLSI systems. He is a member of the IEEE, the ACM, and the VSI, and a life member of the IETE. He is a PC member of many conferences in the area of CAD and VLSI, such as DATE, ETS, ATS, VLSI Design, IOLTS, and ISVLSI. He is a co-founder of the RASDAT (IEEE International Workshop on Reliability Aware System Design and Test) and IWPVTD (IEEE Intl. Workshop on Processor Verification, Test and Debug) workshops.
