Seminar by Dr. Om P Damani

Design and Implementation of a Scalable and Fault-tolerant Web Server Cluster

Dr. Om P Damani
Research Staff Member
IBM TJ Watson Research Lab
Yorktown Heights, NY, USA
Date: Wed, Sep 08, 2004
Time: 3:00 PM
Venue: CS-101

Abstract

Clustering of servers has become a standard approach to achieving scalability and high availability for Web services. A basic problem in server-cluster design is the dynamic assignment of incoming client requests to servers, based on the load, the client IP, and the requested object. Any solution to this problem should also provide load balancing and fault tolerance. One approach is to modify the network protocol layer so that it supports a single-IP image for a cluster of machines. In the first part of the talk, I will sketch the ONE-IP system, a prototype based on this approach that I developed at Bell Labs. While the ONE-IP approach has low overhead and is easy to implement, it may lead to a high cache-miss rate under heavy load. The bulk of my presentation will focus on a different web-server cluster design that we developed at Akamai Technologies. My design focus has been on load balancing, fault tolerance, and real-time debugging. The cluster architecture is divided into three subsystems: operations, monitoring, and control. While the web servers operate independently of each other, the monitoring and control subsystems achieve fault tolerance via replication, group communication, and leader election. Even if the monitoring and control subsystems were to fail completely, operations would continue, albeit suboptimally. I will discuss the system architecture and the underlying load-balancing algorithm.

I will also briefly discuss fault-tolerance issues for stateful server clustering in the context of the IBM SMILE middleware system.

About the Speaker

Dr. Om P Damani received his BTech in CSE from IIT Kanpur in 1994 and his PhD from UT Austin in 1999. He was a research scientist at Akamai Technologies from 1999 to 2003. Since 2003 he has been a Research Staff Member at the IBM TJ Watson Research Lab.
