Course:       Advanced Computer Networks (CS625), Fall-2003
Topic:        Overlay Network
Instructor:   Dr. Bhaskaran Raman
Scribe:       Amit Kumar Mondal(Y0037)(akmondal@cse)

What are Overlay Networks?

Resilient Overlay Networks (RON) is an architecture that allows end-to-end communication across the wide-area Internet to detect and recover from path outages and periods of degraded performance within several seconds. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.

- An overlay network is built on top of another network, such as ATM.
- IP itself is built on top of other networks.
- The term usually means a network built on top of IP.

Motivation behind Overlay Networks:-

The Internet suffers from the following four important drawbacks:
1. Slow link failure recovery: BGP takes a long time, of the order of several minutes, to converge to a new valid route after a router or link failure causes a path outage.
2. Inability to detect path or performance failures: BGP cannot detect many problems, such as floods and persistent congestion, that can greatly affect performance. As long as a link is deemed "live", i.e. the BGP session is still alive, BGP's AS-path-based routing will continue to route packets down the faulty path.
3. Inability to effectively multi-home end-customer networks: A common "solution" to Internet unreliability (instability) is multi-homing. Unfortunately, peering at the network level by small customers breaks down wide-area routing scalability.
4. Blunt policy expression: BGP is incapable of expressing fine-grained policies aimed at users or remote hosts; it can only express policies at the granularity of entire remote networks. This reduces the set of paths available in the case of failures.

RON overcomes these drawbacks of BGP.

A RON Model:-

  • Designate RON nodes for the overlay.
  • The nodes exchange performance and reachability information, and route based on it.
  • 2-50 nodes (only) on overlay.

  • The RON architecture achieves the following benefits:
    1. Fault detection: A RON can more efficiently find alternate paths around problems even when the underlying network layer incorrectly believes that all is well.
    2. Better reliability for applications: Each RON can have an independent, application-specific definition of what constitutes a fault.
    3. Better performance: A RON's limited size allows it to use more aggressive path computation algorithms than the Internet. RON nodes can exchange more complete topologies, collect more detailed link quality metrics, execute more complex routing algorithms, and respond more quickly to change.
    4. Application-specific routing: Distributed applications can link with the RON library and choose, or even define, their own routing metrics.

  • Software modules at a RON node handle the following:
    - RON client
    - Routing
    - Data Forwarding
    - Bootstrap and Membership management
    - Link state based dissemination
    - Monitoring Virtual Links.
    - Path-Evaluation and Selection
  • Full mesh network among members.
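
As a rough illustration of path evaluation and selection over the full mesh, the sketch below picks between the direct Internet path and a detour through one other RON node. The latency table, the node names, and the `best_path` helper are hypothetical. Limiting the detour to a single intermediate node and using latency as the metric are assumptions made here for simplicity, not something these notes specify.

```python
def best_path(latency, src, dst):
    """Pick the cheaper of the direct virtual link and any one-node detour.

    latency[a][b] is the measured latency of the virtual link a -> b
    (float('inf') if the link is currently considered down).
    """
    best_cost, best_route = latency[src][dst], [src, dst]
    for mid in latency:
        if mid in (src, dst):
            continue
        cost = latency[src][mid] + latency[mid][dst]
        if cost < best_cost:
            best_cost, best_route = cost, [src, mid, dst]
    return best_cost, best_route


# Hypothetical measurements (milliseconds) for a three-node overlay.
latency = {
    "A": {"A": 0, "B": 120, "C": 30},
    "B": {"A": 120, "B": 0, "C": 40},
    "C": {"A": 30, "B": 40, "C": 0},
}

cost, route = best_path(latency, "A", "B")
print(cost, route)  # 70 ['A', 'C', 'B']
```

Because every node keeps a virtual link to every other node, the number of links to monitor grows roughly as the square of the number of nodes, which is one reason the overlay is kept small.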

  • Possible Usage Models:-

  • A specific application (like video conferencing) constructs and uses a RON.
  • A network administrator constructs an overlay.
  • Overlay ISP.

  • Failure Detection in RON:-
  • Uses UDP heartbeat packets.

    - Failure detection in an overlay is application-specific. In multimedia conferencing, a 5% loss rate may ruin the video, whereas an FTP application can still work with lower throughput.
    - But one cannot reduce the heartbeat interval to a very small value, since that gives rise to false alarms.
    - There is also a trade-off between probing overhead and detection time.
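
The trade-off between overhead and detection time can be made concrete with a small sketch. The `HeartbeatMonitor` class, its parameter names, and the numbers used are all hypothetical; this is one simple way to declare a virtual link down after a number of missed heartbeats, not the actual RON implementation. Timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Declare a virtual link down after `max_missed` heartbeat intervals of silence."""

    def __init__(self, interval_s, max_missed):
        self.interval_s = interval_s    # time between heartbeat probes
        self.max_missed = max_missed    # missed replies tolerated before alarm
        self.last_reply = None          # timestamp of the latest reply seen

    def on_reply(self, now):
        self.last_reply = now

    def detection_time(self):
        # Worst-case silence before the link is declared down.
        return self.interval_s * self.max_missed

    def probes_per_second(self):
        # Probing overhead on this one virtual link.
        return 1.0 / self.interval_s

    def is_down(self, now):
        if self.last_reply is None:
            return False  # nothing heard yet; no basis for an alarm
        return now - self.last_reply > self.detection_time()


# Two hypothetical settings: faster detection costs more probe traffic.
for interval in (2.0, 0.5):
    m = HeartbeatMonitor(interval_s=interval, max_missed=3)
    print(interval, m.detection_time(), m.probes_per_second())
```

Tolerating several missed replies (`max_missed` above) is one way to limit false alarms from a single lost heartbeat; a loss-sensitive application would choose a smaller interval and accept the extra overhead.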

  • Latency

    - RON expects a reply to each heartbeat, from which it calculates the RTT.
    - RTTs are stable over periods of the order of 15 minutes to 1 hour.
    - If spikes occur in between, they are smoothed out by an EWMA (exponentially weighted moving average).
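
A minimal sketch of EWMA smoothing applied to RTT samples follows. The weight `alpha = 0.1` and the sample values are assumptions chosen for illustration; the notes do not give the weight RON uses.

```python
def ewma_update(estimate, sample, alpha=0.1):
    """Blend a new RTT sample into the running estimate.

    A small alpha gives more weight to history, so a single spike
    moves the estimate only slightly.
    """
    if estimate is None:
        return sample  # first sample initialises the estimate
    return (1 - alpha) * estimate + alpha * sample


# Hypothetical RTT samples in milliseconds, with one spike of 400 ms.
samples = [50, 52, 51, 400, 49, 50]
estimate = None
history = []
for s in samples:
    estimate = ewma_update(estimate, s)
    history.append(estimate)

print([round(h, 1) for h in history])
```

With these values the estimate rises to about 85 ms at the spike and then drifts back towards 50 ms, instead of jumping to 400 ms.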

  • Packet Loss Rate

    - Simply use the heartbeats and measure the loss rate from them.
    - If p1 and p2 are the loss rates of link1 and link2 respectively, then (assuming independent losses) the loss rate of the path consisting of link1 and link2 is 1 - (1 - p1)(1 - p2).
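
The composition rule can be checked with a short sketch. The function name `path_loss_rate` and the example loss rates are hypothetical, and the calculation assumes losses on the two links are independent.

```python
def path_loss_rate(link_loss_rates):
    """Loss rate of a path made of links with the given loss rates.

    A packet survives the path only if it survives every link, so the
    delivery probabilities multiply (independence is assumed).
    """
    delivered = 1.0
    for p in link_loss_rates:
        delivered *= 1.0 - p
    return 1.0 - delivered


# Example: p1 = 0.1 and p2 = 0.2 give 1 - (0.9)(0.8) = 0.28.
print(round(path_loss_rate([0.1, 0.2]), 2))  # 0.28
```

The same function covers longer paths, since the product simply gains one factor per link.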

  • Bandwidth

    - The bandwidth is relatively unstable.
    - It usually varies by a factor of about 2.
    - So we shift to an alternative path only if the bandwidth of the new path is more than 2 times the current bandwidth; otherwise there would be a lot of switching.
    - Use the back-to-back (b2b) packet technique to estimate the available bandwidth.
    - In TCP, the receiving end receives packets at the rate of the bottleneck bandwidth. Even so, there are two cases:
                1. This is only applicable if the intermediate routers implement WFQ.
                2. If the intermediate routers implement FIFO, then the b2b principle cannot be used.
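
The two ideas above, the factor-of-2 switching rule and the back-to-back estimate, can be sketched as follows. The function names, the packet size, and the inter-arrival gap are hypothetical; the estimate assumes the gap between two back-to-back packets at the receiver reflects the bottleneck rate, which per the notes holds only when routers implement WFQ.

```python
def should_switch(current_bw, candidate_bw, factor=2.0):
    """Switch paths only when the candidate is more than `factor` times better.

    The margin damps oscillation, since measured bandwidth itself
    fluctuates by about a factor of 2.
    """
    return candidate_bw > factor * current_bw


def back_to_back_bandwidth(packet_size_bytes, gap_seconds):
    """Estimate bandwidth (bits/s) from the receiver-side gap between
    two packets that were sent back to back."""
    return packet_size_bytes * 8 / gap_seconds


# Hypothetical measurement: 1500-byte packets arriving 1.2 ms apart,
# which works out to roughly 10 Mbit/s.
estimate = back_to_back_bandwidth(1500, 0.0012)
print(round(estimate / 1e6, 1), "Mbit/s")

print(should_switch(current_bw=4e6, candidate_bw=estimate))  # True: more than 2x
print(should_switch(current_bw=6e6, candidate_bw=estimate))  # False: less than 2x
```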

Reference:-
    1. D. G. Andersen, Resilient Overlay Networks. Massachusetts Institute of Technology, USA.