Lecture # 28

by
Prateek Gupta (Y0240)

We shall look at two distinct but related topics during the course of this discussion. The outline of the lecture is as follows:
1. Modeling wide area traffic: Network traffic is often modeled as a Poisson process for analytic simplicity, but we shall see that packet inter-arrival times are not exponentially distributed. We will study the issues in modeling wide area traffic in detail.
2. TCP Evaluation: Various methods exist to evaluate the performance of the Transmission Control Protocol (TCP), but all of them have pitfalls that must be understood before obtaining results. Testing TCP is difficult for a variety of reasons, which we shall study during this lecture along with approaches to modeling TCP performance.
Modeling Wide Area Traffic

When modeling Internet traffic, it is common practice to assume that packet and connection arrival times follow a Poisson process, since the Poisson distribution has several attractive theoretical properties. Let us first look at the Poisson distribution before venturing any further.
P(x) = exp(-lambda) * lambda^x / x!         (1)
where,
mean = lambda and variance = lambda,
and x takes discrete values in {0, 1, 2, ...}.

An important characteristic of the Poisson distribution is that the probability of x taking a given value is independent of previous values, i.e. the process is memoryless. A Poisson process is often used to model the arrival of packets during an interval. The packet inter-arrival times of a Poisson process have an exponential distribution and constitute an iid (independent, identically distributed) sequence. In practice, however, it has been shown that packet inter-arrival times do not have an exponential distribution, so the error introduced by modeling them as Poisson is significantly large. Studies have shown that user-initiated TCP session arrivals, such as remote-login and file-transfer, are well modeled as Poisson processes with fixed hourly rates, but that other connection arrivals deviate considerably from Poisson, and that modeling TELNET packet inter-arrivals as exponential grievously underestimates the burstiness of TELNET traffic.
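The link between Poisson arrivals and exponential inter-arrival times can be illustrated with a short simulation. This is a sketch only; the rate and time horizon are arbitrary illustrative values:

```python
import random

def poisson_arrival_times(rate, horizon, seed=0):
    """Generate arrival times of a homogeneous Poisson process by
    drawing i.i.d. exponential inter-arrival times with mean 1/rate."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)  # memoryless exponential gap
        if t > horizon:
            break
        arrivals.append(t)
    return arrivals

# With rate lambda = 10 arrivals/sec over 1000 sec, the empirical
# mean inter-arrival time should be close to 1/lambda = 0.1 sec.
arrivals = poisson_arrival_times(rate=10.0, horizon=1000.0)
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_gap = sum(gaps) / len(gaps)
```

The measurements discussed below show that real wide-area traffic violates exactly the assumption this generator encodes: the gaps are neither exponential nor independent.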

Modeling wide area traffic includes modeling a number of parameters such as connection arrivals, packet arrivals, the number of bytes transferred in a connection, etc. Traces have been collected and traffic modeled for a number of applications such as TELNET, FTP, SMTP, HTTP, and NNTP. For the purpose of this lecture, however, we will focus mainly on TCP packet inter-arrival times and show that arrivals cannot be modeled with a single constant rate over the entire day.

TCP Packet Interarrival Time

In this section we will look at the connection start times for several TCP protocols. The pattern of connection arrivals is dominated by a 24-hour cycle, as has been widely observed before. For TELNET connection arrivals and for FTP session arrivals, within one-hour intervals the arrival process can be well modeled by a homogeneous Poisson process; each of these arrivals reflects an individual user starting a new session. Over one-hour intervals, no other protocol's connection arrivals are well modeled by a Poisson process. Even if we restrict ourselves to ten-minute intervals, only FTP session and TELNET connection arrivals are statistically consistent with Poisson arrivals, though the arrivals of SMTP connections and of FTPDATA "bursts" during ten-minute intervals are not terribly far from what a Poisson process would generate. The arrivals of NNTP, FTPDATA, and WWW (World Wide Web) connections, on the other hand, are decidedly not Poisson processes.

The following figure shows the mean hourly connection arrival rate for datasets LBL-1 through LBL-4. For the different protocols, we plot for each hour the fraction of an entire day’s connections of that protocol occurring during that hour.

From the figure, it can be seen that TELNET connection arrivals and FTP session arrivals are very well modeled as Poisson, both for 1-hour and 10-minute fixed rates. No other protocol's arrivals are well modeled as Poisson with fixed hourly rates. If we require fixed rates only over 10-minute intervals, then SMTP and FTPDATA burst arrivals are not terribly far from Poisson, though neither is statistically consistent with Poisson arrivals, and consecutive SMTP inter-arrival times show consistent positive correlation. NNTP, FTPDATA, and WWW arrivals, on the other hand, are clearly not Poisson.

NNTP and SMTP arrivals are not Poisson for several reasons. Because of the flooding mechanism used to propagate network news, NNTP connections can immediately spawn secondary connections as new network news is received from one remote peer and in turn offered to another. NNTP and SMTP connections are also often timer-driven, and SMTP connections are further affected by mailing-list explosions, in which one connection immediately follows another.

The inter-arrival times within a TELNET connection are consistent with the empirical Tcplib distribution, not with the exponential distribution. The distribution of TELNET inter-arrivals is "heavy tailed", i.e. larger values occur with small but non-zero probability. Modeling TELNET packet arrivals by a Poisson process, as is generally done, can result in simulations and analyses that significantly underestimate performance measures such as average packet delay.
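The practical difference between an exponential and a heavy-tailed distribution shows up in the tail probabilities. The following sketch compares the two with both distributions normalized to mean 1; the Pareto shape parameter is an arbitrary illustrative choice, not a fit to the Tcplib data:

```python
import math

def exp_tail(x, mean=1.0):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)

def pareto_tail(x, shape=1.5):
    """P(X > x) for a Pareto distribution with mean 1.
    The scale is chosen so that scale * shape / (shape - 1) = 1."""
    scale = (shape - 1) / shape
    return 1.0 if x < scale else (scale / x) ** shape

# At x = 100 the exponential tail is astronomically small, while the
# heavy (Pareto) tail still carries non-negligible probability mass.
for x in (1, 10, 100):
    print(x, exp_tail(x), pareto_tail(x))
```

This is exactly why an exponential model underestimates burstiness: large inter-arrival gaps that the exponential model treats as essentially impossible occur regularly under a heavy-tailed distribution.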

Figure 2: Empirical distribution of packet interarrivals within Telnet connections

Evaluation of TCP

In the second part of the lecture we will look at some TCP evaluation techniques. Understanding the performance of TCP is especially important because it is the dominant protocol in today's Internet. Evaluating TCP is difficult because of the range of environments, variables, and evaluation techniques. The evaluation techniques can broadly be divided into two classes: implementation based and simulation based. Implementation-based techniques have the advantage of using real traffic, but they are generally difficult to employ for various reasons, including the cost of the setup.
Let us have a look at some of the TCP features that would help us in evaluation before we proceed to the evaluation process.

TCP Features

Basic Congestion Control. TCP congestion control basically includes the following mechanisms: slow start, congestion avoidance, fast retransmit, and fast recovery. Slow start and congestion avoidance are required by the IETF standards, while fast retransmit and fast recovery are recommended, mainly as performance enhancements.
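The window growth under slow start and congestion avoidance can be sketched as follows. This is a simplified per-RTT model for illustration only: real TCPs grow the window in bytes per ACK, and this sketch ignores losses, fast retransmit, and fast recovery:

```python
def cwnd_trace(ssthresh, rounds):
    """Sketch of congestion window growth in segments, one value per
    RTT round: exponential growth in slow start (below ssthresh),
    then linear growth (+1 segment per RTT) in congestion avoidance."""
    cwnd, trace = 1, []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2        # slow start: double each RTT
        else:
            cwnd += 1        # congestion avoidance: additive increase
    return trace

# With ssthresh = 16 segments: cwnd = 1, 2, 4, 8, 16, 17, 18, 19
print(cwnd_trace(16, 8))
```

The sharp transition from doubling to additive increase at ssthresh is the behavior an evaluation must capture, which is one reason abstract simulator TCPs can diverge from real implementations.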

Extensions for High Performance. The standard TCP header limits the advertised window size to 64 KB, which is not adequate in many situations. The following equation defines the minimum window size W (bytes) required for TCP to fully utilize an available bandwidth of B bytes/second over a network with a round-trip time (RTT) of R seconds:

W = B * R     (2)

Therefore, a network path that exhibits a long delay and/or a large bandwidth may require a window size of more than 64 KB. Window scaling extensions to TCP have been defined that allow the use of window sizes larger than 64 KB. Window scaling can lead to more rapid use of the TCP sequence space, so along with window scaling the Protect Against Wrapped Sequence Numbers (PAWS) algorithm is required. In turn, the PAWS algorithm requires the timestamp option. The timestamp option adds 12 bytes to each segment; these additional header bytes are expected to be costly only on very low-bandwidth channels. The timestamp option also allows TCP to easily take multiple RTT samples per round-trip time.
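Equation (2) can be applied directly to check whether a given path needs window scaling. The bandwidth and RTT below are hypothetical example values, not measurements from the lecture:

```python
def min_window_bytes(bandwidth_bytes_per_sec, rtt_sec):
    """Minimum window W = B * R (equation 2) needed to keep the
    path fully utilized."""
    return bandwidth_bytes_per_sec * rtt_sec

# Example: a 10 Mbit/s (1.25 MB/s) path with a 600 ms satellite RTT
# needs a window of 750,000 bytes, well above the 64 KB header limit,
# so window scaling would be required on this path.
w = min_window_bytes(1.25e6, 0.6)
needs_scaling = w > 64 * 1024
```

The same arithmetic explains why testbeds with satellite links are a common stress case for TCP evaluation: long RTTs inflate the required window far past the unscaled 64 KB limit.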

Selective Acknowledgement. TCP uses a cumulative acknowledgment (ACK) that simply indicates the last in-order segment that has arrived. When a segment arrives out-of-order a duplicate ACK is transmitted. The selective acknowledgment (SACK) option allows the TCP receiver to inform the TCP sender of which segments have arrived and which segments have not. This allows the TCP sender to intelligently retransmit only those segments that have been lost.
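The information a SACK option conveys can be sketched as follows. This is simplified to whole segment numbers rather than TCP's actual byte sequence ranges, and ignores the limit on how many blocks fit in the option:

```python
def sack_blocks(received_segments):
    """Collapse a set of received segment numbers into contiguous
    (start, end) blocks, the information a SACK option carries so
    the sender can retransmit only the gaps."""
    blocks = []
    for seg in sorted(received_segments):
        if blocks and seg == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], seg)   # extend current block
        else:
            blocks.append((seg, seg))           # start a new block
    return blocks

# Segments 4 and 7 were lost: the cumulative ACK alone would only say
# "everything through 3 arrived", but SACK tells the sender exactly
# which later ranges arrived, so only 4 and 7 are retransmitted.
print(sack_blocks({1, 2, 3, 5, 6, 8}))  # [(1, 3), (5, 6), (8, 8)]
```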

Delayed Acknowledgments. Delayed acknowledgments allow TCP to refrain from sending an acknowledgment for each incoming data segment, instead transmitting an ACK for every second full-sized data segment received. If a second data segment is not received within a given timeout (not to exceed 0.5 seconds), an ACK is transmitted.

Nagle Algorithm. The Nagle algorithm is used to combine many small bits of data produced by applications into larger TCP segments. The Nagle algorithm has been shown to reduce the number of segments transmitted into the network, but also interferes with the HTTP and NNTP protocols, as well as the delayed acknowledgment strategy, thus reducing performance.
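The decision the Nagle algorithm makes on each application write can be sketched as follows. This is a simplification for illustration; the actual rule in implementations also considers the receiver's advertised window:

```python
def nagle_send_now(data_len, mss, unacked_data):
    """Sketch of the Nagle test: send immediately only if the segment
    is full-sized (>= MSS) or there is no unacknowledged data in
    flight; otherwise buffer the small write so it can be coalesced
    with later writes into a larger segment."""
    return data_len >= mss or not unacked_data

# A 1-byte write with data already in flight is held back;
# a full-sized segment always goes out immediately.
print(nagle_send_now(1, 1460, unacked_data=True))     # False
print(nagle_send_now(1460, 1460, unacked_data=True))  # True
```

The interaction with delayed acknowledgments follows from this rule: a buffered small write waits for an ACK that the receiver is itself delaying, which is the source of the performance problems noted above.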

Larger Initial Windows. This feature allows TCP to start with an initial window of 2-4 segments instead of 1 segment, speeding up the start of the transfer. However, the feature has not been standardized, and experiments should be conducted to show whether using a larger initial window is fruitful or not.

Explicit Congestion Notification. TCP interprets segment loss as an indication of network congestion. Explicit Congestion Notification (ECN), in contrast, is a method by which a router can send a TCP endpoint an explicit message stating that the network is becoming congested, rather than dropping a segment.

Simulation based studies

Simulation based evaluation techniques have been widely employed for a large number of TCP evaluations. A large variety of simulators for modeling internetworking protocols exist and are currently used by researchers; some of these tools are OpNet, x-sim, the Network Simulator, and REAL. The advantages of using simulation based techniques are as follows:
• Simulations generally do not involve high setup costs and do not need expensive machinery.
• Simulations provide a means of testing TCP performance in rare situations that are not encountered in day-to-day operation.
• Complex topologies can be easily created via simulation.
• Simulators give an easy way to test the impact of changes.
• Simulators are not limited by the physical limitations of the network.
The disadvantages of using simulators are as follows:
• Simulators generally use an abstract TCP implementation, rather than implementations found in real systems (such as real operating systems).
• Simulators generally do not model non-network events.
• Simulators generally make assumptions that are not valid in the real world.

Implementation based Evaluation

While simulations can provide valuable insight into the performance of TCP, they are often not as illuminating as tests conducted with real TCP implementations over real networks. The implementation based evaluation techniques can be divided into the following categories:

Dedicated Testbeds: In a testbed, real TCP implementations are tested over real networks. Testbeds can incorporate hard-to-simulate network components, such as a satellite link. On the other hand, testbeds are generally limited in their capacity and speed by the network at hand.

Emulation: An emulator models a particular piece of the network path between two real hosts; emulation is therefore a mix between simulation and using a testbed. While emulation has several distinct advantages, it abstracts away some of the real behavior of the network being modeled.

Live Internet Tests: Another alternative is to run tests directly over the Internet rather than using a dedicated testbed or a simulator. The main disadvantage of conducting live experiments over the Internet is the inability to assess the impact the sending TCP has on the other traffic sharing the network path. While with simulators and testbeds it is fairly easy to monitor all traffic on the given network, it is difficult to obtain the same kind of monitoring of all the traffic competing with a TCP transfer a researcher generates when running over the Internet. In addition, assessing the impact of a new algorithm, or some other mechanism expected to be placed in the middle of the network, is difficult to accomplish in tests conducted over the Internet because of the Internet's global nature.