Lecture # 28

Prateek Gupta (Y0240)

We shall look at two distinct but related topics during the course of this discussion. The outline of the lecture is as follows:

- Modeling wide area traffic: Network traffic is often modeled as a Poisson process for analytic simplicity, but we shall see that packet inter-arrival times are not exponentially distributed. We study the issues that arise in modeling wide area traffic.
- TCP Evaluation: Various methods exist to evaluate the performance of the Internet’s Transmission Control Protocol (TCP), but all of them have pitfalls that need to be understood before obtaining results. Testing TCP is difficult for a variety of reasons, which we shall study during this lecture, and we look at approaches to evaluating TCP performance.

When modeling Internet traffic, it is common practice to assume that packet and connection arrivals follow a Poisson distribution, since the Poisson distribution has several convenient theoretical properties. Let us first look at the Poisson distribution before venturing any further.

$$P(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!} \qquad (1)$$

where

mean = $\lambda$ and variance = $\lambda$,

and x is discrete, taking integer values in the half-open interval $[0, \infty)$.
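As a quick numerical check of Equation (1), the following sketch evaluates the Poisson probability and confirms that the mean and the variance both come out equal to lambda. The function names and the truncation point `x_max` are illustrative choices, not part of the lecture.

```python
import math


def poisson_pmf(x, lam):
    """P(x) from Equation (1): probability of exactly x events, lam > 0."""
    if x < 0:
        return 0.0
    # Evaluate in log space so that large x does not overflow x!.
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def poisson_moments(lam, x_max=200):
    """Approximate mean and variance by summing the PMF over 0..x_max.

    x_max is an assumed truncation point; it must be large compared
    with lam for the neglected tail to be negligible.
    """
    probs = [poisson_pmf(x, lam) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    mean, var = poisson_moments(4.0)
    print(f"mean={mean:.4f} variance={var:.4f}")
```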

An important characteristic of the Poisson distribution is that the probability of x taking a given value is independent of previous values, i.e. the probability is independent of the past. The Poisson distribution is often used to model the number of packets arriving during an interval. The packet inter-arrival times implied by a Poisson model are exponentially distributed and constitute an i.i.d. (independent and identically distributed) process. However, in practice it has been shown that packet inter-arrival times do not have an exponential distribution, hence the error introduced by modeling them as Poisson is large. Studies have shown that user-initiated TCP session arrivals, such as remote login and file transfer, are well modeled as Poisson processes with fixed hourly rates, but that other connection arrivals deviate considerably from Poisson, and that modeling TELNET packet interarrivals as exponential grievously underestimates the burstiness of TELNET traffic.
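The link between exponential inter-arrival times and Poisson counts can be illustrated with a small simulation. The sketch below draws i.i.d. exponential gaps, counts arrivals per fixed-width interval, and computes the variance-to-mean ratio of the counts, which should be close to 1 for a Poisson process. The rate, duration, interval width and seed are arbitrary illustrative values.

```python
import random


def simulate_poisson_arrivals(rate, duration, rng):
    """Arrival times in [0, duration) built from i.i.d. exponential gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def counts_per_interval(times, duration, width):
    """Number of arrivals falling in each consecutive interval of given width."""
    n_bins = int(duration / width)
    counts = [0] * n_bins
    for t in times:
        idx = int(t / width)
        if idx < n_bins:
            counts[idx] += 1
    return counts


def index_of_dispersion(counts):
    """Variance-to-mean ratio of the counts (about 1 for Poisson counts)."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    arrivals = simulate_poisson_arrivals(rate=5.0, duration=2000.0, rng=rng)
    counts = counts_per_interval(arrivals, duration=2000.0, width=1.0)
    print("mean count per interval:", sum(counts) / len(counts))
    print("index of dispersion:", index_of_dispersion(counts))
```

A ratio well above 1 on measured traffic would indicate arrivals that are burstier than a Poisson model predicts.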

Modeling wide area traffic involves modeling a number of quantities, such as connection arrivals, packet arrivals, and the number of bytes transferred per connection. Traces have been collected and traffic modeled for a number of applications, such as Telnet, FTP, SMTP, HTTP, and NNTP. However, for the purpose of this lecture we focus mainly on TCP connection and packet inter-arrival times, and show that they cannot be modeled by a single constant arrival rate for the entire day.

In this section we will look at the connection start times for several TCP protocols. The pattern of connection arrivals is dominated by a 24-hour pattern, as has been widely observed before. For TELNET connection arrivals and for FTP session arrivals, within one-hour intervals the arrival process can be well modeled by a homogeneous Poisson process; each of these arrivals reflects an individual user starting a new session. Over one-hour intervals, no other protocol’s connection arrivals are well modeled by a Poisson process. Even if we restrict ourselves to ten-minute intervals, only FTP session and TELNET connection arrivals are statistically consistent with Poisson arrivals, though the arrivals of SMTP connections and of FTPDATA “bursts” (discussed later in § 6) during ten-minute intervals are not terribly far from what a Poisson process would generate. The arrivals of NNTP, FTPDATA, and WWW (World Wide Web) connections, on the other hand, are decidedly not Poisson processes.

The following figure shows the mean hourly connection arrival rate for datasets LBL-1 through LBL-4. For the different protocols, we plot for each hour the fraction of an entire day’s connections of that protocol occurring during that hour.
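The per-hour fractions plotted in such a figure can be computed directly from connection start times. The sketch below assumes start times are given as seconds since midnight (a hypothetical input format; the trace format of the LBL datasets is not described here).

```python
from collections import Counter

SECONDS_PER_HOUR = 3600


def hourly_fractions(start_times_s):
    """Fraction of a day's connections that start in each hour 0..23.

    start_times_s: connection start times in seconds since midnight
    (assumed input format for this illustration).
    """
    if not start_times_s:
        return [0.0] * 24
    per_hour = Counter(int(t // SECONDS_PER_HOUR) % 24 for t in start_times_s)
    total = len(start_times_s)
    return [per_hour[h] / total for h in range(24)]


if __name__ == "__main__":
    # Made-up start times, for illustration only.
    sample = [0, 10, 3600, 7205, 86399]
    for hour, fraction in enumerate(hourly_fractions(sample)):
        if fraction > 0:
            print(f"hour {hour:2d}: {fraction:.2f}")
```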

From the figure, it can be seen that TELNET connection arrivals and FTP session arrivals are very well modeled as Poisson, both for 1-hour and 10-minute fixed rates. No other protocol’s arrivals are well modeled as Poisson with fixed hourly rates. If we require fixed rates only over 10-minute intervals, then SMTP and FTPDATA burst arrivals are not terribly far from Poisson, though neither is statistically consistent with Poisson arrivals, and consecutive SMTP interarrival times show consistent positive correlation. NNTP, FTPDATA, and WWW arrivals, on the other hand, are clearly not Poisson.
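A Poisson arrival process requires inter-arrival times that are both exponentially distributed and independent. The sketch below is a rough screen for these two properties, not the statistical test used in the studies: it computes the coefficient of variation of the gaps (about 1 for exponential gaps) and their lag-1 autocorrelation (about 0 for independent gaps). The smoothed series in the demo is an artificial example of positively correlated gaps.

```python
import random
import statistics


def interarrivals(times):
    """Gaps between consecutive arrival times (times must be sorted)."""
    return [b - a for a, b in zip(times, times[1:])]


def coefficient_of_variation(gaps):
    """Standard deviation divided by mean; 1 for an exponential distribution."""
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


def lag1_autocorrelation(gaps):
    """Correlation between each gap and the next; near 0 if gaps are independent."""
    mean = statistics.fmean(gaps)
    num = sum((a - mean) * (b - mean) for a, b in zip(gaps, gaps[1:]))
    den = sum((g - mean) ** 2 for g in gaps)
    return num / den


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the run is repeatable
    iid = [rng.expovariate(1.0) for _ in range(20000)]
    # Averaging neighbours makes consecutive gaps positively correlated.
    smoothed = [(a + b) / 2 for a, b in zip(iid, iid[1:])]
    print("i.i.d. exponential: CV=%.3f lag1=%.3f"
          % (coefficient_of_variation(iid), lag1_autocorrelation(iid)))
    print("correlated gaps:    CV=%.3f lag1=%.3f"
          % (coefficient_of_variation(smoothed), lag1_autocorrelation(smoothed)))
```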

NNTP and SMTP arrivals are not Poisson for identifiable reasons. Because of the flooding mechanism used to propagate network news, NNTP connections can immediately spawn secondary connections as new network news is received from one remote peer and in turn offered to another. NNTP and SMTP connections are also often timer-driven, and SMTP connections are affected by mailing list explosions, in which one connection immediately follows another.

The packet inter-arrival times within a Telnet connection are consistent with the empirical Tcplib distribution rather than with the exponential distribution. The distribution of Telnet inter-arrivals is “heavy-tailed”, i.e. larger values occur with small but non-negligible probability. Modeling TELNET packet arrivals by a Poisson process, as is generally done, can result in simulations and analyses that significantly underestimate performance measures such as average packet delay.
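To see what a heavy tail means in practice, the sketch below compares exponential gaps with heavy-tailed gaps of the same mean and measures how often a gap exceeds ten times the mean. The Pareto distribution used here is only a stand-in chosen for illustration; it is not the empirical Tcplib distribution, and the shape parameter 1.5 is an assumed value.

```python
import random


def exponential_gaps(n, mean, rng):
    """n i.i.d. exponential gaps with the given mean."""
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


def pareto_gaps(n, mean, shape, rng):
    """n i.i.d. Pareto gaps rescaled to the given mean (requires shape > 1)."""
    # A Pareto variable with minimum value `scale` has mean
    # scale * shape / (shape - 1), so solve for the scale.
    scale = mean * (shape - 1.0) / shape
    return [scale * rng.paretovariate(shape) for _ in range(n)]


def tail_fraction(gaps, threshold):
    """Fraction of gaps strictly larger than threshold."""
    return sum(1 for g in gaps if g > threshold) / len(gaps)


if __name__ == "__main__":
    rng = random.Random(3)  # fixed seed so the run is repeatable
    exp_g = exponential_gaps(50000, 1.0, rng)
    par_g = pareto_gaps(50000, 1.0, 1.5, rng)
    print("P(gap > 10 x mean), exponential:", tail_fraction(exp_g, 10.0))
    print("P(gap > 10 x mean), Pareto:     ", tail_fraction(par_g, 10.0))
```

Under these assumptions the heavy-tailed gaps exceed the threshold far more often than the exponential ones, which is why an exponential model misses the long idle periods and the bursts between them.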

Figure 2: Empirical distribution of packet interarrivals within Telnet connections

Evaluation of TCP

In the second part of the lecture we look at some TCP evaluation techniques. Understanding the performance of TCP is especially important because it is the dominant protocol in today’s Internet. Evaluating TCP is difficult because of the range of environments, variables, and evaluation techniques. The evaluation techniques can broadly be divided into two classes: implementation-based and simulation-based. Implementation-based techniques have the advantage of using real traffic, but they are generally difficult to carry out for various reasons, including the cost of the setup.

Let us have a look at some of the TCP features that would help us in evaluation before we proceed to the evaluation process.

The minimum window size (W bytes) required for a TCP to fully utilize a given amount of available bandwidth, B bytes/second, over a network with a round-trip time (RTT) of R seconds is

$$W = B \cdot R \qquad (2)$$

Therefore, a network path that exhibits a long delay and/or a large bandwidth may require a window size of more than 64 KB, the largest window that the 16-bit window field of the standard TCP header can advertise. Window scaling extensions to TCP have been defined that allow the use of a window size of more than 64 KB. Window scaling can lead to more rapid use of the TCP sequence space. Therefore, along with window scaling, the Protect Against Wrapped Sequence Numbers (PAWS) algorithm is required. In turn, the PAWS algorithm requires the timestamp option. The timestamp option adds 12 bytes to each segment. These additional header bytes are expected to be costly only on very low bandwidth channels. The timestamp option also allows TCP to easily take multiple RTT samples per round-trip time.
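The window requirement and its consequences can be worked through numerically. The sketch below computes the window from Equation (2), the smallest window-scale shift whose scaled 16-bit window covers it (the shift is capped at 14 by the window scaling specification), and the time needed to consume the 32-bit sequence space at a given rate. The example path (10 Mbit/s, 0.5 s RTT) is a hypothetical choice for illustration.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field
MAX_SHIFT = 14               # largest shift allowed by window scaling
SEQUENCE_SPACE = 2 ** 32     # TCP sequence numbers are 32 bits wide


def required_window_bytes(bandwidth_bytes_per_s, rtt_s):
    """Equation (2): W = B * R."""
    return bandwidth_bytes_per_s * rtt_s


def window_scale_shift(window_bytes):
    """Smallest shift s such that (65535 << s) >= window_bytes, capped at 14."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift


def sequence_wrap_time_s(bandwidth_bytes_per_s):
    """Seconds needed to send 2**32 bytes, i.e. to wrap the sequence space."""
    return SEQUENCE_SPACE / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical path: 10 Mbit/s (1.25e6 bytes/s) with a 0.5 s RTT.
    bandwidth = 1_250_000
    rtt = 0.5
    window = required_window_bytes(bandwidth, rtt)
    print("required window (bytes):", window)
    print("window scale shift:", window_scale_shift(window))
    print("sequence space wraps after (s):", sequence_wrap_time_s(bandwidth))
```

The faster the path, the shorter the wrap time, which is the reason PAWS accompanies window scaling.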

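Delayed acknowledgments (ACKs) allow a TCP receiver to refrain from acknowledging every incoming segment individually; if a second full-sized segment does not arrive within a given timeout (not to exceed 0.5 seconds), an ACK is transmitted.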
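The timeout rule above belongs to TCP's delayed-acknowledgment mechanism, in which a receiver acknowledges every second full-sized segment or sends an ACK when the delay timer expires. The sketch below is a simplified model of that rule, assuming every segment is full-sized, arrives in order, and that arrival times are sorted; it is not taken from any particular TCP implementation.

```python
def delayed_ack_times(arrival_times, timeout=0.5):
    """Times at which a delayed-ACK receiver sends ACKs.

    arrival_times: sorted arrival times (seconds) of full-sized,
    in-order segments (simplifying assumptions for this sketch).
    timeout: delayed-ACK timer, not to exceed 0.5 seconds.
    """
    acks = []
    pending_since = None  # arrival time of a segment not yet acknowledged
    for t in arrival_times:
        if pending_since is not None and t - pending_since >= timeout:
            # The timer expired before this segment arrived.
            acks.append(pending_since + timeout)
            pending_since = None
        if pending_since is None:
            pending_since = t  # first unacknowledged segment: start the timer
        else:
            acks.append(t)     # second segment: acknowledge immediately
            pending_since = None
    if pending_since is not None:
        acks.append(pending_since + timeout)  # timer expires for the last segment
    return acks


if __name__ == "__main__":
    # Two segments close together, then two that are far apart.
    print(delayed_ack_times([0.0, 0.125, 0.25, 1.0]))
```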

Simulation-based evaluation techniques have been employed in a large number of TCP evaluations. A large variety of simulators for modeling internetworking protocols exist and are currently used by researchers. Among these tools are OpNet, x-sim, the Network Simulator, and REAL. The advantages of simulator-based techniques are the following:

- Simulations generally do not incur high setup costs and do not need expensive machinery.
- Simulations provide a means of testing TCP performance in rare situations that are not encountered in day-to-day operation.
- Complex topologies can be easily created via simulation
- Simulators provide access to data about all the traffic transmitted in the network
- Simulators give an easy way to test impact of changes
- Simulators are not limited by the physical limitations of the network

Simulators also have the following disadvantages:

- Simulators generally use an abstract TCP implementation, rather than the implementations found in real scenarios (such as real operating systems).
- Simulators generally do not model non-network events.
- Simulators generally make some assumptions which are not valid in the real world.

While simulations can provide valuable insight into the performance of TCP, they are often not as illuminating as tests conducted with real TCP implementations over real networks. The implementation based evaluation techniques can be divided into the following categories: