Networks Chapter 2

CHAPTER
TCP/IP Fundamentals
CHAPTER OBJECTIVES
After completing this chapter, the reader should be able to:
• Gain an understanding of the basic services provided by TCP, UDP, and IP
• Explain the congestion control algorithms employed by TCP
• Describe protocol details of TCP necessary to ensure reliable data transfer
over unreliable networks
Most of the performance issues in TCP/IP networks arise from various interactions
between the TCP engine and the surrounding communication environment. To
understand these performance issues and the techniques to address them, the
reader must be familiar with some of the basic details of TCP/IP protocols. This
chapter reviews the TCP/IP protocol fundamentals necessary for understanding
the subsequent chapters in the book. Many details, not directly referenced in the
rest of the book, are deliberately left out. For more comprehensive coverage of
TCP/IP, readers should consult books dedicated to TCP/IP protocols, such as that
by Comer [113].
2.1 TCP
TCP is a very complex protocol. To understand the performance dynamics of TCP,
one has to learn its basic operations. In this section, we explain some of the key
features of TCP, including flow control and congestion control.
2.1.1 TCP Services
TCP provides several useful services to its applications. These services are briefly
described in this section.
Connection-Oriented Service. TCP is a connection-oriented protocol. Before
two application processes can start sending data to each other, they must establish
a TCP connection between them. If multiple application processes are running on a
given IP host, each process is identified by a unique port number in that host, so each
of them can establish a separate TCP connection. Each TCP connection is identified
by a 4-tuple: source IP address, source TCP port number, destination IP address, and
destination TCP port number. The connection is terminated upon completion of the
communication session. Connection establishment and termination are explained
later in the section.
Streaming Service. TCP provides a streaming service to its applications.
Once a TCP connection is established between two application processes (one is
a sending process, the other is a receiving process), the sender writes a stream of
bytes (or characters) into the connection and the receiver reads these bytes out of
the connection. The stream-oriented abstraction is visible only to the applications;
the TCP layer itself operates in packet mode. The sending TCP accumulates a
certain amount of application bytes, forms a packet called a TCP segment, and sends
the segment to the receiving TCP. The receiving TCP extracts application bytes
from the segment, orders them if necessary, and delivers them as a stream of bytes
to the appropriate receiving application process. The format of a TCP segment is
explained later in the section.
Full-Duplex Service. TCP is a full-duplex protocol supporting data flow in
both directions. This means that once a TCP connection has been established
between two application processes, either process can send data to the other over
the same connection at the same time.
Reliable Service. TCP guarantees delivery of every single byte, in order,
without any duplication. To reorder out-of-order arrivals and to eliminate
duplicate delivery, the receiving TCP buffers the incoming data before
delivering them to the application process. To guarantee the delivery of data, TCP
uses an acknowledgment mechanism to check whether the transmitted data have been
received correctly by the receiver. Details of the acknowledgment procedure are
described in a later section. Unacknowledged data are retransmitted. If the
underlying communication channel is noisy and error-prone, several retransmissions
of the same segment may be necessary for correct delivery of data to the receiver.
For most data applications, such as file transfer and the World Wide Web, the
reliability feature is extremely important because the applications do not have to
worry about lost or out-of-order data.
End-to-End Semantic. TCP's reliability is based on an end-to-end semantic.
Acknowledgments (ACKs) are generated only by the receiving TCP and
only after the data are received correctly by the receiver. Therefore, when a
TCP sender receives an ACK, it is guaranteed that the data have reached the
receiver safely. It is this end-to-end semantic that provides the ultimate
reliability at the TCP layer. The end-to-end semantic would be violated if any
intermediate node (not the TCP destination) generated ACKs on behalf of the
destination.
2.1.2 Header Format

Each TCP segment has two parts, a standard 20-byte header followed by a variable
payload containing the application data. The header contains much useful
information, such as the advertised window size, ACK number, and so on. To understand
TCP operations, it is necessary to examine the meanings and purposes of these
fields. In this section, we describe the header format (Figure 2.1), the fields in the
header, and their meanings.
TCP segment layout (each row is 32 bits wide, bit positions 0 to 31):
Source port # | Destination port #
Sequence number
Acknowledgment number
Header length | Unused | U A P R S F | Receiver window size
Checksum | Urgent pointer
Options (variable)
Application data (variable length)
FIGURE 2.1: TCP segment format.
Source port number (16 bits). Each TCP application at the source host is uniquely
identified by the source port number. The port identification allows multiplexing
and demultiplexing multiple TCP connections over the same TCP protocol
process.
Destination port number (16 bits). It identifies a TCP application at the destination
host. When a TCP segment is received at the destination host, this port number
is used to deliver the segment data to the correct application.
Sequence number (32 bits). The 32-bit sequence number field contains the sequence
number of the first byte of data carried in the TCP segment. As an example, if
the preceding segment started with a sequence number of 2001 and contained
1460 bytes of data, then the sequence number of the next TCP segment is set
to 3461.
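The same calculation as a minimal Python sketch. It assumes, as the 32-bit field implies, that sequence numbers wrap around modulo 2^32; the function name `next_sequence_number` is our own illustrative choice.

```python
SEQ_SPACE = 2 ** 32  # the sequence number field is 32 bits wide, so values wrap


def next_sequence_number(seq: int, payload_len: int) -> int:
    """Sequence number of the segment that follows one starting at `seq`
    and carrying `payload_len` data bytes."""
    return (seq + payload_len) % SEQ_SPACE


print(next_sequence_number(2001, 1460))         # 3461, as in the example above
print(next_sequence_number(2**32 - 100, 1460))  # wraps around to 1360
```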
Acknowledgment number (32 bits). The destination uses this field to acknowledge
the correctly received data. It carries the sequence number of the next byte the
receiver expects.
Header length (4 bits). This field indicates the length of the TCP header in
multiples of 32-bit words. In most cases the header length of a TCP segment
is 20 bytes; however, this may vary if the options field is used. Because the
header can be of variable length, the length field also helps to identify the start
of the payload.
Reserved (6 bits). These six bits are reserved for future or experimental use.
Flags (6 bits). A TCP segment may carry several different types of protocol
messages, such as ACK, start signal of a connection, end signal of a connection,
and so on. Each bit in the flag field is used to identify a given type. Table 2.1
shows the purpose of each of the six flag bits. Multiple flag bits may be set
at the same time. For example, if an end signal is carried along with an ACK,
both ACK and final (FIN) flags must be set in that segment.
TABLE 2.1: TCP flags.
Flag       Description
ACK (A)    Acknowledgment field valid
FIN (F)    Final segment from sender
PSH (P)    Push operation invoked; receiving process needs notification
RST (R)    Connection to be reset
SYN (S)    Start of a new connection
URG (U)    Urgent pointer field valid
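The flag bits can be combined and tested with ordinary bit operations. In this Python sketch the bit values (FIN = 0x01, SYN = 0x02, RST = 0x04, PSH = 0x08, ACK = 0x10, URG = 0x20) are the standard TCP assignments, while `encode_flags` and `decode_flags` are hypothetical helper names.

```python
# Bit values of the six flags within the flags field.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def encode_flags(*names: str) -> int:
    """Combine flag names into a single flags value."""
    value = 0
    for name in names:
        value |= TCP_FLAGS[name]
    return value


def decode_flags(value: int) -> list[str]:
    """Return the names of all flags set in `value`."""
    return [name for name, bit in TCP_FLAGS.items() if value & bit]


fin_ack = encode_flags("ACK", "FIN")  # an end signal carried along with an ACK
print(hex(fin_ack))                   # 0x11
print(decode_flags(fin_ack))          # ['FIN', 'ACK']
```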
Receiver window size (16 bits). The receiver advertises its window (available buffer
space) to the sender using this field. The receiver window is used by the sender
for the purposes of flow control.
Checksum (16 bits). The checksum field is computed over the TCP header, the TCP
payload, and a pseudoheader consisting of the source and destination IP
addresses, the protocol number, and the TCP segment length. The checksum field
protects the header and the payload of the TCP segment.
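The checksum is the 16-bit one's complement of the one's complement sum of the covered bytes. The Python sketch below assumes IPv4 and the standard pseudoheader layout (source address, destination address, a zero byte, protocol number 6, and the TCP segment length); the function names are illustrative, not a library API. On receipt, the same sum taken over the pseudoheader and the segment, checksum included, comes out to 0xFFFF when no error is detected.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum of `data` (zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def tcp_pseudoheader(src_ip: bytes, dst_ip: bytes, tcp_length: int) -> bytes:
    """IPv4 pseudoheader: addresses, a zero byte, protocol 6 (TCP), TCP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 6, tcp_length)


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum for `segment`, whose own checksum field must be zero."""
    pseudo = tcp_pseudoheader(src_ip, dst_ip, len(segment))
    return ~ones_complement_sum16(pseudo + segment) & 0xFFFF


src, dst = bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])
# A 20-byte SYN header (ports 1234 -> 80) with the checksum field zeroed.
header = struct.pack("!HHIIBBHHH", 1234, 80, 88, 0, 5 << 4, 0x02, 65535, 0, 0)
csum = tcp_checksum(src, dst, header)
segment = header[:16] + struct.pack("!H", csum) + header[18:]  # checksum sits at byte 16

# Receiver check over pseudoheader + segment (checksum included).
check = ones_complement_sum16(tcp_pseudoheader(src, dst, len(segment)) + segment)
print(hex(check))  # 0xffff
```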
Urgent pointer (16 bits). A TCP segment may carry data that need priority
treatment (the urgent [URG] flag would be set for this segment). For example, an
URG pointer may be used to pass escape characters to cancel an operation on
a remote computer. The URG data are processed before any other data waiting
in the buffer. The 16-bit URG pointer points to the last byte of URG data
in the segment, so that the receiving TCP can easily locate the URG data for
immediate processing.
Options (variable). Options are specified in whole bytes. There are
two extra bytes preceding each option. The first byte indicates the option
type, followed by a second byte indicating the length of the option in bytes
(including these two preceding bytes). Examples of options are:
• Maximum Segment Size (MSS) (16 bits). This option is used by the
originating TCP during connection establishment (in the start-of-a-new-connection
[SYN] segment) to negotiate the MSS to be used for the
connection. The 16 bits used for this field limit the MSS to 64 KB.
• Timestamp (8 bytes). The timestamp option is used for more
accurate round-trip time (RTT) calculations. Two four-byte timestamp
fields are used for this option. The sending TCP fills the first field with the
current time. The receiver echoes back the timestamp value received in
the second field in an ACK segment. This enables the sender to calculate
the RTT more accurately.
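Putting the field descriptions together, a TCP header can be unpacked with Python's standard `struct` module. This is a sketch, not a complete parser: it assumes a well-formed segment and decodes only the MSS (kind 2) and Timestamp (kind 8) options. It also handles the one-byte End-of-Option-List (kind 0) and No-Operation (kind 1) options, which, unlike the options described above, carry no length byte. The function names are illustrative.

```python
import struct

# Option kinds handled here. EOL (0) and NOP (1) are single bytes with no
# length byte; other options carry a kind byte and a length byte.
OPT_EOL, OPT_NOP, OPT_MSS, OPT_TIMESTAMP = 0, 1, 2, 8


def parse_options(raw: bytes) -> dict:
    """Walk the options area and return the options found."""
    options = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == OPT_EOL:
            break
        if kind == OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(raw) or raw[i + 1] < 2:
            break  # truncated or malformed option: stop parsing
        length = raw[i + 1]  # counts the kind and length bytes too
        value = raw[i + 2:i + length]
        if kind == OPT_MSS:
            options["mss"] = struct.unpack("!H", value)[0]
        elif kind == OPT_TIMESTAMP:
            options["timestamp"] = struct.unpack("!II", value)
        else:
            options[kind] = value
        i += length
    return options


def parse_tcp_header(segment: bytes) -> dict:
    """Split a TCP segment into header fields, options, and payload."""
    (src_port, dst_port, seq, ack, offset_byte, flags,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4  # header length is counted in 32-bit words
    return {
        "src_port": src_port, "dst_port": dst_port, "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags & 0x3F, "window": window,
        "checksum": checksum, "urgent": urgent,
        "options": parse_options(segment[20:header_len]),
        "payload": segment[header_len:],
    }


# A SYN segment carrying MSS = 1460 and a timestamp option (aligned with two NOPs).
opts = (struct.pack("!BBH", OPT_MSS, 4, 1460) + bytes([OPT_NOP, OPT_NOP])
        + struct.pack("!BBII", OPT_TIMESTAMP, 10, 12345, 0))
hdr = struct.pack("!HHIIBBHHH", 1234, 80, 88, 0, ((20 + len(opts)) // 4) << 4,
                  0x02, 65535, 0, 0)
parsed = parse_tcp_header(hdr + opts)
print(parsed["header_len"], parsed["options"])  # 36 {'mss': 1460, 'timestamp': (12345, 0)}
```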
IP datagram = IP header + IP payload
IP payload = TCP segment = TCP header + TCP payload
FIGURE 2.2: Encapsulation of TCP segments into IP datagrams.
2.1.3 Encapsulation in IP
Once a TCP segment is ready for transmission, it is passed on to the IP layer. The IP
layer encapsulates the entire TCP segment, both the TCP header and the TCP payload,
into the IP datagram payload. Figure 2.2 illustrates the encapsulation of a TCP
segment in an IP datagram. Given this encapsulation method, the first 20 bytes of an
IP datagram payload contain all fields of a standard (no options used) TCP header.
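Because the TCP segment sits directly in the IP payload, a receiver can find the TCP header by skipping over the IP header. The Python sketch below assumes IPv4 and reads the IP header length field (the low 4 bits of the first byte, in 32-bit words) rather than hard-coding 20 bytes, since IP headers may also carry options. The name `tcp_ports_from_ipv4` is hypothetical, and the IP header checksum is left at zero purely for the example.

```python
import struct


def tcp_ports_from_ipv4(datagram: bytes) -> tuple[int, int]:
    """Return (source port, destination port) of the TCP segment in an IPv4 datagram."""
    ip_header_len = (datagram[0] & 0x0F) * 4  # IHL field, in 32-bit words
    if datagram[9] != 6:                       # protocol field: 6 means TCP
        raise ValueError("payload is not a TCP segment")
    tcp_segment = datagram[ip_header_len:]     # the IP payload is the whole TCP segment
    src_port, dst_port = struct.unpack("!HH", tcp_segment[:4])
    return src_port, dst_port


# Minimal IPv4 header: version 4, IHL 5 (20 bytes), total length 40, TTL 64, protocol 6.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
tcp_header = struct.pack("!HHIIBBHHH", 1234, 80, 88, 0, 5 << 4, 0x02, 65535, 0, 0)
print(tcp_ports_from_ipv4(ip_header + tcp_header))  # (1234, 80)
```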
2.1.4 Acknowledgment Mechanism

TCP relies on acknowledgments from the receiver to confirm correct delivery of
data. Some of the important features of TCP's ACK mechanism are described below.
Cumulative Acknowledgment. Each ACK is a confirmation that all bytes before
the ACK number have been received correctly. For example, if the destination
sends an ACK of 2001, it means that all bytes up to and including 2000 have been
received. One obvious benefit of such cumulative ACKs is that many lost ACKs are
easily compensated for by subsequent ACKs with higher numbers.
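A short sketch of why lost ACKs are harmless under cumulative acknowledgment: the sender only needs the highest ACK number it has seen. The helper names are hypothetical, and sequence-number wraparound is ignored for simplicity.

```python
def highest_acknowledged(acks_received: list[int], initial_ack: int) -> int:
    """Sender-side view of cumulative ACKs: the largest ACK number seen so far
    acknowledges every byte below it, whatever earlier ACKs were lost."""
    acked = initial_ack
    for ack in acks_received:
        acked = max(acked, ack)  # an older or duplicate ACK never moves this back
    return acked


def confirmed_segments(segments: list[tuple[int, int]], acked: int) -> list[tuple[int, int]]:
    """Segments (start, length) whose every byte lies below `acked`."""
    return [(start, length) for start, length in segments if start + length <= acked]


segments = [(1, 1000), (1001, 1000), (2001, 1000)]
# The receiver sent ACKs 1001, 2001, and 3001, but ACK 2001 was lost.
acked = highest_acknowledged([1001, 3001], initial_ack=1)
print(acked)                                # 3001
print(confirmed_segments(segments, acked))  # all three segments are confirmed
```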
ACK-Only Segment and Piggybacking. The ACK is indicated through an
ACK field in the TCP header. Therefore, to acknowledge correctly received bytes,
a receiver can either create an ACK-only segment (the segment carries only the
header containing the ACK number; no data are sent in this segment), or it can send
the ACK in a data segment (a segment carrying data in the reverse direction). When
an ACK travels in a data segment, the process is called piggybacking. Piggybacking
reduces ACK traffic in the reverse direction.
Delayed ACK. The receiving TCP has the choice of either generating an
ACK as soon as it receives a segment or delaying the ACK for a while. By delaying
the ACK, the receiver may be able to acknowledge two segments at a time and reduce
ACK traffic; however, delaying an ACK for too long may cause a timeout and
retransmission at the sender. A TCP receiver should not delay ACKs more than 500
ms.
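The delayed-ACK trade-off can be simulated. The policy below is an assumption for illustration: the receiver acknowledges immediately when a second unacknowledged segment arrives, and otherwise sends the ACK when a delay timer expires (200 ms here, a value we chose; it only has to stay within the 500 ms limit). Times are integer milliseconds to avoid floating-point noise, and `delayed_ack_times` is a hypothetical name.

```python
def delayed_ack_times(arrivals_ms: list[int], max_delay_ms: int = 200) -> list[int]:
    """Times (ms) at which a delayed-ACK receiver sends its ACKs.

    Assumed policy: ACK immediately when a second unacknowledged segment
    arrives; otherwise ACK when the delay timer started by the first
    unacknowledged segment expires.
    """
    if max_delay_ms > 500:
        raise ValueError("ACKs should not be delayed more than 500 ms")
    acks = []
    pending_since = None  # arrival time of the oldest unacknowledged segment
    for t in sorted(arrivals_ms):
        if pending_since is not None and t - pending_since >= max_delay_ms:
            acks.append(pending_since + max_delay_ms)  # timer fired before this arrival
            pending_since = None
        if pending_since is None:
            pending_since = t  # hold the ACK and start the timer
        else:
            acks.append(t)     # second segment: one ACK covers both
            pending_since = None
    if pending_since is not None:
        acks.append(pending_since + max_delay_ms)  # timer fires for the last segment
    return acks


print(delayed_ack_times([0, 50, 300, 900]))  # [50, 500, 1100]
```

With back-to-back arrivals, one ACK is sent per two segments; with sparse arrivals, each ACK waits for the timer, which is the extra delay the sender sees.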
Duplicate ACK. If a segment gets lost in the network, but the following
segment arrives safely at the receiver, it is possible for the receiving TCP to receive
data with a sequence number beyond the expected range. In that case, the receiving
TCP buffers the incoming bytes and regenerates the ACK for the bytes received so
far in sequence. The regeneration of the same ACK number causes the duplicate
ACK phenomenon at the sender; that is, the sender can receive the same ACK more
than once. In the original TCP, the sender simply ignores duplicate ACKs. As we
will see in a later chapter, some later variants of TCP take special actions based on
duplicate ACKs.
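The receiver side of duplicate ACKs can be sketched as follows. The simplifications are ours: segments are given as (sequence number, length) pairs, partial overlaps and sequence-number wraparound are ignored, and `receiver_acks` is a hypothetical name.

```python
def receiver_acks(segments: list[tuple[int, int]], initial_seq: int) -> list[int]:
    """ACK number sent after each arriving (seq, length) segment.

    Out-of-order data are buffered; the ACK always names the next byte
    expected in order, so a gap makes the receiver repeat the same ACK.
    """
    expected = initial_seq
    buffered = {}  # seq -> length of segments held for later delivery
    acks = []
    for seq, length in segments:
        if seq >= expected:
            buffered[seq] = length
        # Deliver any buffered segments that are now in order.
        while expected in buffered:
            expected += buffered.pop(expected)
        acks.append(expected)
    return acks


# Segment 1001 is lost, the next two arrive, then 1001 is retransmitted.
arrivals = [(1, 1000), (2001, 1000), (3001, 1000), (1001, 1000)]
print(receiver_acks(arrivals, initial_seq=1))  # [1001, 1001, 1001, 4001]
```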
2.1.5 Retransmission Mechanism

Retransmission is the basic tenet of TCP's reliable data transfer service. If a segment
is lost, it has to be retransmitted. To detect the loss of a segment, TCP maintains a
retransmission timer for each segment sent. The timer is set for a duration called the
retransmission timeout (RTO) period. If an ACK is received during the RTO, the
timer is cleared; otherwise the timer expires. On expiration of the retransmission
timer, the segment is retransmitted.
Setting an optimum value for the RTO is very significant from the performance
point of view. The timeout period should be greater than the round-trip time (RTT)
to accommodate various delays, such as the transmission delay, the link propagation
delay, the header processing time, the ACK generation time, and so on. In a
dynamic environment, however, the actual RTT may vary over time. On the one
hand, setting the RTO longer than necessary would result in longer delay for
applications if losses are frequent. On the other hand, smaller values may result
in premature retransmissions, wasting communication resources such as
bandwidth and processing time.
To address this problem, the TCP sender maintains an estimate of RTT
for each of its connections. Let us use the variable EstimatedRTT to represent
this estimate. EstimatedRTT is calculated from a sample RTT (SampleRTT) of
the connection, where SampleRTT is defined as the time from the moment a TCP
segment is transmitted until an ACK is received for the segment. Because SampleRTT
usually varies between measurements (the variation is usually caused by the variable
queuing delays in intermediate routers), an exponentially weighted moving average
is used to calculate EstimatedRTT:
$$\text{EstimatedRTT} = (1 - \alpha) \times \text{EstimatedRTT} + \alpha \times \text{SampleRTT} \qquad (2.1)$$

A typical value used for α is 0.125, which gives a low weight to the most recent
SampleRTT and a high weight to the historical data represented by EstimatedRTT.
A lower value of α keeps the RTT estimate from being skewed by spikes in the
measured samples.
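Once the RTT estimate is known, the RTO is estimated as:
$$\text{RTO} = \text{EstimatedRTT} + 4 \times \text{Deviation} \qquad (2.2)$$
where
$$\text{Deviation} = (1 - \alpha) \times \text{Deviation} + \alpha \times \lvert \text{SampleRTT} - \text{EstimatedRTT} \rvert \qquad (2.3)$$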
The deviation term in Equation (2.2) accommodates fluctuations of SampleRTT
around EstimatedRTT. For links with consistent SampleRTT, this term will be
negligible. Equation (2.3) maintains an exponentially weighted moving average
of the deviation.
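Equations (2.1) to (2.3) translate into a small estimator. In this Python sketch, the initialization (EstimatedRTT set to the first sample and Deviation to half of it) and the update order (Deviation computed with the EstimatedRTT from before the new sample) follow RFC 6298 and are assumptions with respect to this text. RFC 6298 also uses a separate gain of 0.25 for the deviation, which the `beta` parameter allows; by default the sketch uses α = 0.125 in both equations. The class name is hypothetical.

```python
class RtoEstimator:
    """Retransmission-timeout estimator following Equations (2.1) to (2.3)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.125):
        self.alpha = alpha  # gain in Equation (2.1)
        self.beta = beta    # gain in Equation (2.3); RFC 6298 uses 0.25
        self.estimated_rtt = None
        self.deviation = None

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (seconds) and return the new RTO."""
        if self.estimated_rtt is None:
            # First measurement (assumed initialization, as in RFC 6298).
            self.estimated_rtt = sample_rtt
            self.deviation = sample_rtt / 2
        else:
            # Equation (2.3), using the EstimatedRTT from before this sample.
            self.deviation = ((1 - self.beta) * self.deviation
                              + self.beta * abs(sample_rtt - self.estimated_rtt))
            # Equation (2.1).
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.rto()

    def rto(self) -> float:
        """Equation (2.2)."""
        return self.estimated_rtt + 4 * self.deviation


est = RtoEstimator()
for sample in (0.100, 0.100, 0.300):
    print(round(est.update(sample), 6))  # 0.3, then 0.275, then 0.378125
```

A single 300 ms spike moves EstimatedRTT only from 100 ms to 125 ms, but the deviation term grows enough to push the RTO well above the new estimate.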
Most TCP implementations represent the RTO as a multiple of clock "ticks."
The retransmission timer is then decremented every clock tick, and the timer
expires when the value reaches zero. A retransmission timer should be set to at least
two ticks. In many popular implementations a tick equals 500 milliseconds, yielding a
minimum RTO of 1 second. Recent operating systems, such as Solaris, have smaller
tick values.
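The conversion from an RTO in seconds to whole ticks, with the two-tick floor, might look like this. Rounding up to the next tick is our assumption, and the function name is hypothetical.

```python
import math


def rto_in_ticks(rto_seconds: float, tick_seconds: float = 0.5, min_ticks: int = 2) -> int:
    """Express an RTO as a whole number of clock ticks (rounded up),
    never fewer than `min_ticks`."""
    return max(min_ticks, math.ceil(rto_seconds / tick_seconds))


print(rto_in_ticks(0.3))  # 2 ticks: the 1-second minimum with 500 ms ticks
print(rto_in_ticks(1.7))  # 4 ticks, i.e. 2 seconds
```

With coarse 500 ms ticks, an estimated RTO of 300 ms still becomes a 1-second timer, which is why smaller tick values help on fast paths.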
2.1.6 Connection Establishment and Termination

TCP provides the connection-oriented service through two procedures, connection
establishment and connection termination. A connection is established before
starting the data transfer. When the data transfer completes, the connection is
explicitly terminated. In this section we show the steps involved in connection
establishment and termination.
Connection Establishment. The process of establishing a TCP connection is
called three-way handshaking. Figure 2.3 illustrates the steps of three-way
handshaking using a client-server model (e.g., a web client tries to establish a TCP
connection with a web server to download a file):
1. The client sends a SYN segment (SYN bit set in the header) to the server with
an initial sequence number (e.g., SeqNo = 88) that it is going to use for this
connection.
2. The server sends a segment that has both SYN and ACK bits set in the
flags field (SYN + ACK, AckNo = 89, SeqNo = 155). The ACK number (AckNo)
indicates that the server has received bytes up to 88 correctly and the next
byte it expects has sequence number 89. The sequence number (SeqNo)
tells the client that the server will use 155 as the starting sequence number
for its data. The client and the server may use different initial sequence
numbers.
Client → Server: SYN, SeqNo = 88
Server → Client: SYN, ACK, SeqNo = 155, AckNo = 89
Client → Server: ACK, AckNo = 156
FIGURE 2.3: TCP connection establishment using three-way handshaking.
3. Finally, the client acknowledges the server's sequence number (AckNo = 156)
with an ACK segment. Now the client and the server have successfully established
a TCP connection between them and are ready to exchange data over
this connection.
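The numbers in the handshake follow a simple rule: a SYN occupies one sequence number, so each side acknowledges the other's initial sequence number plus one. This Python sketch reproduces Figure 2.3 for initial sequence numbers 88 and 155. The `Segment` class is a hypothetical container, and the third segment's sequence number (the client's initial sequence number plus one), which the figure does not show, is included for completeness.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2 ** 32  # sequence numbers wrap modulo 2^32


@dataclass
class Segment:
    sender: str
    flags: tuple[str, ...]
    seq: Optional[int] = None
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """The three segments that establish a connection."""
    return [
        # 1. SYN carrying the client's initial sequence number.
        Segment("client", ("SYN",), seq=client_isn),
        # 2. SYN + ACK: the server's initial sequence number and the next byte it expects.
        Segment("server", ("SYN", "ACK"), seq=server_isn,
                ack=(client_isn + 1) % SEQ_SPACE),
        # 3. ACK of the server's SYN.
        Segment("client", ("ACK",), seq=(client_isn + 1) % SEQ_SPACE,
                ack=(server_isn + 1) % SEQ_SPACE),
    ]


for segment in three_way_handshake(88, 155):
    print(segment)
```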
Connection Termination. The process of terminating a TCP connection
is called four-way handshaking (Figure 2.4). The steps of four-way handshaking
are:
1. The client sends a FIN segment (FIN bit set in the header) to the server to
indicate that it wishes to terminate the connection.
2. The server sends an ACK to confirm the receipt of the FIN segment. At this
stage the TCP client stops communication in the client-server direction. The
server, however, may need to continue the communication in the server-client
direction (e.g., part of a file is yet to be transmitted).
3. When the server is ready to close the connection, it sends a FIN segment to the
client. Because the server is not necessarily ready to terminate the server-client
communication when it receives a FIN segment from the client, steps 2 and 3
cannot always be combined.
4. The client acknowledges the receipt of the FIN segment with an ACK segment.
Now the connection is terminated from both ends.
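The half-close behavior of these four steps can be traced with a tiny model that records which data directions remain open after each segment. It is a simplification that follows the description above (the client-to-server direction closes once the client's FIN is acknowledged); the function name is hypothetical.

```python
def four_way_termination() -> list[tuple[str, str, list[str]]]:
    """Replay the four termination steps, recording the data directions
    that are still open after each segment."""
    open_directions = {"client->server", "server->client"}
    steps = [
        ("client", "FIN", None),              # 1. client asks to close its direction
        ("server", "ACK", "client->server"),  # 2. FIN acknowledged: client->server closed
        ("server", "FIN", None),              # 3. server has finished sending
        ("client", "ACK", "server->client"),  # 4. FIN acknowledged: fully closed
    ]
    trace = []
    for sender, kind, closes in steps:
        if closes is not None:
            open_directions.discard(closes)
        trace.append((sender, kind, sorted(open_directions)))
    return trace


for step in four_way_termination():
    print(step)
```

Between steps 2 and 3 only the server-to-client direction is open, which is when the server can finish sending the rest of a file.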
Each handshake introduces some delay (the SYN or FIN segments need
to travel to the other end). Handshaking is the major source of delay in
establishing and terminating TCP connections for long-distance communications
(e.g., in satellite TCP/IP networks).
FIGURE 2.4: TCP connection termination using four-way handshaking.
2.1.7 Flow Control and Sliding Window

Flow control is the mechanism that prevents a fast sender from swamping a slow
receiver. Each TCP receiver allocates some buffer for a TCP connection. Data
received (correctly and in order) are placed in this buffer for the corresponding
application to read them and clear the buffer as soon as possible. However, in many
situations (e.g., a slow laptop downloading a file from a high-speed server on the
LAN), the application in the receiving host may not be able to keep up with the
data-arriving rate, leading to buffer overflow at the receiver. In such situations, TCP
exercises flow control to adjust the transmission rate of the sending TCP to prevent
buffer overflow at the receiver.
Sliding Window. TCP implements a sliding window scheme to accomplish
flow control. The size of the window controls the number of bytes in transit
(transmitted but not yet acknowledged). When a window full of data is in transit,
TCP must stop transmitting any further segments and wait for acknowledgment
from the receiver. When an acknowledgment arrives, TCP can transmit new bytes
not exceeding the number of bytes acknowledged.
The sliding window concept is illustrated in Figure 2.5. In this example, the
window size is six bytes (TCP's flow control is byte-based), which allows a
maximum of six bytes to be in transit. We have the following steps:
• Step 1. Bytes 0, 1, and 2 have already been transmitted and acknowledged
by the receiver. Bytes 3, 4, and 5 have already been sent, and the sender is
waiting for ACK. Since the window size is 6, bytes 6, 7, and 8 are allowed to
be transmitted. Bytes 9 and above cannot be sent because of the window size
limitation.
• Step 2. TCP has sent bytes 6, 7, and 8, and it is waiting for ACK for all segments
in its current window. A window full of data is in transit; no more data can be
sent at this stage.
• Step 3. ACK for bytes 3 and 4 has been received. At this stage, the sliding
window slides by two to the right, making bytes 9 and 10 eligible to be sent.
• Step 4. TCP sends bytes 9 and 10 and again starts waiting for ACK.
In summary, the right-hand side of the window slides when a byte is sent,
whereas the left-hand side of the window slides when an ACK is received. The
maximum number of bytes waiting for ACK is determined by the window size.
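The sender's side of Figure 2.5 can be replayed with a minimal Python model. The class and method names are hypothetical; the window size is fixed at six bytes and ACKs are cumulative, so an ACK number of 5 acknowledges bytes 3 and 4.

```python
class SlidingWindow:
    """Sender-side view of a fixed-size byte window with cumulative ACKs."""

    def __init__(self, size: int):
        self.size = size
        self.base = 0  # oldest unacknowledged byte (moves right when an ACK arrives)
        self.next = 0  # next byte to send (moves right when a byte is sent)

    def sendable(self) -> list[int]:
        """Bytes that may be sent now without exceeding the window."""
        return list(range(self.next, self.base + self.size))

    def send(self, count: int) -> list[int]:
        """Send up to `count` bytes; return the byte numbers actually sent."""
        sent = self.sendable()[:count]
        self.next += len(sent)
        return sent

    def ack(self, ack_number: int) -> None:
        """Cumulative ACK: every byte below `ack_number` is acknowledged."""
        if self.base < ack_number <= self.next:
            self.base = ack_number


w = SlidingWindow(6)
w.send(6)            # bytes 0-5 go out
w.ack(3)             # Step 1: bytes 0, 1, 2 acknowledged
print(w.sendable())  # [6, 7, 8]
w.send(3)            # Step 2: a full window is in transit
print(w.sendable())  # []
w.ack(5)             # Step 3: bytes 3 and 4 acknowledged
print(w.sendable())  # [9, 10]
print(w.send(2))     # Step 4: [9, 10]
```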
Window Size Adjustment. In the example of Figure 2.5, we used a fixed
window size of six bytes. In practice, the window size is adjusted dynamically
according to the available buffer space in the receiving TCP. The receiving TCP can
increase or decrease the size of the window in an ACK segment (using the Receiver
Window Size field in the TCP segment header). The sending TCP maintains a
variable called AdvertisedWindow to keep track of the current window size for the
purposes of flow control.
[Figure: the sender's byte stream at Steps 1 through 4, advancing over time; each byte is marked as acknowledged, waiting for acknowledgment, can be sent, or cannot be sent.]
FIGURE 2.5: TCP's sliding window.
2.1.8 Congestion Control

Flow control effectively prevents buffer overflow at the receiver by dynamically
adjusting the AdvertisedWindow according to the available buffer space at the
receiver. The flow control mechanism, however, does not address the buffer over-
flow problem in the intermediate routers during network congestion. To address
network congestion, TCP implements a set of mechanisms collectively called con-
gestion control.
The fundamental principle behind congestion control is to adjust the transmission
window of the sender in such a way that buffer overflow is prevented not only at
the receiver but also at the intermediate routers. To achieve this, TCP uses another
window control variable called CongestionWindow. The idea is that if somehow we
could learn the available buffer space in the most congested (bottleneck) router in
the end-to-end path of the TCP connection, we could set the CongestionWindow
accordingly and select the actual transmission window as the minimum of
AdvertisedWindow and CongestionWindow. This would prevent buffer overflow both at
the receiver and in the network.
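In code, this rule amounts to taking the smaller of the two windows and, following the sliding-window description earlier, subtracting the bytes already in transit. This is a sketch; the function name and the byte counts in the example are illustrative.

```python
def usable_window(advertised_window: int, congestion_window: int,
                  bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit right now."""
    window = min(advertised_window, congestion_window)  # the actual transmission window
    return max(0, window - bytes_in_flight)             # minus data not yet acknowledged


# The network, not the receiver, is the limit here: 4380 - 2920 bytes remain.
print(usable_window(65535, 4380, 2920))  # 1460
```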
The challenge is how to learn the available buffer space in the network
routers. Routers do not participate at the TCP layer and, hence, cannot use
the TCP ACK segments to adjust the window. To overcome this problem, TCP
assumes network congestion whenever a retransmission timer expires and reacts to
network congestion by adjusting CongestionWindow using three algorithms: slow
start, congestion avoidance, and multiplicative decrease. Several modifications of
these algorithms are currently available. In this section, we describe the original
versions; modifications are discussed in Chapter 11.
Slow Start. The principle behind the slow-start mechanism is to start with a
small window size and increase it "slowly" (we will later see that it is not so slow)
when ACKs arrive. This has the effect of probing the available buffer space in the
network. The actual window increase mechanism is as follows.
FIGURE 2.6: TCP slow-start.
Initially, the CongestionWindow is set to one segment. The window size is
increased by one segment each time a segment is acknowledged. Assuming a large
AdvertisedWindow (i.e., AdvertisedWindow remains greater than the CongestionWindow),
the transmission of segments during slow start is illustrated in Figure 2.6. At first
TCP sends one segment. After receiving the ACK for this segment, it sends two
more segments (CongestionWindow is incremented to two). When these two new
segments are acknowledged in the following RTT (CongestionWindow is now
incremented to four), it sends four new segments, and so on.
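The per-ACK rule produces the doubling shown in Figure 2.6. This Python sketch assumes no losses, a large AdvertisedWindow, and one ACK per segment (no delayed ACKs); windows are counted in segments, and the function name is hypothetical.

```python
def slow_start(rounds: int) -> list[int]:
    """CongestionWindow (segments) at the start of each RTT during slow start."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        for _ in range(cwnd):  # every segment sent in this RTT is acknowledged
            cwnd += 1          # +1 segment per ACK
    return history


print(slow_start(5))  # [1, 2, 4, 8, 16]
```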
Congestion Avoidance. We have seen in the example of Figure 2.6 that after
each RTT the window size practically doubles, allowing twice as many segments
to be transmitted. The exponential growth of the CongestionWindow (against RTT)
is illustrated in Figure 2.7. Unless the exponential growth is checked at some point,
it can quickly lead to congestion. To avoid congestion before it happens, TCP
implements the congestion avoidance algorithm, which forces a linear increase of
the CongestionWindow after it reaches a threshold. This threshold is dynamically
adjusted through a variable called ssthresh.
The linear increase during congestion avoidance is achieved by incrementing
the CongestionWindow by 1/CongestionWindow each time an ACK is received. This
way the CongestionWindow is effectively increased by one segment every RTT. Figure 2.7
shows an example of how the CongestionWindow is controlled by the slow-start and
the congestion avoidance algorithms for an ssthresh of 8. It shows the increase of the
CongestionWindow as a function of RTT. The CongestionWindow increases
exponentially during the slow-start phase until it reaches the value of ssthresh (8 in this case).
After this period, it enters the congestion avoidance phase and starts to grow linearly.
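Combining the two rules gives the curve of Figure 2.7. In this sketch the window is counted in segments, each RTT delivers one ACK per whole segment in the window, and there are no losses; these are simplifying assumptions, and `cwnd_per_rtt` is a hypothetical name. Because the 1/CongestionWindow increments shrink slightly as the window grows within an RTT, the per-RTT gain in congestion avoidance comes out a little below one segment.

```python
def cwnd_per_rtt(rounds: int, ssthresh: float) -> list[float]:
    """CongestionWindow (segments) at the start of each RTT."""
    cwnd = 1.0
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        for _ in range(int(cwnd)):  # one ACK per whole segment sent this RTT
            if cwnd < ssthresh:
                cwnd += 1           # slow start: +1 segment per ACK
            else:
                cwnd += 1 / cwnd    # congestion avoidance: +1/cwnd per ACK
    return history


for rtt, w in enumerate(cwnd_per_rtt(8, ssthresh=8)):
    print(rtt, round(w, 2))
```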
[Figure: CongestionWindow versus round-trip time (RTT); the window grows exponentially during slow start up to the slow start threshold (8 segments), then linearly during congestion avoidance.]
FIGURE 2.7: Congestion avoidance.
Multiplicative Decrease. Transition from the slow-start phase to the congestion
avoidance phase is controlled by the variable ssthresh. Multiplicative decrease is
the algorithm that controls this variable. With multiplicative decrease, each time a
timeout occurs TCP sets ssthresh to half of the current CongestionWindow, down
to a minimum of two segments, and sets CongestionWindow itself to one segment
to force a slow start. Therefore, if there are consecutive timeouts (severe
network congestion), multiplicative decrease reduces the sending rate exponentially.
The additive increase of the CongestionWindow during the congestion avoidance
phase and the multiplicative decrease of ssthresh are often referred to collectively
as the additive increase, multiplicative decrease (AIMD) algorithm.
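The timeout reaction can be written as a single function. In the example we assume, for illustration only, that each timeout strikes just as slow start has regrown CongestionWindow to ssthresh; ssthresh then halves on every timeout until it reaches the two-segment floor. The function name is hypothetical.

```python
def on_timeout(cwnd: float, min_ssthresh: float = 2.0) -> tuple[float, float]:
    """Multiplicative decrease: return (new CongestionWindow, new ssthresh),
    both in segments."""
    ssthresh = max(cwnd / 2, min_ssthresh)  # halve, but never below two segments
    return 1.0, ssthresh                     # restart with slow start from one segment


# Hypothetical scenario: each timeout hits just as cwnd has regrown to ssthresh.
cwnd = 32.0
history = []
for _ in range(5):
    cwnd, ssthresh = on_timeout(cwnd)
    history.append(ssthresh)
    cwnd = ssthresh  # assume slow start brings cwnd back up to ssthresh
print(history)  # [16.0, 8.0, 4.0, 2.0, 2.0]
```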
2.2 UDP
In addition to TCP, the TCP/IP protocol stack provides another transport protocol
called User Datagram Protocol (UDP). In this section, we present an overview
of UDP.
2.2.1 UDP Services
Unlike TCP, UDP provides a much simpler, bare minimum service to the applica-
tions. All UDP provides is a mechanism for the application to send a short message
to a given destination. UDP is connectionless, unreliable, and not stream-oriented
(it is datagram-oriented). With the datagram-oriented service, UDP cannot accept
a stream of data from the application and segment them for transmission. The
application is supposed to supply segmented data to UDP for transportation as an
independent datagram.
Because UDP is connectionless, it does not implement connection establishment
and connection termination. Lack of reliability means that there are no ACK
and retransmission mechanisms and no sequence numbers to identify each
datagram; therefore, a UDP sender will not know if a datagram was lost on the way.
There is no flow control either, meaning that a UDP receiver may experience buffer
overflow. Table 2.2 summarizes the key differences between TCP and UDP.
TABLE 2.2: Key differences between TCP and UDP.
TCP                              UDP
Connection-oriented              Connectionless
Stream-oriented                  Datagram-oriented
Reliable                         Unreliable
Implements flow control          No flow control
Implements congestion control    No congestion control
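The datagram-oriented, connectionless service is visible directly in the standard sockets API. This Python sketch sends two datagrams over the loopback interface using the standard `socket` module: there is no connection setup, and each `recvfrom` call returns exactly one datagram, so message boundaries are preserved. Delivery and ordering are not guaranteed in general, which is why the receiver sets a timeout; on loopback the datagrams normally arrive.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)         # UDP gives no delivery guarantee, so never block forever
address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No connection establishment: each sendto() is an independent datagram.
sender.sendto(b"first message", address)
sender.sendto(b"second message", address)

received = []
for _ in range(2):
    data, peer = receiver.recvfrom(2048)  # one call returns one whole datagram
    received.append(data)

sender.close()
receiver.close()
print(received)
```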
One might be wondering about the practical uses of UDP given its simplicity.
The simplicity of UDP actually turns out to be its strength for many applications
that do not require the heavyweight services of TCP. Some of the traditional and
emerging uses of UDP are:
• Multicasting. Multicasting sends the same piece of data to many receivers
(e.g., video conferencing, web casting). Because TCP is a point-to-point
connection-oriented protocol, it is not practical to use TCP for multicasting with a
large number of receivers. UDP is connectionless, so it does not have this
scalability problem with multicasting.
• Network management. Network management protocols, such as Simple Network
Management Protocol (SNMP), use short request-response messages
suitable for UDP. The overhead of connection establishment and connection
termination for each of these short messages would be overkill.
• Routing table update. Like network management, routing applications,
such as Routing Information Protocol (RIP), rely on query-response
communications. These applications sometimes use UDP for its simplicity.
• Real-time multimedia. Emerging audio and video applications, such
as NetMeeting and RealAudio, use UDP. These real-time applications can
tolerate occasional packet losses but cannot tolerate the long delays caused by
retransmissions of lost packets. By the time a retransmitted packet arrived
at the destination, it would be useless and would be discarded anyway.
TCP, therefore, is rarely used by real-time multimedia applications.
2.2.2 Header Format

Like TCP, a UDP datagram has a header and a payload. The payload carries the
application message and the header carries the information necessary for the correct
operation of the UDP protocol. However, unlike the TCP header, which has a standard
size of 20 bytes and contains a large number of fields, the UDP header is very
simple, only eight bytes long. Figure 2.8 shows the UDP header, which consists of
four fields:
UDP header layout (each row is 32 bits wide):
Source port | Destination port
Length | Checksum
FIGURE 2.8: UDP header format.
• Source and destination port numbers (16 bits each). UDP provides port
numbers to let multiple application processes share the same UDP services on
the same host. With 16 bits, there are a total of 65,535 possible ports.
• Length (16 bits). The length field represents the total length of the UDP
datagram (including header) in bytes.
• Checksum (16 bits). UDP provides a checksum field to check the integrity of
its data. A packet with an incorrect checksum is simply discarded at the receiver,
with no further actions taken.
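Building the eight-byte header is straightforward with Python's `struct` module. This sketch leaves the checksum at zero, which in UDP over IPv4 means that no checksum was computed; the function names are hypothetical.

```python
import struct


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend the 8-byte UDP header to `payload`."""
    length = 8 + len(payload)  # Length covers header plus data, in bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum 0 = not computed
    return header + payload


def parse_udp_header(datagram: bytes) -> tuple[int, int, int, int]:
    """Return (source port, destination port, length, checksum)."""
    return struct.unpack("!HHHH", datagram[:8])


d = build_udp_datagram(5000, 53, b"query")
print(len(d), parse_udp_header(d))  # 13 (5000, 53, 13, 0)
```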
2.2.3 Encapsulation in IP
Like TCP, UDP datagrams also travel in the payload of IP datagrams. The entire
UDP datagram, the header and the payload, is inserted in the IP payload; thus, the
first eight bytes of the IP payload contain the UDP header.
2.3 IP
IP is the network layer protocol used by both TCP and UDP. In this section, we
provide an overview of the IP protocol.
2.3.1 IP Services
The IP protocol provides a connectionless, unreliable datagram model of
communication. IP encapsulates the higher layer protocol units, such as TCP segments
and UDP datagrams, within the IP datagram payload, creates the IP header, and
forwards the complete IP datagram to the next hop router toward the destination.
Each intermediate router processes the IP datagram header and forwards it to the
next router along the path until it reaches the destination.
The connectionless model used by IP in the Internet has several advantages.
First of all, there is no need for explicit connection establishment and termination.
This simplifies the router design, as the routers do not need to maintain any
connection-related information; therefore, the connectionless model scales well for
a large number of hosts in the Internet. Routers also have the flexibility of choosing
an appropriate path for each IP datagram based on the congestion level or link
availability in the Internet.
The connectionless model of IP, however, has its price. IP cannot guarantee
the delivery of data to the destination. The service IP provides is often referred to
as the "best-effort" service. This means that routers will try their best to deliver
a datagram, but if there is congestion and routers cannot process datagrams fast
enough, they may drop them. IP does not implement any retransmission of lost
datagrams. There is also the possibility of datagrams arriving out of order at the
destination, as routers may send different datagrams via different routes. Higher
layer protocols, such as TCP, are used to build a reliable service on top of the
unreliable IP.
2.3.2 Fragmentation and Reassembly

For ultimate transmission of IP datagrams over a given physical link, the datagrams
must be encapsulated in the payload of the link layer frame, as shown in Figure 2.9.
The frame payload has to be large enough to hold a given IP datagram. For most
physical network technologies, the payload size has an upper bound, which imposes
a limit on the datagram size the frame can carry. The upper bound is called the
Maximum Transfer Unit (MTU) of the network.
Because routers sometimes connect dissimilar networks, the MTU of one
port may be larger than the MTU of the other port. For example, the Ethernet
MTU is 1500 bytes, but a serial link running the Point-to-Point Protocol (PPP) may
have an MTU of 296 bytes. In that case, if a router receives a 1500-byte IP datagram
from its Ethernet port and tries to send it over the PPP port, it must split the
original IP datagram into several pieces. The process of splitting IP datagrams in
this way is called fragmentation.
IP datagrams are fragmented in such a way that each fragment becomes
a complete IP datagram that can travel to its destination independently of the
other fragments. Once all fragments are received at the destination, the process of
putting them all together to reconstruct the original datagram is called reassembly.
Fragmentation can be done at the source or at any intermediate router, but
reassembly is done only at the destination.
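The fragmentation and reassembly steps can be sketched in Python using the Ethernet-to-PPP example from above: a 1500-byte datagram forwarded over a link with a 296-byte MTU. This is a simplified model under stated assumptions. There are no IP options, so every fragment carries a 20-byte header. Only the fragment offset and the More Fragments (MF) bit are modeled. The fragment offset is counted in 8-byte units, so every fragment except the last carries a multiple of 8 data bytes. A real destination would also group fragments by the identifier field and give up after a reassembly timeout. The names below are hypothetical.

```python
from dataclasses import dataclass

IP_HEADER_LEN = 20  # assumption: no IP options

@dataclass
class Fragment:
    offset_units: int     # position of this fragment's data, in 8-byte units
    more_fragments: bool  # MF bit: True on every fragment except the last
    data: bytes

def fragment(payload: bytes, mtu: int) -> list:
    """Split an IP payload so each fragment (header + data) fits in mtu bytes."""
    # Largest data size that fits, rounded down to a multiple of 8 bytes.
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    frags = []
    for start in range(0, len(payload), max_data):
        last = start + max_data >= len(payload)
        frags.append(Fragment(start // 8, not last, payload[start:start + max_data]))
    return frags

def reassemble(frags: list) -> bytes:
    """Rebuild the original payload; fragments may arrive in any order."""
    if not frags:
        raise ValueError("no fragments")
    ordered = sorted(frags, key=lambda f: f.offset_units)
    if ordered[-1].more_fragments:
        raise ValueError("last fragment has not arrived")
    data = bytearray()
    for f in ordered:
        if f.offset_units * 8 != len(data):
            raise ValueError("missing or overlapping fragment")
        data += f.data
    return bytes(data)

payload = bytes(1480)               # 1500-byte datagram minus its 20-byte header
frags = fragment(payload, mtu=296)  # 296 - 20 = 276, rounded down to 272 data bytes
print([len(f.data) for f in frags])  # [272, 272, 272, 272, 272, 120]
print(reassemble(list(reversed(frags))) == payload)  # True
```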
(Figure: a TCP segment or UDP datagram is carried in the IP payload; the IP header
and IP payload together form the frame payload, between the frame header and the
frame trailer.)
FIGURE 2.9: Encapsulation of IP datagram into link layer frame.

Fragmentation could be avoided if the minimum MTU in the end-to-end path
could be discovered before the start of a communication. TCP has a variable called
Maximum Segment Size (MSS) that determines the number of data bytes to be sent
in a segment. TCP can select an MSS small enough to avoid fragmentation anywhere
in the path. For example, if TCP can learn that it is going to communicate over an
Ethernet network, it would select an MSS of 1460 bytes (1500 minus the 40 bytes for
the 20-byte TCP and 20-byte IP headers) to avoid fragmentation.
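Once the MTUs along the path are known, the MSS choice described above is simple arithmetic. The sketch below assumes those MTUs are already available, although obtaining them is the hard part in practice. It also assumes that neither the IP header nor the TCP header carries options, so each is 20 bytes. The function name is hypothetical.

```python
IP_HEADER_LEN = 20   # IPv4 header without options
TCP_HEADER_LEN = 20  # TCP header without options

def mss_for_path(link_mtus: list) -> int:
    """MSS that avoids fragmentation on every link, given each link's MTU."""
    path_mtu = min(link_mtus)  # the smallest MTU on the path is the bottleneck
    return path_mtu - IP_HEADER_LEN - TCP_HEADER_LEN

print(mss_for_path([1500]))             # 1460: the Ethernet case from the text
print(mss_for_path([1500, 296, 1500]))  # 256: limited by a 296-byte PPP link
```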
2.3.3 Header Format

Figure 2.10 shows the header format of the current IP version (version 4). Some of
the key fields of the IP header are described below.
Version. This field indicates the version number of the IP datagram. Currently only
version 4 is used, but version 6 will be available in the future. If a router or
a host supports both versions 4 and 6, the datagram can be directed to the
correct process using this field.
Header length. This field indicates the length of the header in multiples of
32-bit words. In most cases the header length of IP datagrams is 20 bytes, that is,
a field value of 5 (the options field is not used). Because the header can be of
variable length, the length field helps to identify the start of the payload.
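The version and header length fields share the first byte of the IP header, and the version occupies the high-order four bits. The following sketch extracts both fields, converts the header length from 32-bit words to bytes, and uses the result to find the start of the payload. The function name and the sample datagram are hypothetical.

```python
def version_and_header_length(first_byte: int) -> tuple:
    """Split the first IPv4 header byte into (version, header length in bytes)."""
    version = first_byte >> 4      # high-order four bits
    ihl_words = first_byte & 0x0F  # low-order four bits, in 32-bit words
    return version, ihl_words * 4

print(version_and_header_length(0x45))  # (4, 20): typical header, no options
print(version_and_header_length(0x46))  # (4, 24): one 32-bit word of options

# Locating the payload: skip exactly header-length bytes.
datagram = bytes([0x45]) + bytes(19) + b"payload"
_, hlen = version_and_header_length(datagram[0])
print(datagram[hlen:])  # b'payload'
```

The field is only four bits wide, so the largest possible header is 15 words, or 60 bytes.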
Type of service (ToS). This field was included in IP version 4 so that a source could
request some form of privileged treatment from the routers. For example,
control packets could get preferential treatment during congestion in
the network. It could also be used to specify quality of service requirements,
such as delay and throughput. However, it is not mandatory for the routers to
support this feature. The Internet Engineering Task Force (IETF) is currently
working on standardizing these bits to support multiple services (as opposed
to only best-effort service) in TCP/IP networks [254].
(Figure, one 32-bit row per line:
Version | Header length | TOS | Length
16-bit identifier | Flags | Fragment offset
TTL | Protocol | Header checksum
32-bit source IP address
32-bit destination IP address
Options (if any))
FIGURE 2.10: IP version 4 header format.
Total length. As the name suggests, the total length field indicates the total
length of a datagram in bytes, including the header. Because the field is 16 bits
long, a datagram can be at most 65,535 bytes.
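Identifier. This field is used to uniquely identify a datagram. All fragments of
the same datagram carry the same identifier, which lets the destination group
them during reassembly.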
Flags and fragment offset. These two fields are used for fragmentation and
reassembly. The flags field consists of three bits. The Don't Fragment (DF) bit is
set by a source to indicate that this datagram should not be fragmented. The More
Fragments (MF) bit is set on every fragment except the last one; a cleared MF bit
therefore identifies the last fragment of the datagram, which facilitates reassembly
at the destination. The third bit is currently unused. The Fragment Offset field
indicates the exact position of the fragment's data in the original datagram,
measured in units of 8 bytes.
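The three flag bits and the 13-bit fragment offset share one 16-bit word of the header. The sketch below assumes the bit layout of the IPv4 specification (RFC 791), which places the unused flag bit in the highest-order position, followed by DF and then MF. The helper names are hypothetical.

```python
# Bit 0x8000, the highest-order flag bit, is the unused (reserved) flag.
DF_MASK = 0x4000      # Don't Fragment
MF_MASK = 0x2000      # More Fragments
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset, in 8-byte units

def decode_flags_offset(word: int) -> dict:
    """Unpack the flags/fragment-offset header word."""
    return {
        "DF": bool(word & DF_MASK),
        "MF": bool(word & MF_MASK),
        "offset_bytes": (word & OFFSET_MASK) * 8,  # convert 8-byte units to bytes
    }

def encode_flags_offset(df: bool, mf: bool, offset_bytes: int) -> int:
    """Pack DF, MF, and a byte offset into the 16-bit header word."""
    if offset_bytes % 8 or offset_bytes // 8 > OFFSET_MASK:
        raise ValueError("offset must be a multiple of 8 that fits in 13 bits")
    return (DF_MASK if df else 0) | (MF_MASK if mf else 0) | (offset_bytes // 8)

# Second fragment in the 296-byte-MTU example: MF set, data starts at byte 272.
word = encode_flags_offset(df=False, mf=True, offset_bytes=272)
print(hex(word), decode_flags_offset(word))
# 0x2022 {'DF': False, 'MF': True, 'offset_bytes': 272}
```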
Time to live. Because of possible routing loops in IP networks, datagrams
can keep circulating in the network. This may result in a waste of resources.
The time-to-live (TTL) field restricts the life of a datagram in the network.
The TTL field indicates the maximum number of hops a datagram can traverse
in the network. Each router decrements this counter by one. Once the value
of this field reaches zero, the datagram is discarded by the router.
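The following sketch models the TTL rule as the text states it: each router decrements the counter and discards the datagram when the counter reaches zero. The router names and the `loop` flag are hypothetical. The loop case shows that the TTL bounds the number of hops even when a routing loop would otherwise keep the datagram circulating forever.

```python
import itertools

def forward(ttl: int, routers: list, loop: bool = False) -> str:
    """Pass a datagram through routers, applying the TTL rule at each one.

    With loop=True the route repeats forever, modeling a routing loop.
    """
    path = itertools.cycle(routers) if loop else iter(routers)
    for hops, router in enumerate(path, start=1):
        ttl -= 1  # each router decrements TTL by one
        if ttl <= 0:
            return f"discarded by {router} after {hops} hops"
    return "delivered"

print(forward(64, ["R1", "R2", "R3"]))      # delivered
print(forward(8, ["R1", "R2"], loop=True))  # discarded by R2 after 8 hops
```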
Protocol. The protocol field identifies the transport layer protocol at the receiver
that should receive the data portion of the IP datagram. As an example, a
value of 6 indicates that the IP datagram is destined for TCP, whereas a
value of 17 indicates that it should be passed to UDP. In essence, this field
helps multiplex and demultiplex multiple higher layer protocols over
the same IP layer.
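Demultiplexing on the protocol field amounts to a table lookup. In this sketch the handler functions are hypothetical stand-ins for the TCP and UDP input routines. Only the protocol numbers 6 and 17 come from the text.

```python
from typing import Callable, Dict

def handle_tcp(segment: bytes) -> str:
    return f"TCP got {len(segment)} bytes"  # stand-in for TCP input processing

def handle_udp(datagram: bytes) -> str:
    return f"UDP got {len(datagram)} bytes"  # stand-in for UDP input processing

# Protocol numbers from the text: 6 = TCP, 17 = UDP.
HANDLERS: Dict[int, Callable[[bytes], str]] = {6: handle_tcp, 17: handle_udp}

def demultiplex(protocol: int, payload: bytes) -> str:
    """Hand the IP payload to the higher layer named by the protocol field."""
    handler = HANDLERS.get(protocol)
    if handler is None:
        raise ValueError(f"no handler registered for protocol {protocol}")
    return handler(payload)

print(demultiplex(6, b"x" * 20))  # TCP got 20 bytes
print(demultiplex(17, b"x" * 8))  # UDP got 8 bytes
```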
Header checksum. The header checksum is used by routers to detect bit errors in
a received IP datagram header. An error in the header may potentially result
in delivery of the datagram to a wrong destination. Routers simply discard a
datagram whose checksum does not verify. The data part of the IP datagram is
not protected by this checksum. It is up to the higher layers to recover from
errors in the data field.
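The text does not say how the header checksum is computed. IPv4 uses the standard Internet checksum: the 16-bit one's complement of the one's complement sum of the header's 16-bit words, computed with the checksum field set to zero. The sketch below implements that computation and the receiver-side check. The function names are hypothetical, and the sample header values are made up for illustration.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad to whole 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

def header_checksum_ok(header: bytes) -> bool:
    # With a correct checksum field, the one's complement sum of all header
    # words is 0xFFFF, so its complement is zero.
    return internet_checksum(header) == 0

# Hypothetical 20-byte header: version 4, IHL 5, total length 40, TTL 64,
# protocol 6 (TCP), checksum field zeroed before computing it.
hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                            bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7])))
struct.pack_into("!H", hdr, 10, internet_checksum(bytes(hdr)))  # checksum is bytes 10-11
print(header_checksum_ok(bytes(hdr)))  # True
hdr[15] ^= 0x01                        # simulate a single bit error in the source address
print(header_checksum_ok(bytes(hdr)))  # False
```

The checksum covers only the header, and a router changes the TTL at every hop. Each router must therefore recompute the checksum before forwarding the datagram.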
Source and destination addresses. These fields are used to identify the source and
destination of the IP datagram. They contain 32-bit IP addresses.
Options. The options field can extend the IP header. As the name suggests, this field
is not compulsory. This field can be used to support options such as security,
source routing, route recording, and timestamping. This field is of variable
length, as the number of options used in a datagram is not fixed.
Payload. This field encapsulates higher layer protocol units, such as TCP segments
or UDP datagrams.
Padding. When options are present, the padding field extends the header to a
multiple of 32 bits, so that the header length can be expressed in 32-bit words.

2.3.4 IP Version 6
To combat the IP address depletion problem, the IETF has recently defined a new
version, version 6, for IP. The new version, referred to as IPv6, has a much larger
address space; each address is 128 bits long. As a result, IPv6 has a much larger
header size. The standard or base header size is 40 bytes, which is twice as much as
the current 20 bytes in version 4.
Although IPv6 was originally designed to support a larger address space, it
includes a number of new features. IPv6 supports authentication and data
confidentiality at the network layer. It has mechanisms to facilitate real-time
audio and video transmission. However, the performance issues and concepts
discussed in the later chapters are mainly concerned with the TCP layer. Whether
IP version 4 or version 6 is used below TCP makes little difference to the material
presented in these chapters. Detailed discussion of IPv6, therefore, is outside the
scope of this book.
2.4 FURTHER READING
Comer's Internetworking with TCP/IP [113] is a classic book on TCP/IP. It covers
the entire TCP/IP protocol stack in good detail.
Forouzan's TCP/IP Protocol Suite [149] explains the entire TCP/IP suite with
many easy-to-understand illustrations and examples.
Stevens's TCP/IP Illustrated, Volume 1 [302], explains many concepts and
operations of TCP by analyzing packet traces on live networks.
2.5 SUMMARY
TCP and UDP are two transport layer protocols used in TCP/IP networks. TCP
provides a connection-oriented reliable service to its applications. The reliability is
achieved through acknowledgments of correctly received data and retransmission
of lost data. TCP is used by most applications on the Internet, such as e-mail,
file transfer, and the World Wide Web. TCP uses sophisticated congestion control
algorithms to adjust its sending rate according to the observed network state. It
also uses a sliding window mechanism to prevent buffer overflow at a slow receiver.
UDP is a lightweight protocol that does not guarantee delivery of data. Multimedia
applications usually use UDP, as they can tolerate occasional packet losses. Both
TCP and UDP run over the IP layer, which provides an unreliable datagram
service.
2.6 REVIEW QUESTIONS
1. TCP/IP has two transport protocols, TCP and UDP. What are the key differences
between them?
2. UDP does not have any built-in reliability. Why would one use UDP instead
of TCP?
3. Why does TCP connection termination need four-way handshaking, whereas
TCP connection establishment needs only three-way handshaking?
4. Using an example, explain how cumulative acknowledgment can compensate
for a lost ACK.
5. What is the purpose of TCP timeouts, and why is the timeout duration
important?
6. When a timeout occurs, TCP sets its slow-start threshold to half its current
congestion window size (multiplicative decrease). Can you think of the conse-
quences if multiplicative decrease were replaced by additive decrease (say, for
example, that current congestion window is decreased by one)?
7. In most cases, the TCP retransmission timer expires whenever a router drops a
packet due to buffer overflow (the packet never reaches the receiver). Can you
think of situations when an RTO occurs even though packets reach the receiver?
8. Fragmentation can be done at the source or at any intermediate router, but
reassembly is done only at the destination. Why do intermediate routers not
reassemble IP fragments?
9. Can you think of any disadvantages of IP fragmentation?
10. What role can TCP play to avoid IP fragmentation?
2.7 CASE STUDY: WCORP ADOPTS TCP/IP

Although multiprotocol stacks had been meeting the interconnecting needs of
WCORP in the past, the following costs were identified for maintaining multiple
stacks:
Increased load on memory. As each PC loads four protocol stacks, very little mem-
ory is left for running applications.
Reduced performance. Multiple protocols draw more CPU cycles, causing an adverse
effect on performance.
Multiple address management. Different protocols use different addresses for
identifying and communicating between computers. When multiple protocol stacks
are loaded on a PC, multiple addresses have to be assigned and managed for
each PC, making the address management much harder. Communication
errors caused by incorrect address assignment become difficult to isolate and
correct.
Multiple routing systems. Different stacks use different routing protocols and sys-
tems. With multiple stacks, routers must maintain multiple routing systems.
These multiprotocol routers are very costly to purchase and maintain.
Because of the above costs associated with multiple stacks, WCORP has
decided to adopt a single protocol strategy to meet its interconnecting
requirements. As discussed in the previous case study (see Chapter 1), the four major
stacks currently in use are SNA, IPX/SPX, DECnet, and TCP/IP. To adopt a single
protocol strategy, WCORP must select one of these four stacks. As a first step
toward making this selection, the network administrator identifies six important
communication requirements to be fulfilled by the single protocol stack: native
connectivity to the public Internet, nonproprietary ownership, reliable communication,
connectionless communication, client-server communication, and routing between
different subnets. Table 2.3 shows the comparison of different stacks against these
six requirements.
TABLE 2.3: Comparison of different protocol stacks.
Protocol Stack | Internet Connectivity | Ownership | Reliable | Connectionless | Client-Server | Routing
SNA            | Difficult             | IBM       | Yes      | Yes            | Yes           | Yes
IPX/SPX        | Difficult             | Novell    | Yes      | Yes            | Yes           | Yes
DECnet         | Difficult             | Digital   | Yes      | Yes            | Yes           | Yes
TCP/IP         | Easy                  | Open      | Yes      | Yes            | Yes           | Yes
After careful consideration, WCORP has finally decided to adopt
TCP/IP as the single stack to support interconnectivity. The driving factors for this
selection were the open standard of TCP/IP (not owned by any specific vendor) and
seamless connectivity to the public Internet.