Network Layer.

The main goal of the Network Layer is to move packets from a sending host to a receiving host. To do so, three important network-layer functions can be identified:

Datagram vs. Virtual Circuit

We can divide all network layer services into two groups:
  • connection-oriented services
  • connectionless services.
    Virtual Circuit is a synonym for connection-oriented service. There are three identifiable phases in a VC: With a datagram network layer, each time an end system wants to send a packet, it stamps the packet with the address of the destination end system and then pops the packet into the network (no setup state). Routers forward the packet toward its destination by examining the packet's destination address, indexing a forwarding table with the destination address, and forwarding the packet in the direction of the destination. This type of service (currently provided by the Internet) is also known as best-effort service.

    Routing Principles

    In order to transfer packets from a source to a destination, the network layer must determine the a path, or route that the packets are to flow. This is the job of the network layer routing protocol.

    Typically there is only one router a host is attached to (default or first-hop router). We will refer the default router of the destination as destination router and default router of the source as source router. The problem of routing packets from source host to the destination host is now the problem of routing the packets from source router to the destination router.

    We can classify routing algorithms several ways:

    1. whether they are global or decentralized
      • A global routing algorithm computes the least-cost path between a source and a destination using complete global knowledge of the network. In practice, algorithms of this type are often referred to as link state algorithms.
      • In a decentralized routing algorithm, the calculation is carried out in an iterative, distributed manner. No node has complete information, instead each node begins with only the knowledge about its directly attached links. (Example of such algorithms is distance vector algorithm.)
    2. whether they are static or dynamic
      • In static routing algorithm routes change very slowly over time.
      • Dynamic routing algorithms change the routing paths as the network traffic loads or topology change.
    3. whether they are load-sensitive or load-insensitive
      • In load-sensitive algorithms, link cost vary dynamically to reflect the current level of congestion in the underlying link.
      • In load-insensitive algorithms, a link's cost does not typically reflect its current level of congestion.

    A Link State Routing Algorithm

    The link state algorithm presented below is known as Dijkstra's algorithm. Let's define the following notations:

    Dijkstra's algorithm

    Initialization:
     N = {A}
     for all nodes v
       if v adjacent to A
         then D(v) = c(A, v)
         else D(v) = ¥
    Loop
      find w not in N such that D(w) is a minimum
      add w to N
      for v adjacent to w and not in N update D(v):
        D(v) = min{ D(v), D(w)+c(w, v) }
        /* new cost to v is either the old cost or known shortest path cost to w plus cost from w to v */
    until all nodes in N
    
    The following page illustrates how this algorithm works. On this page you can find graphical illustration to the algorithm.

    The Distance Vector Routing Algorithm

    DV algorithm is

    The principal data structure for DV algorithm is the distance table maintained at each node (note: each node has its own distance table). Each node's distance table has a row for each destination in the network and a column for each of its directly connected neighbors. Each element of the table can be denoted as

     DA(Z,R)
    
    This notation means the price for a packet from node A send to the node Z via A's directly attached neighbor R. Thus, this element is an element of the distance table of the node A (since this is the only node interested in such information). Another piece of information each node maintain is the distance to each of its neighbors. We'll use the same notation C(A,R) for the variable storing this data, but notice that each node now has only one row of the matrix. Such, node A knows C(A,B), C(A,C), C(A,D), but doesn't know C(B,R). The goal of the DV algorithm is to compute distance table. To do this the next algorithm is used:
    Initialization:
      for all nodes v:
        if v adjacent to X
          then DX(v, v) = C(X,v)
          else DX(*,v) = ¥
      for all nodes v
        find min DX(v,w) for all w from neighbors
        send this min DX(v,w) to all neighbors
    
    Loop
      wait until either a link cost to neighbor V is changed
              or an update from neighbor V  received
    
      if C(X,V) changes by d /* d can be both positive and negative */
        then for all destinations y : DX(y,V) = DX(y,V) + d
        else if update received from V for destination Y (by d)   /* shortest path from V to some Y has changed by d */
          then DX(Y,V) = C(X,V) + d
    
      if there is a new minwDX(v,w) for any destination v
        then send this new value to all neighbors
    forever
    

    IP addressing

    Class Starts with Bits Host # bits IP Range
    A 0
    0 net ID host ID
    24     1.0.0.0 - 127.255.255.255
    B 10
    1 0 net ID host ID
    16 128.0.0.0 - 191.255.255.255
    C 110
    1 1 0 net ID host ID
    8 192.0.0.0 - 223.255.255.255
    D 1110
    1 1 1 0 Multicast group ID
    Multicast address 224.0.0.0 - 239.255.255.255

    Note: The class A network ID 127 is reserved for the loopback interface. By convention, most system assign the IP address of 127.0.0.1 to this interface and asign it the name localhost.

    CIDR = Classless InterDomain Routing. Together with the IP address provide network mask - information about how many bits network address takes: a.b.c.d/20.

    References:

    Dynamic Host Configuration Protocol (DHCP)

    DHCP is client-server protocol which is used to to obtain an IP information automatically from a server. Besides IP address host also can obtain
  • IP address of the first-hop router (default gateway)
  • IP address of its DNS server
  • subnet mask
    A client is typically newly arriving host wanting to obtain IP address. For such a host DHCP is a four-step process: Note: all DHCP messages have a transaction ID. DHCP server offer has the same transaction ID as the discovery message the server responds to. Client request message has transaction ID incremented by 1 from the discovery transaction ID.

    You can find more details in the official documentation RFC 2131.

    Network Address Translators

    Network Address Translation is a method by which IP addresses are mapped from one address realm to another, providing transparent routing to end hosts. All flavors of NAT devices should share the following characteristics:
    1. Transparent Address assignment.
    2. Transparent routing through address translation. (Routing here refers to forwarding packets, and not exchanging routing information.)
    3. ICMP error packet payload translation.

    NAT binds addresses in private network with addresses in global network and vice versa to provide transparent routing for the datagrams traversing between address realms. The binding in some cases may extend to transport level identifiers (such as TCP/UDP ports). Address binding is done at the start of a session. The following sub-sections describe two types of address assignments.

    Static Address assignment
    In the case of static address assignment, there is one-to-one address mapping for hosts between a private network address and an external network address for the lifetime of NAT operation. Static address assignment ensures that NAT does not have to administer address management with session flows.

    Dynamic Address assignment
    In this case, external addresses are assigned to private network hosts or vice versa, dynamically based on usage requirements and session flow determined heuristically by NAT. When the last session using an address binding is terminated, NAT would free the binding so that the global address could be recycled for later use. The exact nature of address assignment is specific to individual NAT implementations.

    There are three blocks of IP address space recommended for private networks, namely

  • 10/8
  • 172.16/12
  • 192.168/16

    Official documentation:

  • NAT: Terminology and Considerations - RFC2623
  • The IP Network Address Translator - RFC1613

    IP header

     
    IP version
    (current version:4)
    IP Header length
    (in 32-bit words)
    Type of service
    Total packet length
    (in bytes from the beginning of IP header)
    IP packet ID number
    (packet serial number)
    Fragment bit flags IP fragment offset
    (location of this packet fragment in original packet, in 8-byte units)
    Time to live
    (maximum router hops remaining)
    Transport protocol
    (ICMP:1, IGMP:2, TCP:6, UDP:17)
    Header checksum
    (Note: Only the header is checksummed)
    Source IP address
    (4-byte IP address for host from which the packet originated)
    Destination IP address
    10  (4-byte IP address of the host where the packet is going)
    11  IP options
    (Length of this part is usually 0 but may vary)

    IP Datagram Fragmentation

    Not all link-layer protocols can carry packets of the same size. For instance, Ethernet packets can carry no more than 1500 byte, whereas packets for many wide-area links can carry no more than 576 bytes of data. The maximum amount of data that a link-layer packet can carry is called the MTU (maximum transfer unit). Because each IP datagram is encapsulated whithin the link-layer packet, the MTU places a hard limit on the length of an IP datagram.

    The problem is that each link along the route between the sender and the receiver can use different link-layer protocol, and each of these protocols can have a different MTU. How to squeeze an oversized IP datagram to fit teh next MTU?

    The solution to this problem is to fragment the data in the IP datagram among several smaller datagrams. To allow the next router/host to reassemble IP header includes :

  • packet ID
  • fragment flag
  • fragmentation offset
    When a router needs to fragment a datagram, each resulting datagram has source address, destination address, and ID of the original datagram. When the destination receives a series of datagrams from the same sending host it checks their IDs to determine if these datagrams are actually fragments of the same data. Here is an example. Assume a datagram with ID=123 and with 4000 byte of data arrives to a router. The next link allows only 1500 bytes of data in a datagrams, then this router will create 3 datagrams with the following fields:
      • ID = 123
      • offset = 0
      • fragment flag = 1 (there is more data)
      • 1480 bytes of data (20 bytes for IP header)
      • ID = 123
      • offset = 1480 ()
      • fragment flag = 1 (there is more data)
      • 1480 bytes of data (20 bytes for IP header)
      • ID = 123
      • offset = 2960
      • fragment flag = 0 (this is the last one)
      • 1040 bytes of data (20 bytes for IP header)
    Please take a look at IP fragmentation applet.

    Internet Control Message Protocol

    The most typical use of ICMP is for error reporting. If at some point, an IP router is not able to find a path to the host specified in the IP destination field, then this router creates and sent an ICMP message to the source host to indicate the error. Same thing happens if a router has to drop an IP packet because its TTL is zero.

    Another use of ICMP is for service tools such as ping or traceroute.

    ICMP is often considered part of IP but architecturally lies just above IP, as ICMP messages are carried inside IP packets. That is, ICMP messages are carried as IP data, just as TCP or UDP segments. Nevertheless, ICMP is a part of the Network Layer.

    ICMP message has the following format:
     
    Type Code
    Checksum
    Pointer Reserved
    Reserved
    Data: Internet header + 64 bits of original datagram header
    ...  (the IP packet that generated this ICMP message)

    ICMP is specified in RFC792.

    Routing in the Internet

    Internet is orginized into a collection of autonomous systems (ASs), each of which is normally administered by a single entity. A corporation or university campus network often defines an autonomous system. The NSFNET (The National Science Foundation Network) backbone of the Internet forms an AS, because all the routers in the backbone are under a single administrative control. The most important characteristic of Internet routing protocols is whether the protocols are used to route datagrams within a single AS, or among multiple AS.

    An intra-AS routing protocols are used to determine how routing is performed within an autonomous system (AS). Intra-AS routing protocols are also known as interior gateway protocols.

    Intra-Autonomous System Routing: RIP

    The Routing Information Protocol (RIP) was one of the earliest intra-AS Internet routing protocols. RIP is a distance-vector protocol that operates in a manner very close to idealized protocol we have examined.

    The first version of RIP uses hop count as a cost metric; that is, each link has cost of 1. The maximum cost of a path is limited to 15, thus limiting the use of RIP to AS that are fewer than 15 hops in diameter. In RIP, routing updates are exchanged between neighbors approximately every 30 seconds using so called RIP response message. The response message contains a list of up to 25 destination networks within the AS, as well as the sender's distance to each of those networks. Response messages are also known as RIP advertisements. If a router does not hear from its neighbor at least once every 180 seconds, that neighbor is considered to be no longer reachable. When this happens, RIP modifies the local forwarding table and then sends this information to its neighboring routers. A router can also request information about its neighbor's cost to a given destination by using RIP's request message. Routers send RIP messages to each other over UDP using port number 520. The UDP datagrams are carried between routers in a standard IP datagram. The fact that RIP uses a transport layer-protocol on top of a network-layer protocol to implement network-layer functionality may seem a little convoluted. Actually, RIP is an application-layer process working with the network-layer routing tables.

    Intra-Autonomous System Routing: OSPF

    The Open Shortest Path First routing is also used for intra-AS routing. "Open" means that the protocol specification is publicly available.

    OSPF is a link state protocol that uses flooding of link state information and a Dijkstra least cost path algorithm. Individual link cost are configured by the network administrator. The administrator may want to choose to set all links to costs 1 or might choose to set the link weights to be inversely proportional to link capacity in order to discourage traffic from using low-bandwidth links.

    With OSPF, a router broadcast routing information to all other routers in the AS. A router broadcasts link state information whenever there is a change in a link's state. It also broadcasts a link's state periodically (at least once in 30 minutes). OSPF messages are carried directly by IP, with an upper-layer protocol 89 for OSPF.

    Some of the advances embodies of OSPF include the following:

    OSPF hierarchical routing: An OSPF AS can be configured into areas. Each area runs its own OSPF link state algorithm, with each router in an area broadcasting its link state information to all other routers in that area. Within each area, one or more routers are to be configured as area border routers and be responsible for routing packets outside the area. Exactly one OSPF area in the AS is configured to be the backbone area. The primary role of the backbone area is to route traffic between the other areas in the AS. The backbone area always contains all the area border routers and may contain non-border routers as well. We can identify 4 types of OSPF routers :

    Inter-Autonomous System Routing: BGP

    The Border Gateway Protocol v4, specified in RFC 1771, is de facto standard interdomain routing protocol in today's Internet. It provides routing among autonomous systems (AS). While BGP is a distance vector protocol, it is more appropriately characterized as a path vector protocol. That is because neighboring BGP routers exchange detailed path information rather than cost information.

    At the heart of BGP are route advertisements. A route advertisement is sent from one BGP peer to another over a point-to-point connection. An advertisement consists of a CIDRized destination network address (like 128.119.40/24) and a set of attributes associated with the path to that destination network. Two of most important attributes are

  • the path attribute (an explicit list of all ASs on the route)
  • and the identity of the next-hop router along the route to that destination network.

    BGP operation revolves around three activities, all involving route advertisement:

    Note: BGP provides only a mechanism not a routing policy. That is, the decision how to filter and select paths and what to advertise is a network administrator's responsibility. Please also notice that we consider BGP routers inside one AS as neighbors even if they are not directly connected to each other.

    BGP peers communicate using the TCP protocol and port 179.

    IP version 6.

    The specification of IP version 6 are defined in RFC 2460.
     
    Version Traffic Class Flow Label
    Payload length Next Header Hop Limit
     
    Source IP address
    (128-bit IP address of the originator of the packet)
     
     
    Destination IP address
    (128-bit IP address of the intended recipient of the packet)
    10   

    Field descriptions:

    The most important changes:

    Features of IPv4 that are no longer in the IPv6 header:

    Transitioning from IPv4 to IPv6: Dual stack vs. Tunneling.

    References: