Troubleshooting Internetwork Routing

When setting up internetwork routing or testing an existing route, two tools are available on most TCP/IP systems: Ping and Trace Route. They help determine whether a routing problem exists and narrow down where to look for it. Most operating systems and programmable routers that support TCP/IP include some version of the ping program. Trace route is less common than ping, but it is found on most operating systems and on some programmable routers that support TCP/IP.

Ping

The ping program sends an echo message (an ICMP echo request) to a specific address. If the system at that address can be reached and is functioning, it echoes the message back to the sender. Most ping programs display how long the message took to make the round trip to the destination and back. Ping programs usually accept parameters that vary their operation, such as the number of messages to send, the number of seconds between packets, the size of the packets, and the pattern of the data transmitted.
Under Microsoft Windows TCP/IP, the ping program sends four messages by default. The following is an example of the output from a ping executed on a Microsoft Windows NT 4.0 system:

    Pinging ns1.wlw.com [205.217.146.198] with 32 bytes of data:
    Reply from 205.217.146.198: bytes=32 time<10ms TTL=255
    Reply from 205.217.146.198: bytes=32 time<10ms TTL=255
    Reply from 205.217.146.198: bytes=32 time<10ms TTL=255
    Reply from 205.217.146.198: bytes=32 time<10ms TTL=255

On a Linux system, the ping program continues to send packets until it is stopped with Control-C, unless a packet count is specified with the -c option. The following is an example of the output from a Linux system:

    PING zeus.wlw.com (205.217.146.200): 56 data bytes
    64 bytes from 205.217.146.200: icmp_seq=0 ttl=128 time=2.8 ms
    64 bytes from 205.217.146.200: icmp_seq=1 ttl=128 time=2.0 ms
    64 bytes from 205.217.146.200: icmp_seq=2 ttl=128 time=2.0 ms
    64 bytes from 205.217.146.200: icmp_seq=3 ttl=128 time=2.0 ms
    64 bytes from 205.217.146.200: icmp_seq=4 ttl=128 time=2.0 ms
    64 bytes from 205.217.146.200: icmp_seq=5 ttl=128 time=2.0 ms
    ^c
    --- zeus.wlw.com ping statistics ---
    6 packets transmitted, 6 packets received, 0% packet loss
    round-trip min/avg/max = 2.0/2.1/2.8 ms

Note that the UNIX version of ping gives summary information when terminated.
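
When ping is run from a script, the summary lines are usually the part worth extracting. The following is a minimal sketch in Python that pulls the packet counts and round-trip times out of output worded like the Linux sample above. The function name parse_ping_summary and the regular expressions are illustrative assumptions; other ping versions word their summaries differently, and the patterns would need adjusting for them.

```python
import re

# Patterns modeled on the Linux sample output shown above. Other ping
# versions word these lines differently, so treat both as assumptions.
_COUNTS = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, "
    r"(\d+(?:\.\d+)?)% packet loss"
)
_TIMES = re.compile(r"min/avg/max\S* = ([\d.]+)/([\d.]+)/([\d.]+)")


def parse_ping_summary(output):
    """Return a dict of summary figures, or None if no summary is found."""
    counts = _COUNTS.search(output)
    if counts is None:
        return None
    summary = {
        "transmitted": int(counts.group(1)),
        "received": int(counts.group(2)),
        "loss_percent": float(counts.group(3)),
    }
    times = _TIMES.search(output)
    if times is not None:
        summary["min_ms"], summary["avg_ms"], summary["max_ms"] = (
            float(t) for t in times.groups()
        )
    return summary
```

The text passed to the function could come from running ping with a fixed packet count (the -c option mentioned above) and capturing what it prints.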

Trace Route

The trace route program is not as widely available as ping, but is found on most TCP/IP systems. Trace route sends probe messages much as ping does, but it uses them in a different way. (Windows tracert sends the same ICMP echo messages as ping; many UNIX traceroute programs send UDP datagrams by default, but the mechanism described here is the same.) One of the fields in the IP header of every packet is called Time-To-Live (TTL). This field tells how many gateways the message can pass through before it is discarded. Each gateway that forwards the message decrements the TTL counter. When the counter reaches zero, the message is discarded and a message is sent back to notify the sender that the TTL on that message was exceeded. The main purpose of TTL is to force a packet to die if it becomes stuck in some sort of circular path, rather than allowing it to cycle endlessly around the Internet.
Trace route uses the TTL parameter in the echo message to map the route taken by a message as it moves through the network. The message is sent out first with a TTL of one. This will expire at the first gateway. Then the TTL is increased by one each time the packet is sent out, until it reaches its destination.
Each time a TTL expiration message is returned by a gateway, two important pieces of information are gathered: the IP address of the gateway that expired the message, and the time it took for the message to make the round trip to that gateway and back to the sender. This information is usually displayed in the trace route output. This allows trace route to map the route taken by the packet, step by step, from source to destination. If a remote site cannot be reached, trace route will usually show each step of the route until the message reaches a point where it is not returned, or where it can be seen to be looping. Most trace route programs default to a maximum of 30 hops before they terminate. This maximum can usually be overridden by using option switches on the command line.
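
The TTL mechanism described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: no packets are actually sent, the route is a fixed list of addresses ending at the destination, timing and lost messages are not modeled, and the function names probe and trace are hypothetical.

```python
def probe(path, ttl):
    """Simulate sending one message with the given TTL along `path`.

    `path` is a list of hop addresses ending with the destination.
    Returns (address_that_answered, reached_destination).
    """
    for index, hop in enumerate(path):
        if index == len(path) - 1:
            return hop, True  # the destination itself answers
        ttl -= 1  # each forwarding gateway decrements the TTL
        if ttl == 0:
            return hop, False  # gateway discards and reports TTL exceeded
    raise ValueError("path must not be empty")


def trace(path, max_hops=30):
    """Return the (ttl, address) pairs discovered, one per TTL value."""
    discovered = []
    for ttl in range(1, max_hops + 1):
        address, reached = probe(path, ttl)
        discovered.append((ttl, address))
        if reached:
            break
    return discovered
```

Calling trace on a four-entry path yields one pair per hop, in order, and stops once the destination answers. Passing a smaller max_hops cuts the trace short, much as the hop limit does in the real programs.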
Most trace route programs will send out at least three messages to each step of the route, returning information on each of these attempts. When a message is not returned, the missing message is usually identified by an asterisk (*) instead of the time it took for the round trip.
Trace route programs will usually accept a destination that is either a numeric IP address or a DNS Domain Name. If an IP address is given, the program will usually look up and display the name that is associated with that address. If a domain name is given, the system looks up and displays the IP address. As the program displays each hop of the trace, it will usually look up the IP address and display any domain name that is associated with that address.
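
The choice between the two lookups can be sketched in Python with the standard ipaddress and socket modules. This is an illustration of the idea, not how any particular trace route program is implemented; the function names is_numeric_address and resolve are hypothetical, and resolve depends on whatever name service the local system is configured to use.

```python
import ipaddress
import socket


def is_numeric_address(destination):
    """True if `destination` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(destination)
    except ValueError:
        return False
    return True


def resolve(destination):
    """Return (name, address); either may be None if the lookup fails."""
    if is_numeric_address(destination):
        try:
            name = socket.gethostbyaddr(destination)[0]  # reverse lookup
        except OSError:
            name = None
        return name, destination
    try:
        address = socket.gethostbyname(destination)  # forward lookup (IPv4)
    except OSError:
        address = None
    return destination, address
```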
On a system using Microsoft Windows TCP/IP, the trace route program is called tracert. The following is an example of output from a Windows NT 4.0 system when a route is down at a router on the fourth hop:

    D:\>tracert ns1.berkeley.edu
    Tracing route to ns1.berkeley.edu [128.32.136.9] over a maximum of 30 hops:
      1    20 ms   <10 ms    10 ms  router.wlw.com [205.217.146.1]
      2   140 ms    40 ms    40 ms  204.153.64.50
      3    40 ms    40 ms    40 ms  204.153.64.1
      4     *        *        *     Request timed out.
      5     *        *        *     Request timed out.
      6     *        *     ^C

Here is an example output of the same trace route that gets all the way to the destination:

    D:\>tracert ns1.berkeley.edu
    Tracing route to ns1.berkeley.edu [128.32.206.9] over a maximum of 30 hops:
      1    20 ms   <10 ms    10 ms  router.wlw.com [205.217.146.1]
      2    40 ms    40 ms    40 ms  wwi_isdn_a01.wwi.net [204.153.64.50]
      3    41 ms    50 ms   210 ms  wwi_7000.wwi.net [204.153.64.1]
      4    50 ms    50 ms    40 ms  904.Hssi4-0.GW1.KCY1.ALTER.NET [137.39.151.17]
      5   180 ms   380 ms   291 ms  Fddi0-0.CR1.KCY1.Alter.Net [137.39.37.225]
      6   150 ms    70 ms    80 ms  126.Hssi6-0.CR1.DCA1.Alter.Net [137.39.59.29]
      7    90 ms    80 ms    90 ms  101.Hssi4-0.CR1.TCO1.Alter.Net [137.39.69.85]
      8   140 ms   451 ms   110 ms  411.atm10-0.br1.tco1.alter.net [137.39.13.13]
      9   751 ms   141 ms    80 ms  Sprint-TCO1-gw.ALTER.NET [137.39.103.18]
     10    80 ms    70 ms    81 ms  sl-dc-1-F/T.sprintlink.net [198.67.0.7]
     11   370 ms   401 ms   320 ms  sl-stk-2-H2/0-T3.sprintlink.net [144.228.10.105]
     12   260 ms   111 ms   100 ms  sl-stk-16-F0/0.sprintlink.net [144.228.40.16]
     13   260 ms   130 ms   110 ms  sl-csuberk-1-H1/0-T3.sprintlink.net [144.228.146.50]
     14   220 ms   110 ms   110 ms  inr-666-dmz.berkeley.edu [198.128.16.21]
     15   110 ms   110 ms   120 ms  inr-107-styx.Berkeley.EDU [128.32.2.1]
     16   110 ms   120 ms   121 ms  inr-100.Berkeley.EDU [128.32.235.100]
     17   191 ms   140 ms   150 ms  ns1.Berkeley.EDU [128.32.206.9]
    Trace complete.

On a Linux system, the trace route program is called traceroute. The following is an example of output from a Linux system when a route is down at a router on the fourth hop:

    ns1:~# traceroute ns1.berkeley.edu
    traceroute to ns1.berkeley.edu (128.32.206.9), 30 hops max, 40 byte packets
     1  router.wlw.com (205.217.146.1)  3.128 ms  2.94 ms  2.699 ms
     2  204.153.64.50 (204.153.64.50)  124.093 ms  33.994 ms  33.285 ms
     3  204.153.64.1 (204.153.64.1)  49.623 ms  104.921 ms  35.334 ms
     4  * * *
     5  * * * ^c

Here is an example output of the same trace route that gets all the way to the destination:

    ns1:~# traceroute ns1.berkeley.edu
    traceroute to ns1.berkeley.edu (128.32.136.9), 30 hops max, 40 byte packets
     1  router.wlw.com (205.217.146.1)  3.192 ms  2.853 ms  2.83 ms
     2  wwi_isdn_a01.wwi.net (204.153.64.50)  37.2 ms  33.519 ms  33.809 ms
     3  wwi_7000.wwi.net (204.153.64.1)  35.617 ms  35.857 ms  35.658 ms
     4  904.Hssi4-0.GW1.KCY1.ALTER.NET (137.39.151.17)  41.283 ms  173.836 ms  48.061 ms
     5  Fddi0-0.CR1.KCY1.Alter.Net (137.39.37.225)  37.197 ms  60.388 ms  37.575 ms
     6  126.Hssi6-0.CR1.DCA1.Alter.Net (137.39.59.29)  79.932 ms  67.333 ms  66.861 ms
     7  101.Hssi4-0.CR1.TCO1.Alter.Net (137.39.69.85)  242.576 ms  73.019 ms  74.759 ms
     8  411.atm10-0.br1.tco1.alter.net (137.39.13.13)  448.886 ms  384.883 ms  418.54 ms
     9  Sprint-TCO1-gw.ALTER.NET (137.39.103.18)  152.252 ms  201.521 ms  248.778 ms
    10  sl-dc-1-F/T.sprintlink.net (198.67.0.7)  417.669 ms  72.166 ms  71.769 ms
    11  sl-stk-2-H2/0-T3.sprintlink.net (144.228.10.105)  120.468 ms  448.901 ms  488.277 ms
    12  sl-stk-16-F0/0.sprintlink.net (144.228.40.16)  108.583 ms  118.499 ms  104.628 ms
    13  sl-csuberk-1-H1/0-T3.sprintlink.net (144.228.146.50)  233.624 ms  138.528 ms  109.856 ms
    14  inr-666-dmz.berkeley.edu (198.128.16.21)  110.952 ms  339.673 ms  122.125 ms
    15  inr-107-styx.Berkeley.EDU (128.32.2.1)  112.056 ms  199.265 ms  172.471 ms
    16  inr-101.Berkeley.EDU (128.32.235.101)  126.278 ms  169.483 ms  115.056 ms
    17  ns1.Berkeley.EDU (128.32.136.9)  107.024 ms  250.979 ms  127.702 ms
    ns1:~#
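
To find the first failing hop in a long trace automatically, each hop line can be parsed into its parts. The sketch below handles lines in the Linux format shown above, one hop per line, and records an unanswered message (an asterisk) as None. The function name parse_hop and the patterns are assumptions for illustration; they do not cover every traceroute variant, for example lines that list more than one responding address for a single hop.

```python
import re

_HOP = re.compile(r"^\s*(\d+)\s+(.*)$")
_HOST = re.compile(r"^(\S+) \(([^)]+)\)\s*(.*)$")
_PROBE = re.compile(r"\*|(\d+(?:\.\d+)?) ms")


def parse_hop(line):
    """Parse one hop line into a dict, or return None for non-hop lines."""
    match = _HOP.match(line)
    if match is None:
        return None
    hop, rest = int(match.group(1)), match.group(2)
    name = address = None
    host = _HOST.match(rest)
    if host is not None:
        name, address, rest = host.groups()
    # Each message is either a round-trip time or "*" for no reply.
    times = [
        float(m.group(1)) if m.group(1) else None
        for m in _PROBE.finditer(rest)
    ]
    return {"hop": hop, "name": name, "address": address, "times_ms": times}
```

A hop whose times are all None corresponds to a line of three asterisks, which in the failed trace above first appears at hop 4.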

Ping and Trace Route are very useful for troubleshooting problems in TCP/IP communications. They allow the network administrator to test whether a remote system can be reached and, if not, where the broken link is in the chain. The information gathered by these two programs can be misleading, however. If a remote system is protected by a security firewall, it is often possible to reach it via e-mail, HTTP, and other standard protocols while the ping and trace route messages are stopped at the firewall. Also, if there is more than one route to a remote host, a message can sometimes reach the remote host while the return message follows a different route back and encounters a problem on the return trip. To troubleshoot a TCP/IP problem effectively on a route that crosses more than one organization's networks, it is often necessary to involve experienced network technicians from every network along the route.
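
Because a firewall may stop ping and trace route messages while still passing ordinary application traffic, it can help to confirm reachability with the protocol that is actually in use. The following is a minimal sketch in Python, assuming the service of interest runs over TCP; the function name tcp_port_open and the three-second default timeout are arbitrary choices made for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

A True result shows only that the connection could be opened from this system at that moment. A False result does not say where along the route the failure occurred.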