Part 1, The basics of network troubleshooting

As an IBM AIX® systems administrator, you will inevitably encounter a problem that is linked to, or directly caused by, an issue within the LAN or WAN. In such instances, it’s good practice to make an initial diagnosis before engaging a network administrator to identify the root cause, or at least to give the administrator a general direction in which to start the investigation.

Once engaged, you may be required to assist in the analysis, so it’s essential that you come armed with a relevant diagnostic toolkit. This article provides you with a set of commands available on AIX, many of which are also available on other flavors of UNIX®, which can help you to troubleshoot TCP/IP network-related issues.

For the purposes of this article, the target host system used in all sample commands and output is called testhost.

Is anybody there?

The first step in diagnosing any network-related issue is to verify whether the target host is running. You can use ping to test whether a host is reachable across a network (see Listing 1). This command sends an Internet Control Message Protocol (ICMP) echo request packet to the host and waits for an echo reply.

A successful ping means that:

  • Your host has an active network adapter that can be used to send out the request.
  • The target host is running and has an active network adapter configured with the IP address that you used.
  • Name resolution is working for that host if a name was used rather than an IP address.
  • There is a route from your host to the target and back.
  • No firewalls on the route between hosts or running on either host are blocking ICMP traffic.

The output from a successful ping can also be useful in helping to determine network latency, as it reports on the time taken to receive the echo reply. Long response times are likely to mean poor performance for any applications that exchange data with the target host.

Listing 1. Pinging a responsive host
#ping testhost
PING testhost: ( 56 data bytes
64 bytes from icmp_seq=0 ttl=253 time=0 ms
64 bytes from icmp_seq=1 ttl=253 time=0 ms
64 bytes from icmp_seq=2 ttl=253 time=0 ms
64 bytes from icmp_seq=3 ttl=253 time=0 ms
----testhost PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms

If no echo reply is received, then one or more of the conditions described above hasn’t been met and the ping fails (see Listing 2). A ping fails when the number of packets received is less than the number sent and packet loss is greater than 0 percent.

Listing 2. Pinging an unresponsive host
#ping testhost
PING ( 56 data bytes
---- PING Statistics----
5 packets transmitted, 0 packets received, 100% packet loss
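
When a ping check is run from a script rather than read by eye, the summary line can be tested directly. The following is a minimal sketch, assuming the summary format shown in Listings 1 and 2; the function names ping_loss and ping_ok are illustrative and are not AIX commands.

```shell
# ping_loss: read ping output on stdin and print the packet-loss percentage.
# Assumes a summary line ending in "N% packet loss", as in Listings 1 and 2.
ping_loss() {
    sed -n 's/.*[, ]\([0-9][0-9]*\)% packet loss.*/\1/p'
}

# ping_ok: read ping output on stdin; succeed only if it reports 0% loss.
ping_ok() {
    loss=$(ping_loss)
    [ -n "$loss" ] && [ "$loss" -eq 0 ]
}

# Possible use (not executed here):
#   ping -c 4 testhost | ping_ok && echo "testhost is reachable"
```

Because the functions only parse text, they can be tried against saved output before being pointed at a live host.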

If the ping was unsuccessful, you can check whether the adapter used to send the request is up by using ifconfig.

You can use the ifconfig command to display the status of an individual adapter (for example, en1 shown in Listing 3) or all adapters using the -a switch, also shown in Listing 3. You should ensure that the adapter used to send packets out to the target host is showing as UP and RUNNING. If it’s not, then you need to investigate further.

Listing 3. Displaying network adapter status
#ifconfig en1
        inet netmask 0xffffff00 broadcast
         tcp_sendspace 131072 tcp_recvspace 65536

#ifconfig -a
        inet netmask 0xffffff80 broadcast
        inet netmask 0xffffff00 broadcast
         tcp_sendspace 131072 tcp_recvspace 65536
        inet netmask 0xff000000 broadcast
        inet6 ::1/0
         tcp_sendspace 65536 tcp_recvspace 65536
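
The UP and RUNNING check can also be scripted. This is a rough sketch that assumes ifconfig prints a flags line of the form flags=...<UP,...,RUNNING,...> for each adapter. That line does not appear in Listing 3 as reproduced here, so treat the format as an assumption, and the function name adapter_up as hypothetical.

```shell
# adapter_up: read `ifconfig <adapter>` output on stdin and succeed only if
# both UP and RUNNING appear in the comma-separated flag list between < and >.
# The "flags=...<...>" line format is an assumption about ifconfig output.
adapter_up() {
    flags=$(sed -n 's/.*flags=[^<]*<\([^>]*\)>.*/\1/p' | head -n 1)
    case ",$flags," in
        *,UP,*) ;;
        *) return 1 ;;
    esac
    case ",$flags," in
        *,RUNNING,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use (not executed here):
#   ifconfig en1 | adapter_up || echo "en1 is not UP and RUNNING"
```

Wrapping the flag list in commas before matching avoids false positives from flags that merely contain the letters UP.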

You can display Ethernet statistics on an adapter with entstat (see Listing 4). The example shown uses the -d switch to display all statistics, including device-specific statistics, for the adapter en2. This command can also be useful for telling you the link status (up or down) and media speed (for example, 100Mbps Full Duplex). The media speed is useful if you need to verify the setting based on the link partner and network that the adapter is connected to, as a speed or duplex mismatch can cause problems.

Listing 4. Displaying Ethernet statistics for a network adapter
#entstat -d en2
Device Type: 10/100/1000 Base-TX PCI-X Adapter (14106902) 
Hardware Address: 00:02:55:d3:37:be 
Elapsed Time: 114 days 22 hours 48 minutes 20 seconds

Transmit Statistics:           Receive Statistics:
--------------------           -------------------
Packets: 490645639             Packets: 3225432063
Bytes: 9251643184881           Bytes: 215598601362
Interrupts: 0                  Interrupts: 3144149248
Transmit Errors: 0             Receive Errors: 0
Packets Dropped: 0             Packets Dropped: 0
                               Bad Packets: 0

Max Packets on S/W Transmit Queue: 109 
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 442         Broadcast Packets: 10394992
Multicast Packets: 0           Multicast Packets: 349
No Carrier Sense: 0            CRC Errors: 0
DMA Underrun: 0                DMA Overrun: 0
Lost CTS Errors: 0             Alignment Errors: 0
Max Collision Errors: 0        No Resource Errors: 0
Late Collision Errors: 0       Receive Collision Errors: 0
Deferred: 0                    Packet Too Short Errors: 0
SQE Test: 0                    Packet Too Long Errors: 0
Timeout Errors: 0              Packets Discarded by Adapter: 0
Single Collision Count: 0      Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 200
Driver Flags: Up Broadcast Running 
Simplex 64BitSupport ChecksumOffload 
PrivateSegment DataRateSet

10/100/1000 Base-TX PCI-X Adapter (14106902) Specific Statistics:
Link Status: Up
Media Speed Selected: 100 Mbps Full Duplex 
Media Speed Running: 100 Mbps Full Duplex 
PCI Mode: PCI-X (100-133) 
PCI Bus Width: 64-bit
Jumbo Frames: Disabled 
TCP Segmentation Offload: Enabled
TCP Segmentation Offload Packets Transmitted: 260772859
TCP Segmentation Offload Packet Errors: 0 
Transmit and Receive Flow Control Status: Disabled 
Transmit and Receive Flow Control Threshold (High): 32768 
Transmit and Receive Flow Control Threshold (Low): 24576 
Transmit and Receive Storage Allocation (TX/RX): 16/48
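
Because entstat -d produces a lot of output, it can help to pull out only the fields of interest. The sketch below assumes the single-column "Name: value" layout seen in the device-specific part of Listing 4; entstat_field is an illustrative helper name, not an AIX command.

```shell
# entstat_field: read entstat output on stdin and print the value of the first
# line that starts with "<name>: ". Intended for single-column lines such as
# "Link Status: Up"; two-column statistics lines are returned unsplit.
entstat_field() {
    # $1 = field name, for example "Media Speed Running"
    awk -v name="$1" '
        !found && index($0, name ": ") == 1 {
            value = substr($0, length(name) + 3)   # text after "name: "
            sub(/[ \t]+$/, "", value)              # trim trailing blanks
            print value
            found = 1
        }'
}

# Possible use (not executed here):
#   entstat -d en2 | entstat_field "Link Status"
#   entstat -d en2 | entstat_field "Media Speed Running"
```

Comparing the Media Speed Selected and Media Speed Running values extracted this way is one quick check for a speed or duplex mismatch, although a difference is not always a fault.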

If the adapter is up, you can use route get to check the route from your host to the target (see Listing 5). If there’s no route at all, then ping will inform you; if there is one, you need to find out what it is so that you can verify with the network administrator that it’s correct. Based on the information in your host’s routing table, route get tells you the gateway to which packets are routed when they leave your host on the way to the target.

Listing 5. Getting routing table information for a host
#route get testhost

     route to: testhost
    interface: en2
  interf addr: myhost
            flags: <UP,GATEWAY,DONE,PRCLONING>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
    0          0         0         0             0       0           0    -9751026

If the route is correct, then you can use traceroute to determine the exact route that packets take across the network to the target host. The output of a successful traceroute (see Listing 6) shows each router the packets travel through to reach the target host, along with the round-trip times of the three probes sent to that router.

Listing 6. Tracing a successful route to a host
#traceroute testhost
trying to get source for testhost
source should be
traceroute to testhost ( from (, 30 hops max
outgoing MTU = 1500
 1 (  1 ms  0 ms  0 ms
 2 (  0 ms  0 ms  0 ms
 3  testhost (  1 ms  1 ms  1 ms

An unsuccessful traceroute (see Listing 7) has asterisks (*) in the time fields, as they cannot be determined because the probe to the next router timed out. The example also shows the use of the -n switch, which prints numeric host addresses, thereby avoiding name lookup and resolution and speeding up the trace.

Listing 7. Tracing an unsuccessful route to a host
#traceroute -n testhost
traceroute testhost
trying to get source for testhost
source should be
traceroute to from, 30 hops max
outgoing MTU = 1500
 1  1 ms  0 ms  0 ms
 2  1 ms  1 ms  1 ms
 3  2 ms  2 ms  2 ms
 4  * * *
 5  * * *
 6  * * *
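
To find where a trace stops responding without scanning the output by eye, a small filter can report the first hop whose probes all timed out. This is a sketch that assumes three probes per hop, as in Listings 6 and 7; the name first_silent_hop is illustrative.

```shell
# first_silent_hop: read traceroute output on stdin and print the number of
# the first hop whose three probes all timed out ("* * *"). Prints nothing if
# every hop answered at least one probe.
first_silent_hop() {
    awk '!found && $1 ~ /^[0-9]+$/ && $2 == "*" && $3 == "*" && $4 == "*" {
        print $1
        found = 1
    }'
}

# Possible use (not executed here):
#   traceroute -n testhost | first_silent_hop
```

A silent hop is a starting point for investigation rather than proof of a fault, because some routers are configured not to answer traceroute probes even though they forward traffic normally.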

Services running at the application layer of a TCP/IP network listen on one or more ports that are used to exchange data between clients and the host server as managed by the transport layer. If a valid route exists to the host, and it’s responding to pings but the application service is failing to respond, then you can check connectivity to the relevant ports using telnet.

The telnet command, used in its basic form, establishes a terminal connection to a host. However, you can also use it to establish a connection to a specific port on the host (the default being 23, the telnet service). For a list of standard ports, look in /etc/services.

If the connection is successful, a message indicating the telnet escape sequence is shown (see Listing 8). You need to enter this key sequence (typically, Control-]) to escape back to a telnet> prompt and enter quit to return to a shell prompt.

Listing 8. Testing port 80 (HTTP) on a host (successful)
#telnet testhost 80
Connected to testhost.
Escape character is '^]'.
telnet> quit
Connection closed.

Depending on the type of connection you’re making, the remote service you’re connecting to may generate a message similar to Listing 9.

Listing 9. Testing port 25 (SMTP) on a host (successful)
#telnet testhost 25
Connected to testhost.
Escape character is '^]'.
220 ESMTP Sendmail Wed, 10 Feb 2010 15:52:28 GMT
telnet> quit
Connection closed.

If the connection fails, then either a connection timed out or a connection refused message is displayed (see Listing 10). A refused connection usually means that the host is reachable but the service isn’t running (and therefore nothing is listening on the port), or that a firewall is actively rejecting the connection. A timeout usually means that a firewall running on the host (or somewhere en route) is silently dropping connections to the port, or that the host can’t be reached at all.

Listing 10. Testing port 515 (remote printing) on a host (unsuccessful)
#telnet testhost 515
telnet: Unable to connect to remote host: Connection timed out
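
If you collect the results of many port tests, it can be useful to reduce each error message to a short label. The following sketch only classifies message text; the two message strings are the ones discussed above, and the function name classify_connect_error and its output labels are hypothetical.

```shell
# classify_connect_error: map a telnet error message ($1) to a short label.
#   refused -> the host answered, but the connection was rejected
#   timeout -> no answer arrived before the connection attempt timed out
classify_connect_error() {
    case "$1" in
        *"Connection refused"*)   echo "refused" ;;
        *"Connection timed out"*) echo "timeout" ;;
        *)                        echo "unknown" ;;
    esac
}

# Example with the message from Listing 10:
classify_connect_error "telnet: Unable to connect to remote host: Connection timed out"
```

The labels are deliberately coarse: they say how the attempt failed, not why, so the result still needs to be interpreted alongside what you know about firewalls on the route.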

Do I know you?

When using a host name in an application or any of the diagnostic commands covered in this article, it’s imperative that the host name can be resolved to an IP address. An IP address is what the Internet layer of a TCP/IP network uses when handling data packets.

A host name must resolve through one of the name-resolution services specified in /etc/irs.conf and /etc/netsvc.conf. The hosts record determines the order in which the name-resolution services are tried. Only local and BIND/DNS resolution are covered here; the remaining options are outside the scope of this article.
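
To see the configured order at a glance, the hosts record can be extracted with standard text tools. This is a minimal sketch, assuming a record of the form "hosts = local, bind" in /etc/netsvc.conf; the function name hosts_order is illustrative, and the optional file argument exists only so that the function can be tried on a copy.

```shell
# hosts_order: print the name-resolution order from a netsvc.conf-style file,
# one mechanism per line. $1 is the file to read (default: /etc/netsvc.conf).
# Assumes a record of the form "hosts = value, value, ...".
hosts_order() {
    sed -n 's/^[[:space:]]*hosts[[:space:]]*=[[:space:]]*//p' "${1:-/etc/netsvc.conf}" |
        head -n 1 |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//; /^$/d'
}

# Possible use (not executed here):
#   hosts_order                  # reads /etc/netsvc.conf
#   hosts_order /tmp/netsvc.conf # reads a copy
```

Commented-out records are ignored because the pattern requires the line to begin with the hosts keyword.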

When local is specified, the /etc/hosts file is used to resolve host names. So, check to see whether there’s an entry for the target host (see Listing 11).

Listing 11. Looking for a host in /etc/hosts
#grep testhost /etc/hosts
testhost aixserver
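
A plain grep also matches names that merely contain the search string (testhost2, for instance). Where that matters, an exact match on the name fields is one alternative. The sketch below assumes the usual hosts-file layout of an address followed by one or more names; hosts_lookup is an illustrative name, and the addresses in the usage comment are reserved documentation addresses.

```shell
# hosts_lookup: print the address of each entry whose host name or alias is
# exactly $1. $2 is the hosts-format file to search (default: /etc/hosts).
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                 # strip comments
        {
            for (i = 2; i <= NF; i++) {    # field 1 is the address
                if ($i == name) { print $1; next }
            }
        }
    ' "${2:-/etc/hosts}"
}

# Possible use (not executed here):
#   hosts_lookup testhost
# With a file containing "192.0.2.10 testhost aixserver" and
# "192.0.2.11 testhost2", only the first entry matches.
```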

If you specify bind or dns, then DNS is used to resolve host names, and you can use nslookup to check whether the host name resolves (see Listing 12).

Listing 12. Resolving a host name via DNS
#nslookup testhost

A more powerful DNS interrogation tool is dig. This command has a much richer set of command-line options and arguments than nslookup, which offers its additional functionality only through an interactive mode. So, for more complex queries, particularly where the output will be parsed by a script, dig is preferred (see Listing 13).

Listing 13. Reverse lookup of an IP address in DNS
#dig -x
; <<>> DiG 9.2.0 <<>> -x

;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21351
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;‑    IN      PTR


;; Query time: 11 msec
;; WHEN: Fri Feb 12 13:28:16 2010
;; MSG SIZE  rcvd: 82

A host uses the Address Resolution Protocol (ARP) table, or ARP cache, to keep track of the media access control (MAC) addresses of other network devices alongside their IP addresses. The link layer of a TCP/IP network addresses devices by MAC address, so the host consults the ARP table to translate an IP address into the corresponding MAC address. If your host has recently communicated with another host, it’s likely that there is an entry for it in the ARP table. You can use arp to display the entry for a particular host if one exists (see Listing 14).

Listing 14. Displaying a host entry in the ARP table
#arp testhost
testhost ( at 0:c:29:44:90:28 [ethernet] stored in bucket 0

You can also display the entire table using the -a switch (see Listing 15). The -n switch displays addresses numerically, so no attempt is made to resolve them to host names.

Listing 15. Displaying the contents of the ARP table
#arp -an
  ? ( at 00:c:29:44:90:28 [ethernet] stored in bucket 0
  ? ( at 0:10:db:27:d9:8 [ethernet] stored in bucket 4
  ? ( at 0:1b:78:59:88:d8 [ethernet] stored in bucket 4
  ? ( at 0:11:25:a6:20:78 [ethernet] stored in bucket 14
  ? ( at 0:1b:78:57:a:d0 [ethernet] stored in bucket 14
  ? ( at 0:0:c:7:ac:0 [ethernet] stored in bucket 15
  ? ( at 0:d:65:e2:4c:c2 [ethernet] stored in bucket 18
  ? ( at 0:11:25:a6:d7:9a [ethernet] stored in bucket 24
bucket:    0     contains:    1 entries
bucket:    1     contains:    0 entries
bucket:    2     contains:    0 entries
bucket:    3     contains:    0 entries
bucket:    4     contains:    2 entries
There are 8 entries in the arp table.
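
The arp output abbreviates MAC addresses by dropping leading zeros (0:c:29:44:90:28), whereas entstat prints them in full (00:02:55:d3:37:be). When comparing addresses from the two sources, it helps to normalize them first. This is a small sketch; normalize_mac is an illustrative helper name.

```shell
# normalize_mac: print a colon-separated MAC address ($1) in lowercase with
# every octet zero-padded to two hex digits.
normalize_mac() {
    printf '%s\n' "$1" | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            octet = tolower($i)
            if (length(octet) == 1) octet = "0" octet
            out = out (i > 1 ? ":" : "") octet
        }
        print out
    }'
}

# Example with the address from Listing 14:
normalize_mac 0:c:29:44:90:28
```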

Can you hear me?

To establish a connection, TCP uses a three-way handshake. A client initiates a connection to a host (and a specific port) by sending a SYN synchronize packet. After successfully receiving it, the host responds with a SYN-ACK acknowledgement. If the client successfully receives this acknowledgement, the client completes the handshake with an ACK acknowledgement. All of this assumes that the host server is listening on the specified port, that a route exists from the client to the host and back again, and that no firewalls are blocking this kind of traffic.

You can use netstat to display existing connections from your host to other hosts and the current state of each. Using the command with the -a switch (show the state of all sockets) and the -n switch (show addresses numerically, avoiding lookup), you can pipe the output to a suitable grep to look for connections in a particular state (for example, ESTABLISHED for post-handshake, active connections) or connections to a particular host or port.

Listing 16 shows, in turn, all connections to a particular IP address and their states (two connections, both fully established), the connections to a particular port at a particular IP address (port 22), and all fully established connections to any host.

Listing 16. Displaying the status of connections to hosts
#netstat -an | grep
tcp4       0      0      ESTABLISHED
tcp4       0      0        ESTABLISHED

#netstat -an | grep
tcp4       0      0        ESTABLISHED

#netstat -an | grep ESTABLISHED
tcp4       0      0      ESTABLISHED
tcp4       0      0        ESTABLISHED
tcp4       0      0    ESTABLISHED
tcp4       0      0     ESTABLISHED
tcp4       0      0        ESTABLISHED
tcp4       0      0          ESTABLISHED
tcp4       0      0       ESTABLISHED
tcp4       0      0    ESTABLISHED
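
Rather than grepping for one state at a time, you can summarize how many TCP sockets are in each state. The sketch below assumes that, as in Listing 16, TCP lines begin with tcp and end with the connection state; tcp_state_counts is an illustrative name.

```shell
# tcp_state_counts: read `netstat -an` output on stdin and print one
# "STATE count" line per TCP state, sorted by state name.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { count[$NF]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Possible use (not executed here):
#   netstat -an | tcp_state_counts
```

A large number of sockets stuck in a handshake or teardown state, such as SYN_SENT or CLOSE_WAIT, is often a more useful clue than the list of established connections.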

You can monitor the traffic passing through a particular adapter using tcpdump, which displays a summary of each packet as it is captured. The command takes various options that let you display more or less of each packet, in either descriptive or raw form, and accepts Boolean filter expressions that restrict the output to the type of data you want to see. For example, monitoring packets on adapter en2, you can show only data being sent to a specific host (see Listing 17).

Listing 17. Display packets destined for a specific host
#tcpdump -i en2 dst host testhost
tcpdump: listening on en2
10:08:24.912057892 myhost.46183 > testhost.22: P 1299060979:1299061027(48) ack 3373421618 win 17520 (DF) tos 0x10
10:08:25.009291439 myhost.46183 > testhost.22: P 1:49(48) ack 48 win 17520 (DF) tos 0x10
10:08:25.093832676 myhost.46183 > testhost.22: . ack 96 win 17520 (DF) tos 0x10
10:08:25.249319253 myhost.46183 > testhost.22: P 1299061075:1299061123(48) ack 3373421714 win 17520 (DF) tos 0x10
^C
53 packets received by filter
0 packets dropped by kernel

You can show only packets coming from a specific host (see Listing 18).

Listing 18. Display packets sent by a specific host
#tcpdump -i en2 src host testhost
tcpdump: listening on en2
10:10:38.505848354 testhost.22 > myhost.46183: . ack 130 win 24820 (DF) tos 0x10
10:10:38.505916972 testhost.22 > myhost.46183: F 529:529(0) ack 225 win 24820 (DF) tos 0x10
10:10:43.855153846 testhost > myhost: icmp: echo reply
10:10:44.855224394 testhost > myhost: icmp: echo reply
102 packets received by filter
0 packets dropped by kernel

You can show only packets sent to or coming from a specific host on a specific port (see Listing 19).

Listing 19. Display packets destined for or sent by a specific host on a specific port
#tcpdump -i en2 host testhost and port 22
12:15:38.033833162 myhost.47216 > testhost.22: . ack 610148954 win 17520 (DF) tos 0x10
12:15:38.113807903 myhost.47216 > testhost.22: P 145:193(48) ack 192 win 17520 (DF) tos 0x10
12:15:38.114291921 testhost.22 > myhost.47216: P 192:240(48) ack 193 win 24820 (DF) tos 0x10
12:15:38.241718122 myhost.47216 > testhost.22: P 193:241(48) ack 240 win 17520 (DF) tos 0x10
12:15:38.242344703 testhost.22 > myhost.47216: P 240:288(48) ack 241 win 24820 (DF) tos 0x10
12:15:38.243844593 myhost.47216 > testhost.22: . ack 288 win 17520 (DF) tos 0x10
12:15:38.497817604 myhost.47216 > testhost.22: P 241:289(48) ack 288 win 17520 (DF) tos 0x10
12:15:38.503088328 testhost.22 > myhost.47216: P 288:336(48) ack 289 win 24820 (DF) tos 0x10
12:15:38.503154802 testhost.22 > myhost.47216: P 336:432(96) ack 289 win 24820 (DF) tos 0x10
^C
145 packets received by filter
0 packets dropped by kernel

You can stop the trace by pressing Control-C. The tcpdump command is much more feature-rich than the simple examples shown here, so I recommend that you familiarize yourself with its man pages.

As you can see from the output in these three examples, traffic is shown with:

  • A timestamp
  • Source Host.Source Port
  • Destination Host.Destination Port
  • Packet flags
  • Other packet information

You can use tcpdump to establish whether traffic is leaving your host for the target host and whether traffic is making its way back. If no inbound traffic appears, the host may not be responding, or there may be no valid route from your host to the target or back. If nothing is listening on a particular TCP port, or a firewall is actively rejecting the packets you send, you will typically see an R in the packet flags field, indicating that the connection has been reset; a firewall that silently drops packets produces no reply at all. For more information on the exact layout and format of a TCP packet, refer to RFC 793: Transmission Control Protocol.
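
In a long trace, resets are easy to miss. The following sketch filters tcpdump text output for TCP packets whose flags field contains R. It assumes the output layout shown in Listings 17 through 19 (timestamp, source, >, destination, flags), which differs from the layout printed by newer tcpdump releases; reset_packets is an illustrative name.

```shell
# reset_packets: read tcpdump text output on stdin and print only the lines
# whose TCP flags field (the fifth field in the layout used in this article)
# contains R. Non-TCP lines, such as ICMP, do not match the flags pattern.
reset_packets() {
    awk '$3 == ">" && $5 ~ /^[SFPRWE.]+$/ && $5 ~ /R/'
}

# Possible use (not executed here):
#   tcpdump -i en2 host testhost | reset_packets
```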

Depending on the nature of the problem, it is sometimes good practice to run a tcpdump for a period of time while capturing packet information to a file using the -w switch. Once you feel you have captured enough data, press Control-C to stop the trace. At this point, you can process the file using the -r option to read the packet data captured. You can then use the vast array of switches, options, and Boolean arguments to analyze the data. Listing 20 shows an example of this process.

Listing 20. Capture packet data to a file and analyze it
#tcpdump -w /var/tmp/tcpdump.out -i en1
tcpdump: listening on en1
305 packets received by filter
0 packets dropped by kernel

#tcpdump -r /var/tmp/tcpdump.out host testhost
13:10:12.017777365 testhost.22 > myhost.47216: P 790304:790352(48) ack 1110769 win 24820 (DF) tos 0x10
13:10:12.129146164 myhost.47216 > testhost.22: P 135249:135297(48) ack 126560 win 17520 (DF) tos 0x10
13:10:12.129992465 testhost.22 > myhost.47216: P 790352:790416(64) ack 1110817 win 24820 (DF) tos 0x10
13:10:12.203827965 myhost.47216 > testhost.22: . ack 790416 win 17520 (DF) tos 0x10
13:11:35.707809458 myhost > testhost: icmp: echo request (DF)
13:11:35.709883978 testhost > myhost: icmp: echo reply (DF)

#tcpdump -r /var/tmp/tcpdump.out not port 22
13:11:35.707809458 myhost > testhost: icmp: echo request (DF)
13:11:35.709883978 testhost > myhost: icmp: echo reply (DF)
13:11:36.579874114 arp who-has tell
13:11:37.077504208 0:2:16:9e:20:a 1:80:c2:0:0:0 0026 38:
                         4242 0300 0000 0000 8000 0002 1695 aecb
                         0000 0026 8000 0002 169e 2008 8017 0200
                         1400 0200 0f00
13:11:38.065119802 > myhost.tnslsnr: P 502:591(89) ack 421 
                                    win 64056
13:11:38.071526597 > myhost.tnslsnr: P 591:606(15) ack 548 
                                    win 63929
13:11:38.896664820‑ns >‑ns: udp 50
13:11:39.071526597‑ns >‑ns: udp 50


This article covered some of the AIX tools you can use to test connectivity to a host, extract useful network-related information about a host, and analyze data sent to and from a host. In the next article, you’ll get under the covers to see what is really going on when your host has problems communicating with another. The article will conclude with a step-by-step guide to logical problem diagnosis when encountering network-related issues.

Martin Wicks