Troubleshooting hardware issues in Linux

Daniel Oh

7 years ago

Linux servers run mission-critical business functions on many different types of infrastructure, including physical machines, virtualization, private cloud, public cloud, and hybrid cloud. It’s important for Linux sysadmins to understand how to manage Linux hardware infrastructure, including software-defined functionality related to networking, storage, Linux containers, and the many tools available on Linux servers.

It can take a while to troubleshoot and resolve hardware-related issues on Linux. Even highly experienced sysadmins sometimes spend hours working to solve mysterious hardware and software discrepancies.

The following tips should make it quicker and easier to troubleshoot hardware in Linux. Many different things can cause problems with Linux hardware. Before you start trying to diagnose them, it’s smart to learn about the most common issues and where you’re most likely to find them.

Quick-diagnosing devices, modules, and drivers

The first step in troubleshooting is usually to display a list of the hardware installed on your Linux server. You can get detailed information about the hardware with ls commands such as lspci, lsblk, lscpu, and lsscsi. For example, here is the output of the lsblk command:

# lsblk 
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  50G  0 disk 
├─xvda1 202:1    0   1M  0 part 
└─xvda2 202:2    0  50G  0 part /
xvdb    202:16   0  20G  0 disk 
└─xvdb1 202:17   0  20G  0 part
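To run the same check from a script, lsblk can print JSON with its -J (--json) option. The Python sketch below is illustrative only. It assumes the output of `lsblk -J -o NAME,TYPE,SIZE,MOUNTPOINT`, whose JSON key names can differ slightly between util-linux versions. The helper names `unmounted_partitions` and `read_lsblk` are hypothetical. The sketch lists partitions that have no mount point, such as xvdb1 in the output above.

```python
import json
import subprocess


def unmounted_partitions(lsblk_json):
    """Return names of partitions that have no mount point.

    Assumes the JSON printed by: lsblk -J -o NAME,TYPE,SIZE,MOUNTPOINT
    (key names may differ in other util-linux versions).
    """
    data = json.loads(lsblk_json)
    found = []

    def walk(devices):
        for dev in devices:
            # Partitions have type "part"; an unmounted one has a null mountpoint.
            if dev.get("type") == "part" and not dev.get("mountpoint"):
                found.append(dev["name"])
            # Partitions are nested under their parent disk as "children".
            walk(dev.get("children", []))

    walk(data.get("blockdevices", []))
    return found


def read_lsblk():
    """Run lsblk on the live system and return its JSON text."""
    return subprocess.run(
        ["lsblk", "-J", "-o", "NAME,TYPE,SIZE,MOUNTPOINT"],
        capture_output=True, text=True, check=True,
    ).stdout
```

On a live server you would call `unmounted_partitions(read_lsblk())`. For the layout shown above, the function would report both xvda1 and xvdb1. Treat the result as a prompt for a closer look, not as a list of errors: small partitions are often intentionally left unmounted.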

If the ls commands don’t reveal any errors, use the init process (e.g., systemd) to see how the Linux server is running. systemd is the most popular init process for bootstrapping user space and controlling multiple system processes. For example, here is the output of the systemctl status command:

# systemctl status 
● bastion.f347.internal
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Wed 2018-11-28 01:29:05 UTC; 2 days ago
   CGroup: /
           ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
           ├─kubepods.slice
           │ ├─kubepods-pod3881728a_f2af_11e8_af77_06af52f87498.slice
           │ │ ├─docker-88b27385f4bae77bba834fbd60a61d19026bae13d18eb147783ae27819c34967.scope
           │ │ │ └─23860 /opt/bridge/bin/bridge --public-dir=/opt/bridge/static --config=/var/console-config/console-c
           │ │ └─docker-a4433f0d523c7e5bc772ee4db1861e4fa56c4e63a2d48f6bc831458c2ce9fd2d.scope
           │ │   └─23639 /usr/bin/pod
....
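For scripted health checks, the two header fields that matter most here (State and Failed) can also be read in a machine-friendly form. The sketch below assumes that `systemctl show --property=SystemState,NFailedUnits` prints the systemd manager’s SystemState and NFailedUnits properties as KEY=value lines, which is how recent systemd versions behave. The helper names are hypothetical.

```python
import subprocess


def parse_properties(text):
    """Parse KEY=value lines (the `systemctl show` format) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def summarize(props):
    """Return (state, failed_count, healthy) from the manager properties."""
    state = props.get("SystemState", "unknown")
    failed = int(props.get("NFailedUnits", "0"))
    # "running" with zero failed units matches a healthy status header.
    healthy = state == "running" and failed == 0
    return state, failed, healthy


def read_manager_state():
    """Query the systemd manager; only works on a systemd host."""
    out = subprocess.run(
        ["systemctl", "show", "--property=SystemState,NFailedUnits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_properties(out)
```

If `summarize(read_manager_state())` reports a state such as degraded, `systemctl --failed` lists the units to inspect next.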

Digging into multiple logs

dmesg lets you find errors and warnings in the kernel’s most recent messages. For example, here is the output of the dmesg | more command:

# dmesg | more 
....
[ 1539.027419] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1539.042726] IPv6: ADDRCONF(NETDEV_UP): veth61f37018: link is not ready
[ 1539.048706] IPv6: ADDRCONF(NETDEV_CHANGE): veth61f37018: link becomes ready
[ 1539.055034] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1539.098550] device veth61f37018 entered promiscuous mode
[ 1541.450207] device veth61f37018 left promiscuous mode
[ 1542.493266] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 9965.292788] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 9965.449401] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 9965.462738] IPv6: ADDRCONF(NETDEV_UP): vetheacc333c: link is not ready
[ 9965.468942] IPv6: ADDRCONF(NETDEV_CHANGE): vetheacc333c: link becomes ready
....

You can also look at the general system log in the /var/log/messages file, which is where you’ll find errors related to specific issues (on Debian-based distributions, the equivalent file is /var/log/syslog). It’s worthwhile to monitor the messages in real time with the tail command when you make changes to your hardware, such as mounting an extra disk or adding an Ethernet network interface. For example, here is the output of the tail -f /var/log/messages command:

# tail -f /var/log/messages
Dec  1 13:20:33 bastion dnsmasq[30201]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Dec  1 13:20:33 bastion dnsmasq[30201]: using nameserver 127.0.0.1#53 for domain cluster.local
Dec  1 13:21:03 bastion dnsmasq[30201]: setting upstream servers from DBus
Dec  1 13:21:03 bastion dnsmasq[30201]: using nameserver 192.199.0.2#53
Dec  1 13:21:03 bastion dnsmasq[30201]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Dec  1 13:21:03 bastion dnsmasq[30201]: using nameserver 127.0.0.1#53 for domain cluster.local
Dec  1 13:21:33 bastion dnsmasq[30201]: setting upstream servers from DBus
Dec  1 13:21:33 bastion dnsmasq[30201]: using nameserver 192.199.0.2#53
Dec  1 13:21:33 bastion dnsmasq[30201]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Dec  1 13:21:33 bastion dnsmasq[30201]: using nameserver 127.0.0.1#53 for domain cluster.local
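tail -f keeps printing lines as they are appended to the file. If you want a script to react to new entries, for example to flag lines that mention a newly attached disk, the same idea can be sketched in Python. The script seeks to the end of the file and polls for new lines. This is a simplified illustration: unlike `tail -F`, it does not reopen the file after log rotation. The name `follow` and the 0.5-second poll interval are assumptions.

```python
import time


def follow(path, poll_interval=0.5, from_start=False):
    """Yield lines appended to path, like a minimal `tail -f`.

    Does not handle log rotation or truncation (tail -F does).
    """
    with open(path, "r", errors="replace") as f:
        if not from_start:
            f.seek(0, 2)  # jump to the current end of the file
        pending = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_interval)  # nothing new yet; wait and retry
                continue
            pending += chunk
            # Emit only complete lines; the writer may still be mid-line.
            if pending.endswith("\n"):
                yield pending.rstrip("\n")
                pending = ""
```

A loop such as `for line in follow("/var/log/messages"): ...` runs until interrupted. On most distributions, reading that file requires root privileges.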

Analyzing networking capabilities

You may have hundreds of thousands of cloud-native applications serving business services in a complex networking environment, which may include virtualization, multicloud, and hybrid cloud. This means you should check whether network connectivity is working correctly as part of your troubleshooting. Useful commands for examining networking on a Linux server include ip addr, traceroute, nslookup, dig, and ping, among others. For example, here is the output of the ip addr show command:

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:af:52:f8:74:98 brd ff:ff:ff:ff:ff:ff
    inet 192.199.0.169/24 brd 192.199.0.255 scope global noprefixroute dynamic eth0
       valid_lft 3096sec preferred_lft 3096sec
    inet6 fe80::4af:52ff:fef8:7498/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:67:fb:1a:a2 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:67ff:fefb:1aa2/64 scope link 
       valid_lft forever preferred_lft forever
....
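When a server has many interfaces, reading this output by eye gets tedious. Reasonably recent iproute2 releases can print the same data as JSON with `ip -j addr show`. The Python sketch below assumes that JSON layout, specifically the ifname, operstate, and addr_info fields. It reports interfaces whose state is DOWN, such as docker0 above, together with their IPv4 addresses. The helper names are hypothetical.

```python
import json
import subprocess


def down_interfaces(ip_json):
    """Return {ifname: [IPv4 CIDRs]} for interfaces whose operstate is DOWN."""
    result = {}
    for iface in json.loads(ip_json):
        if iface.get("operstate") == "DOWN":
            result[iface["ifname"]] = [
                f'{a["local"]}/{a["prefixlen"]}'
                for a in iface.get("addr_info", [])
                if a.get("family") == "inet"
            ]
    return result


def read_ip_addr():
    """Run `ip -j addr show` on the live system and return its JSON text."""
    return subprocess.run(
        ["ip", "-j", "addr", "show"],
        capture_output=True, text=True, check=True,
    ).stdout
```

The loopback interface reports its state as UNKNOWN rather than DOWN, so it is not flagged. Interpret the results in context. For example, a bridge such as docker0 showing NO-CARRIER and DOWN is often normal when no containers are attached to it.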

In conclusion

Troubleshooting Linux hardware requires considerable knowledge, including how to use powerful command-line tools and how to make sense of system logs. You should also know how to diagnose problems in kernel space, which is where you can find the root cause of many hardware problems. Keep in mind that hardware issues in Linux may come from many different sources, including devices, modules, drivers, BIOS, networking, and even plain old hardware malfunctions.